- I. Introduction
- II. The science of networks
- III. Analysis of plant metabolic networks
- IV. Conclusion
Metabolism is one of the best recognised networks within biological systems, but our understanding of metabolic regulation has been limited by a failure to consider regulation within the context of the whole network. With recent advances in theoretical aspects of network thinking and a postgenomic landscape in which our ability to quantify molecular changes at a systems level is unsurpassed, the time is ripe for the development of a new level of understanding of the regulation of plant metabolic networks. Theoretical advances such as the formal description of ‘scale-free’ networks have provided explanations for network behaviour (such as robustness). In parallel, the appreciation of the importance of new levels of the metabolic regulatory hierarchy (such as protein–protein interaction) and the continuing development of global profiling technologies is generating a system-wide molecular data set of increasing resolution. In this review we will argue that the integration of these different aspects of metabolic research will bring about a step change in our understanding of the regulation of metabolic networks in plants.
Reductionism has dominated biological thinking over the last 100 years. Our attempts to understand cellular complexity have been based around a study of the properties of the individual molecules that make up the cell. Thus, we reduce a cell to its component parts. The essential premise of this approach is that knowledge of the properties of these component parts will allow us to reassemble them back into a mechanistic model of the complete cell. However, despite an ever more sophisticated analytical reach and a correspondingly rapid increase in the information richness of the molecular parts list, it is difficult to escape the conclusion that we are getting no closer to the elusive ‘in silico’ cell. It is becoming apparent that reassembling Nature's jigsaw puzzle is going to be harder than we thought (Barabasi, 2003).
There are two main problems with the reductionist philosophy that have impeded progress. The first is complexity. In systems of such high complexity as biological networks, there are literally billions of ways in which the molecular components can be fitted together and it is unlikely that a unique solution to the puzzle exists. One would therefore have to question whether this ‘bottom-up’ approach realistically stands any chance of ultimate success. The second problem emerges with the realisation that biological function is rarely the property of a single molecular entity. Instead, function results from complex interactions between large numbers of molecules. In other words, in order to understand fully the mechanistic workings of the cell as a whole, we have to think in terms of networks. Moreover, this network thinking has to be able to take in the concept of dynamism; the network is not a static entity, but rather is constantly rewired and reconfigured to suit the prevailing physiological demands placed upon the cell at any given moment in time. In this review we will discuss the rapid advances that have been made in network thinking over the last few years and describe how emerging concepts of network topology and design relate to metabolic networks. We will argue that this top-down ‘systems’ approach, in combination with the measurement of the appropriate molecular parameters at the appropriate resolution, will bring about a step change in our ability to understand the mechanistic basis for cellular behaviour. In addition, the impact of postgenomic technologies on our investigations of plant metabolic networks will be discussed.
In some senses, every molecule within the cell is part of the same global network, because ultimately everything is linked to everything else. However, conceptually, we tend to break the network down into a series of subnetworks that consist of either the same type of molecules, or the same type of interaction. The metabolic network is one of the most obvious and best recognised of these subnetworks. Metabolic pathways are formed from chains of enzymes that interconvert a substrate to a product via a series of intermediate steps. However, few known metabolic pathways work in isolation. Instead, different pathways share the same metabolic intermediates such that pathways become interdependent and connected with one another. Metabolism thus forms a highly branched, highly connected network. Although reductionist studies of the kinetic properties of individual enzymes have historically dominated the study of the metabolic network, metabolism was one of the first disciplines to embrace a systems approach.
In the early 1970s, Kacser and colleagues formalised a mathematical description of system properties of enzymes and metabolites (Kacser & Burns, 1973). Their method, which has become known as metabolic control analysis, allows the effect of small changes in enzymes or metabolites on the whole system to be quantified. This emphasis on systems properties was a radical departure and laid the mathematical foundation for a network analysis of metabolism. Unfortunately, the empirical determination of these systems parameters is experimentally demanding and, although the use of metabolic control analysis has improved our understanding of the regulation of some metabolic pathways in plants (e.g. Geigenberger et al., 2004), it has never really fulfilled its potential as a network analytical tool. However, the emergence of postgenomic approaches that allow high-throughput cataloguing of changes in abundance of transcripts, proteins and metabolites, coupled with significant advances in our ability to follow metabolic events and to quantitate metabolic flux (Ratcliffe & Shachar-Hill, 2004; Fernie et al., 2005), has raised fresh expectations of a systems description of plant metabolism (Sweetlove et al., 2003).
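For orientation, the central quantity of metabolic control analysis can be written down compactly. The notation here is an assumption, because the text above gives no formulae: J stands for the steady-state flux through a pathway and E_i for the activity of enzyme i. The flux control coefficient measures the fractional change in flux caused by a small fractional change in one enzyme, and the summation theorem states that these coefficients add up to one across the enzymes of the system.

```latex
C^{J}_{E_i} \;=\; \frac{\partial J}{\partial E_i}\,\frac{E_i}{J}
          \;=\; \frac{\partial \ln J}{\partial \ln E_i},
\qquad
\sum_{i} C^{J}_{E_i} \;=\; 1
```

Read this way, control is a property shared among all the enzymes of the system rather than of a single step, which is why each coefficient has to be determined in the context of the intact network.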
The analysis of networks is not a new science and has been used by social scientists and engineers for many years. However, a series of papers by Albert-Laszlo Barabasi and his coworkers has ignited a new interest in network thinking amongst biologists. What Barabasi discovered was a novel network architecture – the so-called scale-free network. The reason this has generated so much excitement is that this scale-free architecture appears to be common to a whole range of networks, from artificial networks, such as the internet, through social networks to biological networks, including metabolism. Moreover, this particular network architecture appears to explain some of the properties of network behaviour.
The starting point of describing a network is to make a graphical representation in which a series of nodes are joined by lines. In metabolic terms, each node would represent a different enzyme and the lines would represent the metabolites that connect two enzymes. For the last 40 years, network thinking has been dominated by the random network theory devised by two Hungarian mathematicians, Paul Erdős and Alfred Rényi. The Erdős–Rényi model proposes that networks contain nodes that are connected randomly (Fig. 1). This model has persisted because it explains one of the known properties of networks – small-world behaviour. In a random network, if the average number of links per node is one or more, then the network forms a giant cluster such that it is possible to trace a surprisingly short path between any two nodes.
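The threshold behaviour of a random network can be explored with a short simulation. The following is a minimal sketch rather than a reference implementation: the function names, the network size and the two mean degrees tested are illustrative assumptions. It builds a randomly connected network and reports the fraction of nodes that fall within the largest connected cluster.

```python
import random
from collections import deque

def random_graph(n, mean_degree, seed=0):
    """Randomly connected network: every pair of nodes is linked with the same
    probability, chosen so that each node has `mean_degree` links on average."""
    rng = random.Random(seed)
    p = mean_degree / (n - 1)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def largest_cluster_fraction(adj):
    """Fraction of all nodes that belong to the largest connected cluster."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first walk over one cluster
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best / len(adj)

sparse = largest_cluster_fraction(random_graph(1000, mean_degree=0.5))
dense = largest_cluster_fraction(random_graph(1000, mean_degree=3.0))
```

With well under one link per node the network remains a collection of small fragments, whereas a few links per node are enough to gather most nodes into a single cluster.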
The problem with the random network model is that it equates complexity with randomness. Instinctively, it seems implausible that the higher-order organisation required in biological networks to generate biological function can be generated by random connection of the molecular entities of the cell. Indeed, real networks are not randomly distributed but contain definite clusters. In 1998, Watts & Strogatz proposed an alternative network model in which the nodes were arranged with links between them on the edge of a circle (Watts & Strogatz, 1998). More distant cross-links were introduced to maintain small-world behaviour. A year later, based on an investigation of the topology of the internet, Barabasi proposed his scale-free model for network structure (Barabasi & Albert, 1999). The scale-free structure is defined by the fact that the distribution of the number of links per node follows a power law, in contrast to the Poisson distribution obtained for a random network (Fig. 1).
The consequences of this difference for network structure can be explained by examining analogous distributions. For example, physical parameters of organisms in a population have Poisson distributions. Thus, the height distribution of humans has a relatively narrow range, with most individuals having a height close to the median. However, if the height of humans were to follow a power law (which decays much more slowly than a Poisson distribution), then we would not be surprised to find a small number of individuals that were 100 feet tall. A power law allows for greater extremes. In the context of a network, this means that the degree of linkage within the network is highly heterogeneous, with a small number of nodes being extremely highly connected. These nodes are known as hubs. Moreover, Barabasi proposed a mechanistic basis for this network structure based on a ‘rich-get-richer’ principle (Barabasi & Albert, 1999). Essentially, in a growing network, new nodes prefer to attach to nodes that are already highly connected. Thus there is a positive reinforcement of hub connectivity. If one also introduces a fitness component (i.e. fit–rich get richer), then one can allow for the emergence of new hubs (in the case of the internet, think Google).
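The ‘rich-get-richer’ growth rule is simple enough to simulate directly. The sketch below is an illustration under stated assumptions, not the published algorithm: the function name, the starting core of fully connected nodes and the choice of two links per new node are all hypothetical. Each new node attaches to existing nodes with a probability proportional to the number of links they already hold.

```python
import random

def preferential_attachment(n, m=2, seed=0):
    """Grow a network to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree.
    Returns the list of node degrees."""
    rng = random.Random(seed)
    # Assumed starting point: a fully connected core of m + 1 nodes.
    degree = [m] * (m + 1)
    # Each node appears in `stubs` once per link end, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    stubs = [node for node in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        degree.append(0)
        for target in targets:
            degree[target] += 1
            degree[new] += 1
            stubs.extend((target, new))
    return degree

degrees = preferential_attachment(2000)
mean_degree = sum(degrees) / len(degrees)
largest_hub = max(degrees)
```

Comparing `largest_hub` with `mean_degree` shows the heterogeneity described above: most nodes keep only the links they arrived with, while a handful of early nodes accumulate many times the average.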
It turns out that this scale-free architecture is not restricted to artificial networks such as the internet, but also accounts for the topology of a whole range of networks including biological networks. The metabolic networks of Escherichia coli (Wagner & Fell, 2001) and of a range of other bacteria and simple eukaryotes (yeast and Caenorhabditis elegans) (Jeong et al., 2000) have been shown to have a scale-free architecture (although it is worth mentioning that this conclusion can be challenged if conservation of chemical moieties is considered when defining links between enzyme nodes; Arita, 2004). Mechanistically, one can argue that growth of metabolic networks occurs through evolution. The fact that the most ancient metabolic pathways also appear to be the most highly connected supports the idea that scale-free networks emerge through preferential attachment of new nodes to nodes that are already highly linked (Wagner & Fell, 2001). Moreover, there is a remarkable degree of conservation of the composition of hubs, whereas species-specific differences in the metabolic network reside principally in the less connected nodes (Jeong et al., 2000). This suggests that the underlying structural organisation of metabolism is conserved across all kingdoms of life.
To most biochemists, the news that metabolic networks contain hubs of highly connected metabolites is no news at all. It has long been apparent that primary metabolic pathways link together disparate parts of metabolism. The Krebs cycle, for instance, was referred to as a central ‘hub’ of metabolism long before the current fashion for all things hub-like (Douce & Neuberger, 1989). So the question arises as to whether the formal description of metabolic networks as scale-free provides any new insight into the organisation and regulation of metabolism. The answer is a qualified yes.
Although network analysis is essentially descriptive, it is striking that scale-free networks share a trait common to biological networks – robustness. For a network to be maintained in a functional state, it must be tolerant of error or node failure. Robustness is a fundamental feature of biological systems (Savageau, 1976; Hartwell et al., 1999), and network analysis suggests that it may be a property of the structure of the network. Theoretical studies demonstrate that scale-free networks are remarkably tolerant of the removal of random nodes: even a drastic removal of up to 5% of the nodes does not affect the connectivity of the remaining nodes (Albert et al., 2000). The explanation for this behaviour lies in the heterogeneity of a scale-free network. Removal of a hub node will have far greater consequences than removal of a non-hub node. Because hub nodes are comparatively rare, the chance of a random disruption affecting a hub node is low in comparison with a non-hub node. Thus the network is tolerant of random error. The corollary is that the network is sensitive to error that targets the hubs. At first inspection, this is borne out by the behaviour of metabolic networks. A mutation in an enzyme of primary metabolism is likely to have more far-reaching consequences than one in an enzyme of secondary metabolism. In yeast, it is indeed the case that the most highly connected enzymes are the most important for survival (Jeong et al., 2001).
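The contrast between random failure and targeted attack can be illustrated numerically. This is a hedged sketch and not a reproduction of the cited analysis: the network is grown by an assumed preferential-attachment rule, the size of the largest surviving cluster is used as a stand-in measure of connectivity, and all function names and parameter values are hypothetical.

```python
import random
from collections import deque

def scale_free_graph(n, m=2, seed=0):
    """Grow a network by degree-proportional attachment (assumed growth rule)."""
    rng = random.Random(seed)
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}  # connected core
    stubs = [node for node in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for target in targets:
            adj[target].add(new)
            stubs.extend((target, new))
    return adj

def largest_cluster_fraction(adj, removed):
    """Fraction of surviving nodes found in the largest connected cluster."""
    alive = set(adj) - removed
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour in alive and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best / len(alive)

def compare_failure_and_attack(n=2000, fraction=0.05, seed=0):
    """Remove the same number of nodes at random and by targeting the hubs."""
    adj = scale_free_graph(n, seed=seed)
    k = int(n * fraction)
    rng = random.Random(seed + 1)
    random_loss = set(rng.sample(sorted(adj), k))
    hubs = set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])
    return (largest_cluster_fraction(adj, random_loss),
            largest_cluster_fraction(adj, hubs))

after_failure, after_attack = compare_failure_and_attack()
```

Under these assumptions the network that loses 5% of its nodes at random stays almost fully connected, while losing the same number of hubs leaves a noticeably smaller connected core.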
It is interesting to consider whether this metabolic Achilles heel is still present in more highly evolved eukaryotes such as higher plants. The fact that viable knockout lines are available for most enzymes of primary metabolism (Sessions et al., 2002) and that many knockout mutants have no apparent phenotype (Bouche & Bouchez, 2001; Budziszewski et al., 2001) suggests that the situation may be different in plants. One possible explanation for the apparent insensitivity of plants to hub–gene mutation may be the prevalence of gene families in the plant genomes (The Arabidopsis Genome Initiative, 2000). Rather than representing redundancy, the presence of multiple isoforms of the same enzyme may instead represent an evolved mechanism to preserve the integrity of the metabolic network in the face of mutation of one of the isoforms. Gene families may confer robustness. This idea generates two testable hypotheses: first, that removal of one or several members of a gene family should have a fitness consequence over several generations; and second, that the number of isoforms of an enzyme should relate to its degree of connectivity within the metabolic network – in other words, one would predict that hub enzymes would have a large number of isoforms in comparison to non-hub enzymes.
In engineering terms, a system that provides an optimal solution to a process is inflexible – if another process is to be handled, the entire system has to be redesigned. To overcome this problem, engineers design modular systems in which discrete modules of defined functionality can be combined in different combinations to generate a variety of solutions. Metabolic networks are also thought to be modular in design (Savageau, 1976; Hartwell et al., 1999; Wagner & Fell, 2001). Although modularity is not incompatible with the notion of a scale-free network (hierarchical assembly of small modular units can generate a scale-free network; Ravasz et al., 2002), it does have some interesting consequences for our understanding of the regulation of networks.
The first consequence of modularity within a network is the presence of motifs. If a network consists of modular building blocks, then one would expect to see certain patterns repeated throughout the network. In terms of regulation of networks, the identification of such motifs offers the possibility of divining the underlying mechanism of regulation. Computational analysis of network architecture allows the identification of motif patterns that occur at a frequency statistically greater than by chance (Milo et al., 2002; Berg & Lassig, 2004). The high degree of conservation of motifs identified in this way within the yeast protein–protein interaction network (Ravasz et al., 2002), together with the convergent evolution of the transcription factor regulatory network towards the same motifs in a wide range of species, suggests that these motifs have a biological significance (Conant & Wagner, 2003). Regulatory motifs such as feedback loops are, of course, well known in metabolism. Motif-searching algorithms offer the opportunity to undertake a more systematic examination of the regulatory motifs in metabolism, an exercise that would undoubtedly increase our understanding of metabolic network regulation. In the meantime, some information can be gleaned from studies of motifs present in the transcription factor mediated gene regulatory networks of bacteria (Shen-Orr et al., 2002) and yeast (Lee et al., 2002). Several different types of regulatory motif were identified. It turns out that metabolic genes are often regulated via a network motif known as a single-input motif (in which a single transcription factor binds to the promoters of a set of genes). This makes sense because it allows coordinated control of the expression of entire metabolic pathways. Moreover, differing sensitivity thresholds of different promoters to the same transcription factor could allow an ordered temporal wave of expression through the pathway. Such a response, in which the first enzyme of a pathway is increased first and to the greatest extent, could be important in optimising resource usage (Zaslaver et al., 2004).
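How differing promoter thresholds could produce an ordered wave of expression can be shown with a toy calculation. Everything in this sketch is hypothetical: the gene names, the threshold values and the steadily rising transcription factor level are illustrative assumptions and are not taken from the cited studies.

```python
def activation_order(thresholds, tf_levels):
    """For each gene, return the index of the first time point at which the
    transcription factor level reaches that gene's promoter threshold
    (None if the threshold is never reached)."""
    onset = {}
    for gene, threshold in thresholds.items():
        onset[gene] = next(
            (t for t, level in enumerate(tf_levels) if level >= threshold),
            None,
        )
    return onset

# Hypothetical single-input motif: one transcription factor, three target genes.
thresholds = {"enzyme_1": 2, "enzyme_2": 5, "enzyme_3": 8}
tf_levels = list(range(11))  # activity rising from 0 to 10, arbitrary units

onset = activation_order(thresholds, tf_levels)
ordered = sorted(onset, key=onset.get)
```

Because the most sensitive promoter responds first, the genes switch on in pathway order even though they all read the same regulatory signal.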
The second consequence of modularity within a network is that it affects the importance of specific link types. According to Barabasi, robustness in a scale-free network is conferred by virtue of the fact that weakly connected links have relatively little impact upon overall network connectivity (see Section II.1). However, a new analysis which organises the network into functional modules comes to a slightly different conclusion (Luscombe et al., 2004). The analysis centres around a new algorithm that identifies functional modules on the basis that nodes with the same role should have the same connection topology (a useful analogy here is to consider that two CEOs of different companies will perform essentially the same role and will operate within the same organisational hierarchy even if the two companies produce vastly different products). The algorithm was used to identify modules (i.e. groups of nodes with the same function) within the metabolic networks of a range of organisms. It was found that most modules contain metabolites from a single pathway, although central metabolic pathways (such as carbohydrate and amino acid metabolism) cannot be separated in this way. What is significant about this work is that when the importance of different types of links was assessed, it was found that removal of weakly connected nodes that connect modules together had a greater impact than removal of highly connected hub nodes. This is at odds with the Barabasi view of the importance of hubs. One possible explanation for this may be the occurrence of protective redundancy within the hub nodes (see Section II.1).
The final aspect of the scale-free network that needs to be modified is the idea that hubs are immutable. In the hub-centric scale-free network model, the emergence of hubs is seen as a consequence of the cumulative preferential attachment of new nodes to nodes that are already highly connected. This gives the impression that, once formed, the composition of a hub is fixed in time. However, an investigation of the dynamic behaviour of the yeast transcription factor mediated gene regulatory network suggests otherwise (Luscombe et al., 2004). By examining the expression pattern of over 3000 target genes of 142 transcription factors, Luscombe et al. were able to ‘trace-back’ which network paths are active in a given condition. Their results clearly show that, in response to a diverse range of stimuli, the interactions between transcription factors and their targets are altered – the network is effectively ‘rewired’. Remarkably, it is shown that within this changing network hubs are almost entirely transient in nature (78% of hubs are influential in one condition but less so in others). This emphasises the need to abandon the idea propagated by the drawing of metabolic pathway charts that metabolic architecture is fixed. We have to consider the network as a dynamic entity and understand that its structure and therefore its regulation will be different in different conditions.
In the previous section, we have described how network analysis has contributed to our understanding of network architecture and how this architecture can be used to explain and investigate network behaviour and regulation. Clearly, the first step in such an endeavour is accurately describing the network under consideration in its entirety. This is why much of the network analysis of biological systems has concentrated on organisms such as E. coli and yeast that are extensively characterised at the genomic and molecular level and for which sufficient information is available to describe fully their molecular networks. However, the genomic and molecular information available for more complex eukaryotes is beginning to rival that of E. coli and yeast. This is particularly true for the model plant species, Arabidopsis thaliana. Although network tools have yet to be applied to higher plants, a wealth of global data sets has been accumulated. In this section, we will discuss what needs to be put into place to undertake a systems analysis of the plant metabolic network. We will also review what can be learnt from the global profiling studies that have already been completed.
Before interrogation of network functionality can be attempted, it is clearly essential that the structure of the network is fully elucidated. This is by no means a trivial task. Even the most classical of biological networks, metabolic pathways and signal transduction cascades, are only partially defined. Evaluation of the cellular metabolism of plants (Buchanan et al., 2001) reveals how daunting a task network structure elucidation is likely to be. A huge amount of research effort has been placed into metabolic pathway definition, and recent genome sequencing efforts have allowed the definition of the majority of proteins associated with metabolism. However, it is important to note that, contrary to conventional wisdom, our current knowledge of the structure of plant cellular metabolism is far from complete. Whereas some pathways of primary metabolism have been carefully and systematically elucidated, the majority of secondary metabolic pathways have not. Moreover, the participation of enzymes in previously undefined metabolic contexts (even for enzymes that are components of well-defined pathways) remains a genuine possibility (Schwender et al., 2004).
Part of the problem lies in the amount of ‘assumed knowledge’ of gene function in plants. A large proportion of Arabidopsis genes are still classified merely on the basis of their homology to genes from other species. Moreover, the roles of many proteins and enzymes are assumed on the basis of their functionality in better studied, less complex systems. Emergent databases of metabolic pathways (for example, the Kyoto Encyclopedia of Genes and Genomes Metabolic Descriptions, http://www.genome.ad.jp/kegg, and MetaCyc, http://metacyc.org/), whilst invaluable resources in their own right, have only exacerbated the perception that pathway structure is established in cases where no empirical evidence for cross-kingdom pathway homologies exists. This over-reliance on structural information from microbial systems is problematic because it is clear that a network model that is a poor resemblance of the in vivo situation has limited use. The situation should be improved by the development of curated plant-specific databases such as AraCyc (Mueller et al., 2003) and by pathway visualisation tools such as MAPMAN (Thimm et al., 2004), which allow the experimenter to easily construct their own pathways de novo. However, even with a plant-specific database in place, caution must still be exercised: it remains likely that the Arabidopsis metabolic network structure will be dramatically different from that of quaking aspen, for example. Not only do plant species have widely different genome sizes and numbers of genes, but they also display staggering metabolic diversity. It has been estimated that over 200 000 metabolites exist in the plant kingdom, with any single plant species displaying only a fraction of this diversity in its metabolome (De Luca & St Pierre, 2000).
With the above caveats in mind, it is clear that there is a great need for empirical testing of metabolic network structures. Although network analysis is very much in vogue at the moment, it is important to note that the analysis of pathway structure has been a fundamental aspect of biochemistry for many decades. Important historical examples include, but are by no means limited to, the discovery of the tricarboxylic acid (TCA) and Calvin cycles (Krebs & Johnson, 1937; Calvin, 1962). The elucidation of these cardinal pathways facilitated much further research effort concerned with metabolic regulation, (micro)compartmentation and pathway interaction (detailed in Section III.3), which has allowed high-level understanding of these pathways. However, even for well-studied pathways such as the TCA cycle, our understanding remains somewhat limited. Despite the fact that the reactions described by Krebs were identified to occur in plants in the 1960s (Beevers, 1961), relatively few studies have addressed the function or regulation of this pathway in plants (for a detailed discussion, see Fernie et al., 2004a).
Dispelling the idea that plant metabolism is a ‘done deal’ is the fact that we continue to elucidate new pathways and refine our understanding of existing ones. In recent years, important discoveries include the plant pathways for ascorbate metabolism (Wheeler et al., 1998; Green & Fry, 2005) and isoprenoid biosynthesis (Masse et al., 2004; Wolferetz et al., 2004) and the demonstration of the role of ribulose 1,5-bisphosphate carboxylase/oxygenase (Rubisco) in a previously undefined metabolic context (Schwender et al., 2004). These examples, alongside recent elegant studies on carbon–nitrogen interactions (reviewed in Galili, 2002; Stitt et al., 2002; Stitt & Fernie, 2003), are excellent illustrations of foundation studies on which plant metabolic network analysis can be developed. Isotope labelling studies of the mevalonate pathways of diatoms and higher plants allowed clarification not only of the network structures of these species but also functional analysis of the relative importance of different routes to the same end, whereas the elucidation of the pathways of synthesis (Wheeler et al., 1998; Agius et al., 2003) and degradation (Green & Fry, 2005) of ascorbate placed previously identified enzymes in a novel metabolic context. Similarly, a recent study demonstrated that Rubisco can function independently of the Calvin cycle to improve the carbon efficiency of developing green seeds (see Fig. 2). In doing so, this study solved a puzzle that has been perplexing plant biochemists for many years. The storage of carbon as oil was thought to be intrinsically inefficient because one carbon is lost in the form of carbon dioxide for every two carbons incorporated into fatty acids owing to the action of pyruvate dehydrogenase. The metabolic fate of uniformly labelled carbon sources following feeding to Brassica napus seeds was compared to theoretically calculated values following elementary flux mode analysis (Schuster et al., 1999) of textbook pathways. Surprisingly, a 3 : 1 ratio of carbon in oil to carbon liberated as carbon dioxide was experimentally determined. This ratio is higher than the 2 : 1 ratio expected, given the action of pyruvate dehydrogenase. Further flux experimentation utilising [1-13C]- or [U-13C]alanine revealed that Rubisco was responsible for the fixation of carbon dioxide and the absence of label randomisation suggested that this enzyme was acting in isolation from the Calvin cycle. In establishing this route, the authors were able to demonstrate that the formation of acetyl CoA was in fact more efficient than previously thought and that this pathway is responsible for 62% of 3-phosphoglycerate (3PGA) production in developing B. napus embryos. This example highlights the power of combining mathematical approaches with traditional isotope tracer studies to aid the elucidation of metabolic networks. The fact that a previously unrecognised role has been established for such a fundamental enzyme also amply highlights the paucity of our current knowledge of plant metabolism.
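The two ratios quoted above translate directly into carbon efficiencies. The short calculation below is a sketch of that arithmetic only, under the simplifying assumption that every carbon not stored in oil is released as carbon dioxide; the function name is hypothetical.

```python
from fractions import Fraction

def carbon_fraction_in_oil(oil_carbons, co2_carbons):
    """Share of the carbon that ends up in oil, given an oil : CO2 carbon ratio."""
    return Fraction(oil_carbons, oil_carbons + co2_carbons)

expected = carbon_fraction_in_oil(2, 1)  # 2 : 1, pyruvate dehydrogenase alone
observed = carbon_fraction_in_oil(3, 1)  # 3 : 1, as measured by labelling
gain = observed - expected
```

On these figures the expected efficiency is two-thirds and the observed efficiency is three-quarters, so the additional route recovers one-twelfth of the carbon that would otherwise have been lost.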
Mechanisms of metabolic regulation have been established for many years and require little comment here, save to say that in a network context all of them are important. Given that recent theoretical and experimental studies indicate that the majority of control occurs at the post-transcriptional level (Ter Kuile & Westerhoff, 2001; Urbanczyk-Wochniak et al., 2003a), understanding of metabolic network regulation will ultimately involve systematic characterisation not only of transcript and protein abundance but also of post-translational modification of enzymes, protein–protein interaction and kinetic characterisation of every enzyme of the cell. Furthermore, a large amount of experimental data suggests that the canonical ‘pyramid of life’ (Oltvai & Barabasi, 2002), whereby information is passed from gene to RNA to protein to function, is somewhat misleading because this hierarchy is clearly not unidirectional. Examples of feedback in this hierarchy include, but are not limited to, the stabilisation of RNA by metabolites (Fafournoux et al., 2000) and the metabolite-mediated regulation of gene expression (Sheen, 1990; Templeton & Moorhead, 2004). Thus it is probably imperative to analyse as many interacting elements as possible in order to establish network regulation. To date, very few large-scale studies of network regulation have been carried out in plants. Therefore, in this section we intend to discuss in general the approaches that can yield important information on aspects of metabolic regulation before detailing a few case studies in which metabolic regulation has been elucidated at the pathway and/or subnetwork level.
Network thinking in plants has focussed mainly on the elucidation of signal transduction cascades. It is perhaps apt that many of the studies that have highlighted the complexity of metabolite regulation also originated in this research field. Indeed, work on cellular sensors of ATP and ADP and of sugars (Sheen, 1990; Hardie, 2003) has revealed that metabolites not only act as intermediates in pathways but can also be important integrators of metabolic status with other fundamental cellular events including transcription, translation and covalent modification of proteins (Templeton & Moorhead, 2004). The identification of the mechanism of the Trp RNA binding attenuation protein (TRAP) in Bacillus subtilis, responsible for both transcriptional attenuation and translational control of Trp synthetic pathway genes (Babitzke & Gollnick, 2001), together with the subsequent finding that such riboswitches are prevalent across biology (Sudarasan et al., 2003), suggests that other aspects in addition to RNA and transcription factor mediated control of gene expression require intensive investigation in the future.
The rapid emergence of proteomics as one of the mainstays of postgenomic research also opens up another level of the regulatory hierarchy. The proteomics field is currently undergoing a shift away from techniques that merely generate protein catalogues towards those that are more quantitative in nature (Ong et al., 2002; Tyers & Mann, 2003) and those that allow the definition of post-translational modifications (Mann & Jensen, 2003) and protein–protein interactions (Rohila et al., 2004). Analysis of these aspects of the plant proteome is still in its infancy. Moreover, the greater experimental effort required to introduce tagged genes into plants in a systematic fashion, in comparison with yeast, has limited progress. Thus, there are currently no plant protein–protein interaction models on the scale of those achieved for yeast (reviewed in Cornell et al., 2004).
Although we lack the systematic maps of protein–protein interactions that are available for yeast, many intriguing protein–protein interactions have been functionally characterised in plants and the transient association of enzymes into functional complexes is being increasingly recognised as an important part of the regulatory hierarchy (for comprehensive reviews, see Winkel, 2004 or Jorgensen et al., 2005). The realisation that adjacent enzymes in metabolic pathways can associate with one another to form a functional complex has important implications for our understanding of the regulation and organisation of the plant metabolic network.
The groundbreaking work of Paul Srere and coworkers demonstrated the presence and functionality of several such multienzyme complexes within the TCA cycle of several microbial and also of mammalian species (Srere, 1985; Robinson et al., 1987). Through these studies, he was able to demonstrate that enzymes in a pathway functioned far more efficiently when associated together, owing to the channelling of the connecting metabolic intermediates between the enzymes in question. Srere called such channels metabolons, and argued that they were a common motif in cellular metabolism. Indeed, earlier studies suggested that in Neurospora crassa all measurable enzyme activities were associated with organelles, membranes or cytoskeleton, with little or no protein present in the aqueous soluble fraction (Zalokar, 1960). The demonstration that free indole was not an intermediate in the biosynthesis of tryptophan was probably the first strong proof for metabolic channelling (Yanofsky & Rachmeier, 1958) and has since been confirmed by X-ray crystallography (Hyde et al., 1988).
Since their identification in microbial systems, many metabolons have been defined in plants, including the Calvin cycle (Suss et al., 1993), dhurrin biosynthesis (Moller & Conn, 1980), flavonoid pathways (Winkel-Shirley, 1999), phenylpropanoid metabolism (Achnine et al., 2004) and polyamine biosynthesis (Panicot et al., 2002). Given the diversity of the pathways in this list, it seems likely that the roles of these channels differ considerably. Nevertheless, it can be argued that metabolons have a universal importance both in terms of metabolic regulation and in terms of metabolism per se. Within secondary metabolism, the metabolon structure has been shown to compensate for the surprising lack of substrate specificity of some of the enzymes. In addition, metabolons may also function in the sequestration of toxic intermediates (Mendes et al., 1992).
The above plant metabolons have recently been described in detail (Winkel, 2004; Jorgensen et al., 2005). We will therefore restrict our discussion to one or two specific examples that illustrate the main points. One of the earliest demonstrations of a metabolon in plants came from following the fate of radiolabelled compounds fed to sorghum microsomes. Using this approach, it was possible to demonstrate channelling of the highly toxic and labile intermediates N-hydroxytyrosine and p-hydroxyphenylacetonitrile (Moller & Conn, 1980). Substrate channels have recently been demonstrated in other pathways of secondary metabolism, notably the phenylpropanoid pathway. A wide spectrum of phenylpropanoids – including lignin, flavonols, anthocyanins and isoflavonoids – is produced from phenylalanine, with phenylalanine ammonia-lyase (PAL) catalysing the first committed step in these pathways. In most plant species, PAL is encoded by a small multigene family, the members of which may differ in specificity and so channel flux towards the different classes of phenylpropanoids (Jorgensen et al., 2005). There is growing evidence to support channelling in these pathways. For example, discrete colocalisation of specific PAL isoforms (PAL1 and PAL2) with cinnamate 4-hydroxylase (C4H) has been observed (Achnine et al., 2004). On the basis of accumulated data, current models of phenylpropanoid biosynthesis suggest that the wide product range of these pathways is regulated by differential organisation of metabolons which are composed of different isoforms of the key biosynthetic enzymes and are associated with different downstream enzymes (Jorgensen et al., 2005).
A key feature of metabolon formation within the phenylpropanoid pathway is the association of enzymes with membrane structures. It is therefore interesting that other metabolic pathways have also been shown to be associated with membranes. For example, a combination of proteomic, traditional enzymatic, cell biological and stable-isotope feeding experiments was used to provide corroborating evidence that the entire glycolytic pathway is associated with plant mitochondria by attachment to the cytosolic face of the outer mitochondrial membrane (Fig. 3). However, in contrast to the suggested functions of metabolons in secondary metabolism, it is unlikely that the function of this enzyme association with the mitochondrial membrane is product specificity. Perhaps the function of this microcompartmentation of glycolysis is to ensure sufficient pyruvate is provided directly to the mitochondria in the face of competition for glycolytic intermediates from other pathways such as the oxidative pentose phosphate pathway and amino acid biosynthesis (Giegé et al., 2003). As yet, there is no direct evidence that substrate channelling occurs between the mitochondrially associated glycolytic enzymes. In general, however, this organisation of glycolysis, as well as the organisation of the TCA cycle into a metabolon, suggests that physical interaction between enzymes is prevalent in the respiratory pathway and can therefore be said to be an important feature of both primary and secondary metabolism. Further evidence for the importance of protein–protein interactions in the respiratory pathway comes from the discovery that the respiratory complexes of the electron transport chain themselves interact to form a supercomplex (Eubel et al., 2004).
It is likely that the use of postgenomic tools will allow the identification of far more functional complexes in the near future and, given the influence such complexes exert on cellular metabolism, it is vital that such information is incorporated into network models. While we can expect advances in our cataloguing of protein–protein and protein–DNA interactions, it is worth noting that technologies capable of identifying protein–metabolite interactions on a systematic scale are not yet available.
As Section III.3 implies, it is perhaps too early to attempt large-scale network analyses in plants. That said, there are several pathways for which the relative importance of various levels of metabolic regulation has been defined. These pathways include the central metabolic pathways of glycolysis, the sucrose-to-starch transition, carbon–nitrogen interactions and amino acid metabolism. Given that the regulation of the sucrose-to-starch transition and of carbon–nitrogen interactions has been comprehensively reviewed elsewhere (Stitt & Fernie, 2003; Geigenberger et al., 2004), we will concentrate here on detailing the regulatory hierarchy of glycolysis and amino acid metabolism.
The study of the metabolic regulation of glycolysis in plants has made dramatic progress since the initial isolation of plant aldolase in 1948. Plant glycolysis shares many features in common with that of animals and yeasts; however, it also contains many peculiarities. One feature of plant glycolysis is the presence of a complete or near-complete duplication of the pathway in the plastid. The two pathways are independently controlled both at transcriptional (Urbanczyk-Wochniak et al., 2003b) and kinetic (Givan, 1999) levels, suggesting that they are differentially regulated. Cytosolic glycolysis in plants also differs from that in animals by the presence of additional enzymes including pyrophosphate-dependent phosphofructokinase, nonphosphorylating glyceraldehyde 3-phosphate dehydrogenase and phosphoenolpyruvate phosphatase, all of which may be of particular importance under conditions of metabolic stress (Givan, 1999). Despite a large number of reverse-genetic investigations of glycolysis, it has not yet been possible to pinpoint where the majority of the metabolic control lies in this pathway. Moreover, little is currently known concerning the transcriptional control of glycolysis, although several important observations have recently been made (Fernie et al., 2004a). Firstly, studies on transgenic plants exhibiting elevated sucrose cycling revealed an up-regulation of the entire cytosolic glycolytic pathway that is most probably mediated at the transcriptional level; this coordinated change is very similar to that observed in several recent transcriptomic studies (Wang et al., 2003; Wasaki et al., 2003). The fact that enhanced sucrose cycling places a large additional ATP demand on the cell makes it tempting to suggest that glycolysis in plants is demand driven, as it is in E. coli (Koebmann et al., 2002), despite the fact that plant glycolysis is not allosterically regulated by ATP (Plaxton, 1996).
The kinetic properties of most of the enzymes of glycolysis have long been established and are readily accessible through the BRENDA database (http://www.brenda.uni-koeln.de). Nevertheless, the recent discovery of post-translational modifications of the cytosolic pyruvate kinase (Tang et al., 2003) and the identification of glycolytic complexes suggest that our understanding of the regulation of even this cardinal pathway is far from complete.
The pathways of amino acid biosynthesis are considerably more complex than the unbranched pathway of glycolysis. However, the research of Gad Galili and coworkers has led to a considerable degree of understanding of the interregulation of lysine, glutamate and aspartate metabolism (Galili et al., 2001; Galili, 2002; Zhu et al., 2002). Lysine metabolism in plants is regulated both by the rate of its synthesis and its catabolism, with the latter operating via the alpha-amino adipic acid pathway, which is largely regulated by the first two enzymes of the pathway, namely lysine-ketoglutarate reductase (LKR) and saccharopine dehydrogenase (SDH) (Galili et al., 2001). These enzymes are encoded as a bifunctional protein. In Arabidopsis, the LKR/SDH gene encodes an additional monofunctional SDH enzyme and the two forms of SDH are only partially coordinately regulated in response to hormonal and metabolic stimuli, in keeping with the hypothesis that the monofunctional enzyme functions mainly to enhance the flux of lysine catabolism (Stepansky et al., 2005). In a series of elegant studies, it was shown that increasing lysine levels leads to increased levels of methionine, glutamine and asparagine, and a corresponding elevation in the conversions of glutamine to glutamate and asparagine to aspartate (see Fig. 4). Furthermore, via action on the key enzyme of lysine biosynthesis, dihydrodipicolinate synthase (DHPS), high lysine levels feedback-inhibit its synthesis. Moreover, the accumulation of glutamate leads to elevations in asparagine and glutamine content and also enhances the conversions of glutamine to glutamate and asparagine to aspartate.
This regulatory loop is further complicated by the fact that the majority of the enzymes involved in the interconversion of these amino acids are under strict transcriptional regulation, and it is with some justification that Galili has termed lysine catabolism a stress- and developmentally super-regulated metabolic pathway (Galili et al., 2000). Different fluxes of lysine catabolism can apparently be achieved under different developmental and physiological programs via complex transcriptional and post-transcriptional regulation of the composite LKR/SDH locus, which encodes three different proteins, with different variants of SDH exhibiting different pH optima. The linker region between LKR and SDH, which plays a significant role in the regulation of the bifunctional LKR/SDH enzymes of plants (Zhu et al., 2002), exists neither in the bifunctional LKR/SDH genes of animals nor in the separate fungal LKR and SDH genes, suggesting that it evolved specifically in plants to regulate plant-specific processes.
The two case studies on glycolysis and amino acid biosynthesis discussed here demonstrate both the complexity inherent within plant metabolism and our fragmented current understanding of it. However, they also provide strong examples of how pathway regulation can be assessed in the context of the wider metabolic networks to which these pathways belong.
From the above examples, it is apparent that network thinking has long pervaded plant metabolic research, but tools that afford adequate coverage to allow the assessment of pathway function in a system context have been lacking. The development of rapid molecular profiling platforms would be expected to facilitate such an approach. In this section, we will discuss the impact that the relatively new technologies of transcriptomics, proteomics, metabolomics and comprehensive flux analyses have had on our understanding of plant metabolic networks.
Of the four technologies, transcriptomics is by far and away the most mature and offers genuinely comprehensive coverage of the majority of expressed transcripts. The technique is now a standard part of the molecular biologist's experimental arsenal and an ever-increasing data mountain has accumulated, much of which is accessible in public data warehouses. Although only a small number of published transcriptomic studies has been directed at metabolism, the results of such studies do allow some general conclusions to be drawn (Buckhout & Thimm, 2003). Generally, transcriptional regulation of gene expression is seen as the least important of the regulatory mechanisms that impinge upon metabolism. Nevertheless, it is clear that during development and in response to environmental stress there is coordinated regulation of genes that encode enzymes in the same pathways. The same is true during sugar starvation, in which coordinated repression of genes involved in carbohydrate metabolism was observed (Thimm et al., 2004). It is also apparent that during sugar starvation metabolism can be re-organised to allow flexible use of carbon skeletons from alternative sources. Coordinated regulation of respiratory pathways is also seen following genetic intervention in tomato (Baxter et al., 2005). One can take two views of such coordinated gene expression. First, that it is an active process to drive pathway fluxes at different rates or in different directions. Second, that it is a homeostatic process designed to bring metabolites back to optimal levels following a metabolic disturbance. Given the rapidity of metabolic flux changes driven by altered substrate supply and the relatively slow process of changing enzyme abundance through altered gene expression, the latter is perhaps more likely.
Thus, the evidence that the expression of enzymes through a pathway actually changes in a sequential manner (the first enzyme changing first and by the greatest amount) should be seen not as a mechanism to optimise response time (Zaslaver et al., 2004) but rather as the most efficient way to bring metabolite concentrations back into line following metabolic disturbance.
Of course, one has to be extremely cautious in interpreting the results of microarray experiments, as it has been frequently observed that a change in transcript abundance does not necessarily translate into a correlated change in protein amount. A more direct and systematic analysis of protein abundance is required. Mark Stitt's group has used a high-throughput enzyme assay platform as a proxy for the metabolic proteome and was able to characterise changes in the activity (abundance) of proteins of central carbohydrate metabolism during the diurnal cycle and during sugar starvation (Gibon et al., 2004). Interestingly, when compared with the equivalent transcriptomic changes, protein changes were highly damped and often out of phase. More sophisticated proteomic platforms (based on mass spectrometry in combination with electrophoretic or chromatographic fractionation) have the potential to interrogate a much greater proportion of the proteome. However, the experimental and technical demands of quantitation within this approach (Aebersold & Mann, 2003) have meant that relatively few quantitative proteomic studies have been attempted to date. The most rigorous study of this nature in plants used greening maize as a system to investigate plastid biogenesis (Lonosky et al., 2004). Again, evidence of coordinated regulation of protein abundance within metabolic pathways was observed. However, a more complex picture emerges as one goes through the developmental sequence. For example, coordinated regulation of the enzymes of photosynthetic carbon assimilation is observed during early development, but the pattern of the same proteins diverges later in development.
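The damped and out-of-phase protein response noted above for the diurnal cycle is what simple turnover kinetics would lead one to expect. As a rough illustration only (it is not the analysis of Gibon et al., 2004), the Python sketch below assumes a hypothetical first-order model in which protein is synthesised in proportion to a sinusoidally oscillating transcript and degraded with rate constant `k_deg`. The function names, the 24 h period, the transcript amplitude and the value of `k_deg` are all invented for the example.

```python
import math

def simulate_protein(k_deg, period_h=24.0, days=20, dt=0.01):
    """Euler integration of dP/dt = k_deg * (m(t) - P).

    m(t) is a hypothetical transcript level oscillating around 1.0 with
    amplitude 0.5; synthesis is scaled so that P also averages 1.0.
    Returns (times, transcript, protein) for the final period only,
    after the initial transient has decayed.
    """
    omega = 2.0 * math.pi / period_h
    steps_total = int(round(days * period_h / dt))
    steps_last = int(round(period_h / dt))
    first_recorded = steps_total - steps_last
    protein_level = 1.0
    times, transcript, protein = [], [], []
    for i in range(steps_total):
        m = 1.0 + 0.5 * math.sin(omega * i * dt)
        if i >= first_recorded:
            times.append((i - first_recorded) * dt)  # hours into last cycle
            transcript.append(m)
            protein.append(protein_level)
        protein_level += dt * k_deg * (m - protein_level)
    return times, transcript, protein

def amplitude(series):
    """Half the peak-to-trough range of a series."""
    return (max(series) - min(series)) / 2.0

def peak_time(times, series):
    """Time at which a series reaches its maximum."""
    return times[series.index(max(series))]

if __name__ == "__main__":
    # Assumed protein half-life of roughly 14 h (k_deg = 0.05 per hour).
    t, m, p = simulate_protein(k_deg=0.05)
    print("transcript amplitude:", round(amplitude(m), 3))
    print("protein amplitude:   ", round(amplitude(p), 3))
    print("protein peak lag (h):", round(peak_time(t, p) - peak_time(t, m), 2))
```

In this toy model, the slower the protein turns over relative to the length of the cycle, the smaller the oscillation in protein abundance and the later its peak. Real enzymes are of course subject to additional layers of control that the sketch ignores.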
One of the major problems in accurately defining the plant metabolic networks is the fact that the plant cell is extensively compartmented. Proteomics is making a major impact upon our characterisation of the organellar localisation of pathways, providing essential evidence that justifies placing a particular enzyme isoform in a given subcellular compartment (Millar et al., 2004; van Wijk, 2004). Proteomic studies have also highlighted the extent to which proteins can be targeted simultaneously to more than one compartment (Chew et al., 2003). Perhaps the area in which proteomics has the greatest potential to unlock aspects of metabolic control is in the systematic analysis of post-translational modifications of proteins (Mann & Jensen, 2003). From a technological point of view, the phosphoproteome has proved the most accessible and already some systematic characterisations of phosphorylation of plant proteins have been undertaken (Bykova et al., 2003; Nuhse et al., 2004). With the appropriate technology, it is even possible to capture the dynamics of protein phosphorylation (Blagoev et al., 2004). The other main post-translational modification that has been characterised using a proteomic approach is redox modification of protein thiols by thioredoxin (Motohashi et al., 2001; Yano et al., 2001; Balmer et al., 2003; Balmer et al., 2004; Marchand et al., 2004; Wong et al., 2004; Rey et al., 2005). Although such proteomic analyses will undoubtedly improve our understanding of post-translational regulation of enzymes across the metabolic network, it is important to realise that the proteomic studies provide only a list of proteins that are post-translationally modified. It remains to be established whether those modifications actually have regulatory significance for protein function or not.
Metabolomic approaches are currently trailing somewhat behind transcriptomic and proteomic approaches with respect to network studies, largely due to their compromised coverage (Weckwerth & Fiehn, 2002; Fernie et al., 2004b). That said, several important observations have been made on the basis of correlative behaviour between metabolites. In an important case study, Arkin and coworkers demonstrated that metabolic pathways can be determined kinetically by the monitoring of correlative behaviour between only a handful of metabolites (Arkin et al., 1997). However, despite the convincing case made by the authors, and proof of functionality in the identification of signal transduction cascades (Sontag et al., 2004), it is perhaps telling that such approaches have as yet not facilitated the identification of multiple metabolic pathways. That is not to say that this approach is of limited usefulness; the use of metabolite–metabolite correlations, in a manner analogous to co-response analysis (Hofmeyer et al., 1996), has revealed several important aspects of pathway and network behaviour (Steuer et al., 2003; Camacho et al., 2005). The analysis of the intracellular concentrations of metabolites in yeast revealed phenotypes for mutations of proteins active in metabolic regulation (Raamsdoonk et al., 2001). Quantification of the change of several metabolite concentrations relative to the concentration change of one selected metabolite was demonstrated to reveal the site of action, in the metabolic network, of silent genes. In addition, early metabolite profiling studies identified hyperbolically related metabolite–metabolite pairs that were consistent with known feedforward and feedback mechanisms of regulation (Roessner et al., 2001). However, it is important to note that, to date, no novel regulatory mechanisms have been kinetically confirmed following such analyses. 
The analysis of the global data sets accrued by metabolite profiling has additionally been used to define the influence of specific proteins on metabolism during development (see, for example, Roessner-Tunali et al., 2003) and to discriminate the apparently silent phenotype of potato plants deficient in an isoform of sucrose synthase (Weckwerth, 2004).
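To make the idea of metabolite–metabolite correlation concrete, the following Python sketch computes pairwise Pearson correlation coefficients across a set of samples and reports the strongly correlated pairs. It is a minimal illustration under stated assumptions, not the procedure used in any of the studies cited above: the metabolite names, the abundance values and the threshold of 0.8 are all invented, and the function `correlated_pairs` is hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(profiles, threshold=0.8):
    """Return (metabolite_a, metabolite_b, r) for every pair with |r| >= threshold.

    profiles maps a metabolite name to its levels across the same
    ordered set of samples.
    """
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs

# Invented relative abundances in six hypothetical samples.
profiles = {
    "glucose":  [1.0, 1.2, 0.9, 1.5, 1.1, 1.4],
    "fructose": [2.1, 2.3, 1.9, 3.0, 2.2, 2.9],
    "sucrose":  [5.0, 4.6, 5.3, 3.9, 4.9, 4.1],
    "citrate":  [0.7, 0.5, 0.6, 0.6, 0.5, 0.7],
}

if __name__ == "__main__":
    for a, b, r in correlated_pairs(profiles):
        print(f"{a} ~ {b}: r = {r:+.2f}")
```

A linear coefficient of this kind would understate strongly curved relationships such as the hyperbolic metabolite pairs mentioned above, for which a rank-based measure may be more appropriate. In either case a high correlation identifies only a candidate interaction, not a confirmed regulatory mechanism.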
In contrast to steady-state metabolite analysis, the contribution of dynamic flux analyses to network thinking has been visible for many years. In fact, arguably, flux is the ultimate expression of the metabolic network (Koffas et al., 1999). Until recently, flux analysis was severely restricted in coverage. However, recent technical advances have allowed a broadening of the information accessible via high-throughput flux analysis (Roessner-Tunali et al., 2004; Sauer, 2004; Sriram et al., 2004). As mentioned earlier (Section I), flux measurements have been used in systems biology in the framework of metabolic control analysis for many years. Furthermore, by analogy to experimentation carried out in the microbial field (Hellerstein & Nesse, 1999), network analyses have also been carried out within this context in plant systems. Whilst several such analyses have recently been reported, two are of particular note. In the first, a comprehensive analysis of fluxes during three different stages in the growth cycle of tomato cells was made (Rontein et al., 2002), whilst the second was concerned with analysing the compartmentation of the major pathways of carbohydrate oxidation in B. napus embryos (Schwender et al., 2003). The tomato cells study utilised nuclear magnetic resonance (NMR) spectroscopy to derive cellular fluxes for central metabolism as well as measuring the accumulation of polymeric components of the cell. This study revealed that the fluxes through the central pathways of carbon oxidation were remarkably constant, whereas effluxes into pathways of polymer synthesis were highly variable (Rontein et al., 2002). In a similar study, Schwender and coworkers tackled the thorny issue of compartmentation by culturing B. napus embryos on variously labelled stable isotopes of glucose and measuring the label composition of amino acids, lipids, sucrose and starch (Schwender et al., 2003). 
The cumulative data from this study were then used to verify the reaction network that was distributed between the cytosol and plastid and, via overdetermination of key parameters, the relative fluxes of the compartmented pathways were reliably computed. The above examples indicate that tracer labelling experiments have great utility not only in confirming network structure but also in the analysis of metabolic regulation in their own right.
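The value of overdetermination can be shown with a deliberately small example. The Python sketch below is not the isotopomer-based model of Schwender et al. (2003); it assumes a single hypothetical branch point at steady state, at which an input flux must equal the sum of two output fluxes, and in which all three fluxes have been measured independently with equal uncertainty. Because there are more measurements than unknowns, the mass balance can be used both to check the consistency of the data and to compute least-squares adjusted fluxes. The function name `reconcile` and all numerical values are invented.

```python
def balance_residual(fluxes, coeffs):
    """Value of sum(c_i * v_i); zero when the mass balance is satisfied."""
    return sum(c * v for c, v in zip(coeffs, fluxes))

def reconcile(measured, coeffs):
    """Least-squares adjustment of measured fluxes to one linear balance.

    Finds the flux vector closest (in the equally weighted Euclidean
    sense) to `measured` that satisfies sum(c_i * v_i) = 0.
    """
    residual = balance_residual(measured, coeffs)
    norm = sum(c * c for c in coeffs)
    return [m - c * residual / norm for c, m in zip(coeffs, measured)]

# Hypothetical branch point: v_in - v_a - v_b = 0
coeffs = [1.0, -1.0, -1.0]
# Invented measurements (arbitrary units) that do not quite balance.
measured = [10.4, 6.1, 3.7]

if __name__ == "__main__":
    print("imbalance before:", balance_residual(measured, coeffs))
    adjusted = reconcile(measured, coeffs)
    print("adjusted fluxes: ", [round(v, 3) for v in adjusted])
    print("imbalance after: ", balance_residual(adjusted, coeffs))
```

The size of the initial imbalance indicates how well the measurements agree with the assumed network structure. A large value would suggest either measurement error or a missing reaction, which is the sense in which redundant labelling data can be used to verify a reaction network.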
The preceding sections have concentrated on the use of genomic tools in isolation. However, it is becoming increasingly clear that integrated analysis will be necessary in order to maximise our understanding of metabolic networks (Sweetlove et al., 2003; Ratcliffe & Shachar-Hill, 2004; Oksmann Kaldentey & Saito, 2005). Such analyses have been carried out at high frequency in microbial systems (Even et al., 2003; Hellerstein, 2003; Sauer, 2004; Stephanopoulos et al., 2004) and are beginning to be attempted in plants (Suzuki et al., 2002; Urbanczyk-Wochniak et al., 2003a; Hirai et al., 2004; Fridman & Pichersky, 2005). To date, the major utility of these approaches has been the analysis of gene function, important examples of this being the identification of genes of the flavonoid biosynthesis pathway (Tohge et al., 2005) and of triterpene biosynthesis (Suzuki et al., 2002). Whilst such targeted analyses are of great importance, the global nature of the profiling technologies clearly also allows the unbiased analysis of correlations between genes, proteins and metabolites. Exactly such an approach has been used in the identification of candidate genes for metabolic engineering both in bacterial (Askenazi et al., 2003) and plant systems (Goossens et al., 2003; Urbanczyk-Wochniak et al., 2003a). Since integrative analysis is in its infancy, it is likely that further advances will be made in the near future.
The parallel advances in our ability to access information on the steady-state levels of the various molecules of the cell and the development of network biology combine to make it an exciting time to study the circuits that underpin metabolism. Although it is clear that additional experimental data sets are required if we are to undertake a comprehensive systems analysis of the plant metabolic network, particularly given how extensive compartmentation (and for that matter microcompartmentation) is in plant cells, the (re-)emergence of network thinking in metabolism clearly has great potential. Because metabolism is coordinated at many levels, the analysis of all regulatory levels is a prerequisite for comprehensive network analysis. To date in plants, this has only been attempted for a handful of metabolic pathways or subnetworks. In the short term, it seems likely that the extension of such modular approaches will allow the identification of common regulatory motifs and as such enhance our understanding of metabolic regulation. Given that our ability to rationally manipulate plant metabolism remains relatively rudimentary, the need to understand metabolic network regulation in plants is likely to be of increasing importance for metabolic engineering. The fact remains, however, that network analysis is inherently difficult and it is likely that many hurdles will need to be overcome before the study of metabolic networks reveals the true complexity of metabolic regulation within plant cells.
LJS acknowledges financial support from the Biotechnology and Biological Sciences Research Council, UK. ARF acknowledges financial support from the Max-Planck-Gesellschaft, the Bundesministerium für Bildung und Forschung and the Deutsche Forschungsgemeinschaft.