The regulatory and metabolic networks that rule biodegradation of pollutants by environmental bacteria are wired to the rest of the cellular physiology through both transcriptional factors and intermediary signal molecules. In this review, we examine some formalisms for describing catalytic/regulatory circuits of this sort and advocate the adoption of Boolean logic for combining transcriptional and enzymatic occurrences in the same biological system. As an example, we show how known regulatory and metabolic actions that bring about biodegradation of m-xylene by Pseudomonas putida mt-2 can be represented as clusters of binary operations and then reconstructed as a digital network. Despite the many simplifications, Boolean tools still capture the gross behaviour of the system even in the absence of experimentally determined kinetic constants. On this basis, we argue that, even with a limited volume of data, binary formalisms allow us to penetrate the raison d'être of extant regulatory and metabolic architectures.
In their natural settings, changes in environmental conditions often compromise survival of any given microorganism unless it responds and adapts to shifting physicochemical and nutritional scenarios (McAdams et al., 2004). Soil bacteria constitute a remarkable example of environmental adaptation as they are able to colonize a large number of niches and to deal with many variable conditions (Cases et al., 2003). The reasons for this extraordinary flexibility reside not only in the catalogue of regulatory and structural genes encoded in their genomes but also, more decisively, in the way regulatory networks sense external conditions and adjust cell physiology to changing circumstances. Such a sensorial ability is reflected in the repertoire of transcriptional factors (TFs) available in the genomes of archetypical soil bacteria (e.g. pseudomonads) for controlling expression of both metabolic and stress–response functions. In this respect, the genomic complement of generalist microorganisms that thrive in natural environments encodes a much larger share of regulatory genes than that of counterparts that inhabit stable niches (e.g. endosymbionts; Cases et al., 2003; Dos Santos et al., 2004; Konstantinidis and Tiedje, 2004). Needless to say, TFs do not act in isolation but are hierarchically connected (Shen-Orr et al., 2002), allowing the cell to integrate different stimuli and build proper responses for prevailing under new settings. The overall flow of signal propagation through regulatory networks is sketched in Fig. 1. Note that for the rest of the article we refer to such networks as biological devices composed of connected nodes, in each of which given inputs are converted into distinct outputs, which then become the inputs of other downstream nodes (de Las Heras et al., 2010). 
In this respect (and for the sake of illustration of this concept) the material nature of such inputs/outputs is not important, provided that they can be computed at the corresponding nodes (Istrail et al., 2007). As discussed below, this simplification is key for merging transcriptional and metabolic events with the same formalisms.
Although the main function of regulatory networks is signal integration, the specific architectures of their constituents endow the corresponding systems with distinct dynamic properties that make a difference in the final response. The structure of network motifs (for example feedback and feed-forward loops, multi-input modules and many others; Silva-Rocha and de Lorenzo, 2010) not only enables them to process predetermined inputs into equally preset outputs, but also determines important properties such as response time, the shape of the response generated and a pulsing or monotonic outcome, independently of specific parameters (Mangan and Alon, 2003; Mangan et al., 2003). Intricate integration phenomena encompassing extracellular compounds, intracellular metabolic sensors and signal propagation by small molecules often appear in the regulatory networks that control biodegradative pathways for recalcitrant compounds (Shingler, 2003). This makes sense, as bacteria that inhabit polluted sites have to choose between different nutrients-to-be on the basis of many endogenous and exogenous factors (e.g. compound availability, physiological state of the cells, the flux of carbon through the metabolism, final electron acceptors and physicochemical circumstances; Rojo, 2010).
The most characterized cell-wide regulatory network is that of the central metabolism of carbon in the model organism Escherichia coli (Kotte et al., 2010). In this case, the hierarchy of global and specific TFs that control expression of the genes encoding large metabolic blocks typically originates network topologies that are optimal for processing the signals mentioned before and bring about the most advantageous physiological result (Balazsi et al., 2005; Kotte et al., 2010). The bottom line is, in any case, that easy-to-degrade carbon sources are consumed first over compounds more difficult to metabolize (Bruckner and Titgemeyer, 2002). In contrast nutrient choices in environmental bacteria are not, for example, between a palatable glucose and a less edible glycerol, but between carbon and nitrogen compounds with unusual molecular structures that often act themselves as chemical stressors (Velazquez et al., 2005). In these cases, it cannot come as a surprise that the corresponding regulatory architectures become more intricate. The set of TFs that control biodegradative pathways has not only to recognize a xenobiotic or recalcitrant compound as a nutrient-to-be, but also to ensure that the trade-off between metabolic gain and stress endurance is not detrimental to the general cell physiology (Shingler, 2003; Dominguez-Cuevas et al., 2006).
A growing approach to address the functions and properties of regulatory networks involves the formulation of models that translate the molecular interactions known for a given system into a set of equations that afford a simulation of the entire lot of physical and functional interplays (Herrgard et al., 2004; Karlebach and Shamir, 2008). During the last few years, a number of methods have been implemented and validated for building and simulating such models (Karlebach and Shamir, 2008). The initial step for analysing cellular networks involves the elaboration of a relational diagram comprising all components of the system and as much data as possible on the interactions among them, whether physical or functional (Fig. 1). In typical metabolic networks, the nodes are the substrates and products on which the enzymes act, while the edges represent the enzymatic activities themselves. Merely descriptive models with nodes (components) and edges (connections between nodes) can then be enriched with stoichiometric coefficients, kinetic constants and thermodynamic information (e.g. reversibility). The list of enzymatic activities can be roughly derived from genomic annotations based on homology search (Notebaart et al., 2006; DeJongh et al., 2007; Thiele and Palsson, 2010). Such lists can then be refined by manual curation, for instance by looking for missing activities in the corresponding metabolic map (Reed et al., 2006). A large number of additional problems can appear at every step of genome-based metabolic reconstructions (Nogales et al., 2008; Puchalka et al., 2008; see Feist et al., 2009 for a compendium). Experimental kinetic and thermodynamic data have to be determined either by direct biochemical analyses or indirectly by means of parameterization algorithms that convert omics data into apparent kinetic constants (Jaqaman and Danuser, 2006; Breitling et al., 2008). 
The volume and quality of such wet information is highly variable depending on the organism, and in many cases it is not available at all. Besides, this information often pertains to in vitro results, which poorly reflect true intracellular behaviour. How, then, can reasonable metabolic models be built in view of the frequent dearth of data? To overcome this difficulty, genome-scale metabolic reconstruction and simulations can be made through constraint-based approaches (like flux-balance analysis), which do not need such information (Feist et al., 2009; Thiele and Palsson, 2010). Data from different experimental omics approaches, such as metabolomics, fluxomics, transcriptomics and proteomics, provide additional information with which to constrain the models and thus to reduce the solution space and increase accuracy (Herrgard et al., 2006; Feist et al., 2009).
In contrast to their metabolic counterparts, regulatory networks capture information on the influence that particular genes (i.e. those encoding TFs) exert on the expression or activity of others. As a consequence, the nodes in this case are composed of genes (often assimilated to their encoded proteins), while the edges express regulatory connections, i.e. either activation or inhibition (Fig. 1). Regulatory diagrams of this sort facilitate the visualization of coordinated regulatory effects, which appear as sets of TFs acting on the same target (Schlitt and Brazma, 2007). Unlike the metabolic scenarios discussed above, the reconstruction of the regulatory network of a given organism on the mere basis of its genomic sequence is a very challenging task (Goelzer et al., 2008). While metabolism (especially the core of central enzymatic reactions) is relatively conserved in many different organisms, similar regulatory outcomes can originate from unrelated TFs and regulatory modules (Price et al., 2007). Beyond a few model organisms, information on regulatory interactions is very scarce, making it very difficult to translate the knowledge obtained for one bacterium into others, even in the case of closely related species. Bona fide orthologues of typical E. coli TFs such as the catabolite regulatory protein (CRP) or the integration host factor (IHF) have been found to govern entirely different sets of functions in Pseudomonas putida through an evolutionary exaptation process (Milanesio et al., 2011). The targets and functions of orthologous TFs are often not the same in different species (Hale et al., 2007), thus making it necessary to examine regulatory interactions experimentally on a case-by-case basis.
A simple alternative to display and simulate genetic circuits when little or no information on transfer functions between nodes is accessible involves the adoption of Boolean concepts. Binary logic is in this case sufficient for describing the states of the components of the biological system under scrutiny. This type of logic analysis is possible because signal propagation through a biological network (and the resulting output) depends on the way the molecular components are connected (Faure and Thieffry, 2009; Morris et al., 2010). This is the same principle as that of electronic logic devices, the calculation ability of which depends on the way the transistors are joined. In Boolean networks, the status of any specified gene is characterized by only two possible values (true or false: 1 or 0) that reflect whether that node of the circuit is active or not. Regulatory interactions can then be entered in the network as logic gates that execute Boolean functions such as AND, OR, NOR, etc. (Silva-Rocha and de Lorenzo, 2008). This allows expression of any gene to be described as a result of the presence or absence of other genes and small molecules that act as regulators (Buchler et al., 2003). Such a descriptive language enables the layout of dynamic and deterministic models in which known inputs are processed into equally discrete outputs (Hasty et al., 2002). In this respect, the architecture and hierarchy of network components alone endow the system with intrinsic signal computation capacities at the nodes of the circuit and fix a signal propagation itinerary through the entire set-up. These features are ultimately shaped by connectivity and the sign of the interplay between the interacting components of the network. 
For instance, in a feed-forward loop module (FFL; Shen-Orr et al., 2002), in which two TFs activate directly and indirectly the same target gene, the final shape of the response curve is very different depending on whether both regulators are equally efficient in the activation of the third gene or they have to cooperate for generating the output (Mangan and Alon, 2003). These alternative FFL scenarios can be easily translated into different classes of Boolean operators. Specifically, the first instance is equivalent to an OR gate acting on both TFs (i.e. the presence of just one TF is sufficient to bring about the final effect), while the second corresponds to an AND operator (both TFs are necessary to activate the target gene). In this way, logic operators (gates) describe rigorously the sign of the relationship between interacting components of the regulatory system and fix the outcome resulting from these interactions. Values of 0 or 1 can be assigned to both the inputs and outputs of the circuit, which becomes a signal computation device reminiscent of those made with transistors (Silva-Rocha and de Lorenzo, 2008). This is not yet a quantitative description of the system, but it allows us to penetrate its inner logic and to move beyond the typical arrows/hammers depiction of regulatory networks.
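The contrast between the OR and AND versions of the FFL can be illustrated with a minimal Boolean sketch. The function names, the synchronous update scheme and the one-step delay between X and Y are illustrative assumptions, not part of any cited model.

```python
from typing import Callable, List


def and_gate(a: int, b: int) -> int:
    """Both TFs are required to activate the target gene."""
    return a & b


def or_gate(a: int, b: int) -> int:
    """Either TF alone is sufficient to activate the target gene."""
    return a | b


def simulate_ffl(x_signal: List[int], gate: Callable[[int, int], int]) -> List[int]:
    """Synchronous Boolean simulation of a coherent feed-forward loop.

    X activates Y, and X and Y jointly control Z through `gate`.
    Every node reads the state of its inputs at the previous time step
    (assumption: one update step per regulatory event).
    """
    y, z = 0, 0
    z_trace = []
    for x in x_signal:
        z_trace.append(z)          # record Z before updating
        y, z = x, gate(x, y)       # Y follows X; Z computes on X and the old Y
    return z_trace


# A pulse of the upstream signal X: ON for four steps, then OFF.
x_pulse = [1, 1, 1, 1, 0, 0, 0, 0]
z_and = simulate_ffl(x_pulse, and_gate)
z_or = simulate_ffl(x_pulse, or_gate)
print("AND:", z_and)
print("OR: ", z_or)
```

In this toy run the AND version switches Z on one step later than the OR version and switches it off one step earlier, which is the kind of topology-dependent difference in response shape that the text attributes to the choice of gate.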
Integration of regulatory and metabolic logic in the same Boolean circuit
Two extreme abstractions have to be adopted for conversion of any regulatory circuit into a logic network. First, the components can only hold either of two state values (0 and 1). Second, the material nature of the same components is entirely disregarded as long as they do their job in computing set inputs into determined outputs. Although not always explicit, the near exclusive components of networks of this sort consist of TFs and inducing signals (whether exogenous molecules or physicochemical conditions). In real cells, however, regulatory devices operate against the background of an active metabolism. Such biochemical activity is not only controlled by dedicated genetic circuits, but also the enzymes and substrates/products can physically or functionally interact with TFs, creating regulatory interplays between the transcriptome and the metabolome (Kotte et al., 2010). Importantly, the organization of an enzymatic network can also be formalized as a set of logic gates (i.e. biochemical computing) in which both the inputs and the outputs consist of enzymes and metabolites rather than TFs and inducers (Niazov et al., 2006; Pita et al., 2009; Katz and Privman, 2010). Since the activity states of the components of such enzymatic systems are equally abstracted to binary values 1 and 0, it is then perfectly feasible to merge regulatory and biochemical networks in the same logic circuit and examine its structure as a unique biological object (Johnson et al., 2004). An example is described in more detail below regarding the logic architecture of the entire regulatory/enzymatic network for environmental m-xylene biodegradation. Note that dual inputs that can be processed by a node/gate in a merged network of this sort may include (i) two TFs, (ii) one TF and one metabolite and (iii) one enzymatic activity and one substrate, for instance a nutrient or a metabolite. 
By the same token, outputs might consist of proteins (whether TFs or enzymes) and small molecules (reaction products, intermediate metabolites and signalling chemicals, e.g. autoinducers). Abstractions of this type allow us to uncover emergent properties of biological networks that are not noticeable if the regulatory and the enzymatic connections of the same system are addressed separately, let alone if the properties of each of the components are examined out of their context.
From digital networks to workable models
While elaboration of a logic map of a natural regulatory network of the sort just described might be the first step in the global analysis of a whole system, the final objective of any modelling is the formulation of equations that represent all key interactions between components. Boolean approaches can be enriched with methods for incorporating stochastic effects (Shmulevich et al., 2002), but digital genetic/metabolic networks and their cognate simulations ultimately ignore kinetic parameters. As a result, all components of the simulated system update their state in a synchronous fashion, which is evidently far from biological reality. Fortunately, more general logical networks have been proposed that are at the same time asynchronous and multi-level (Faure and Thieffry, 2009). A simplified strategy to this end involves the use of piecewise-linear (PL) approximations (de Jong et al., 2003). The underlying concept in this case is that the switch-like behaviour of gene regulation allows the non-linear functions of ordinary differential equations (ODEs) to be roughly approximated by step functions that reflect downstream gene expression at a given concentration of the corresponding upstream regulator:
This equation indicates that the synthesis of product i is a function of the presence of the effector j. In this equation, s+ is a step function, a Boolean operator that takes the value one if the concentration of j (xj) is at or above a particular threshold (θj), and zero if not. The synthesis of i proceeds at a rate given by ki when xj ≥ θj, and does not take place otherwise. In this case, j is an activator of the synthesis of i. A repressor can be expressed by a negative step function:
which takes the value one when xj < θj, and zero when xj ≥ θj. The synthesis of a particular compound results from the combination of different regulation functions, each involving a different effector, for instance:
where the production of i is regulated by the activator j and the repressor k. In this way, it is possible to model all possible transcriptional and metabolic interactions in the system, determining the production of particular compounds as a function of the presence or absence of some others. It is possible to set different thresholds for the concentration of a particular compound if it regulates different reactions at different concentrations. For instance, if effector j regulates two reactions, we can set the constraint θj1 < θj2 to indicate that reaction 1 is regulated by a lower concentration of j than reaction 2. These constraints are known as threshold inequalities. The levels/activities of each of the molecular species (genes, proteins, RNAs, etc.) that participate in the network are represented by continuous time variables. Given that they represent concentrations, the variables cannot take negative values. Therefore, the concentration of a particular compound is determined by its production rate (expressed by ki) and its degradation rate (gi), which is a strictly positive function. With the combination of positive and negative step functions, it is possible to model all possible regulatory interactions in the system, whatever their complexity. This approach allows the simulation to proceed without any knowledge of kinetic constants between network nodes, and affords description of the Boolean networks described above as sets of PL equations that are reminiscent of actual ODEs. Furthermore, recent improvements in modelling logic circuits using PL approximations (Baldazzi et al., 2010) allow entering differences in the timescale of the relevant molecular events based on judicious biological reasoning. For instance, the time necessary for a metabolite to bind or be released from a cognate TF has to be inevitably shorter than the time it takes to transcribe an entire gene (Alon, 2006; Mayo et al., 2006). 
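The displayed equations that this passage refers to are not reproduced in the present text. Following the definitions given in the prose (step functions s+ and s-, thresholds θ, synthesis rate ki, degradation gi), and assuming the usual PL form with a first-order degradation term, they could be written as sketched here. The exact form of the degradation term is an assumption.

```latex
% Activation of the synthesis of i by effector j (degradation term assumed first-order)
\frac{dx_i}{dt} = k_i \, s^{+}(x_j, \theta_j) - g_i \, x_i ,
\qquad
s^{+}(x_j, \theta_j) =
\begin{cases}
1 & \text{if } x_j \ge \theta_j \\
0 & \text{if } x_j < \theta_j
\end{cases}

% Negative step function describing a repressor
s^{-}(x_j, \theta_j) = 1 - s^{+}(x_j, \theta_j)

% Synthesis of i controlled by the activator j and the repressor k
\frac{dx_i}{dt} = k_i \, s^{+}(x_j, \theta_j) \, s^{-}(x_k, \theta_k) - g_i \, x_i
```

Under these definitions a threshold inequality such as θj1 < θj2 simply means that two step functions on the same variable xj switch at different concentrations.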
The result of formalisms of this sort is that the itinerary of inputs and outputs through the network can be displayed as a coarse continuous flow rather than a series of discrete jumps between binary states. Indeed, step functions can approximate sigmoids in various instances (de Jong et al., 2003; Baldazzi et al., 2010), thus providing a most useful simplification without loss of accuracy in the eventual simulation of the corresponding network. PL models thus keep the expressive power of logic networks, but at the same time they are well grounded in the classical modelling framework based on differential equations.
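How a PL equation turns a binary decision into a continuous trajectory can be sketched numerically. The example below integrates a single variable whose synthesis is switched by one activator and one repressor through step functions. The first-order degradation term, the forward-Euler scheme, and all parameter values and function names are assumptions chosen for illustration.

```python
def step_pos(x: float, theta: float) -> float:
    """Positive step function: 1 when x is at or above the threshold."""
    return 1.0 if x >= theta else 0.0


def step_neg(x: float, theta: float) -> float:
    """Negative step function: 1 when x is below the threshold."""
    return 1.0 - step_pos(x, theta)


def simulate_pl(x_j: float, x_k: float, x0: float = 0.0,
                k_i: float = 1.0, g_i: float = 0.5,
                theta_j: float = 0.5, theta_k: float = 0.5,
                dt: float = 0.01, t_end: float = 20.0) -> float:
    """Forward-Euler integration of
        dx_i/dt = k_i * s+(x_j, theta_j) * s-(x_k, theta_k) - g_i * x_i
    for constant effector levels x_j (activator) and x_k (repressor).
    Returns the level of x_i at time t_end.
    """
    x = x0
    for _ in range(int(round(t_end / dt))):
        synthesis = k_i * step_pos(x_j, theta_j) * step_neg(x_k, theta_k)
        x += dt * (synthesis - g_i * x)
    return x


# Activator present, repressor absent: x_i approaches k_i / g_i.
induced = simulate_pl(x_j=1.0, x_k=0.0)
# Activator present but repressor also present: no synthesis takes place.
repressed = simulate_pl(x_j=1.0, x_k=1.0)
# Inducer withdrawn from a pre-loaded state: x_i decays towards zero.
decayed = simulate_pl(x_j=0.0, x_k=0.0, x0=1.0)
print(induced, repressed, decayed)
```

No kinetic constants between nodes are needed beyond a synthesis rate, a degradation rate and a threshold, which is what makes this formalism usable when quantitative data are scarce.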
Boolean description of m-xylene biodegradation by P. putida mt-2: the TOL logicome
Bacteria that colonize sites polluted by recalcitrant and xenobiotic chemicals offer a repertoire of regulatory and catabolic devices (Tropel and Van Der Meer, 2004; Phale et al., 2007; Carmona et al., 2009). Comparative studies of the transcriptional networks that control expression of pathways for degradation of recalcitrant chemicals reveal an extraordinary – and largely inexplicable – diversity of regulatory architectures (Tropel and Van Der Meer, 2004). One of the most conspicuous cases appears in the so-called TOL network, which regulates a complex pathway for the degradation of toluene and m-xylene in the soil bacterium P. putida mt-2 (Ramos et al., 1997). The TOL pathway is encoded by a mobile, self-transmissible plasmid called pWW0, which encodes the enzymes necessary for conversion of m-xylene into pyruvate and acetaldehyde, i.e. transformation of otherwise recalcitrant substrates into central metabolites. As shown in Fig. 2, the TOL network includes two transcriptional regulators (XylR and XylS) that control expression of two cognate operons. These determine subsequent steps of the transformation of m-xylene (m-xyl) to 3-methylbenzoate (3MB, upper pathway) and from 3MB to intermediates of the tricarboxylic acid cycle (TCA, lower pathway). The TOL network is wired to the rest of the cell by a number of chromosomally encoded factors, including the histone-like proteins IHF (Holtel et al., 1990; Abril et al., 1991; de Lorenzo et al., 1991) and HU (Perez-Martin and de Lorenzo, 1995b; 1997) as well as four sigma factors (σ70, σ54, σ38 and σ32) and additional regulatory proteins TurA and PprA (Rescalli et al., 2004; Vitale et al., 2008). All these connect expression of the TOL genes to both internal signals (growth phase, energy charge, ppGpp) and external stimuli (alternative C sources, temperature, N compounds). 
The whole of enzymes and regulators encoded in the plasmid forms an autonomous molecular network of a relatively small dimension that is suitable for the type of Boolean formalisms mentioned above.
Soft description of the TOL regulatory circuit
The TOL system is an entity biochemically separate from the rest of the host's metabolism that encompasses all the metabolic and regulatory genes required for complete degradation of m-xylene into intermediates of the central pathways. Figure 2 summarizes virtually all known facts about the regulation of the system. In the absence of the substrate of the pathway, expression of both the upper and the lower pathways is entirely shut down due to the inactivity of their cognate promoters Pu and Pm respectively. The corresponding activators, XylR and XylS, are present by virtue of their expression through their divergent promoters Pr and Ps, but in an inactive form (XylRi, XylSi). Transcription of xylRi is maintained approximately constant through a typical negative feedback loop, while that of xylSi is kept constitutively low through a weak housekeeping promoter. The situation changes drastically as soon as cells are exposed to m-xylene. This inducer initiates a stepwise sequence of regulatory and metabolic events that starts at the Ps–Pr region (where the maximum concentration of regulatory elements occurs) and propagates through the entire circuit. The process starts with the binding of m-xylene to XylRi for production of an active form, XylRa. This causes two effects: (i) activation of the σ54-dependent Pu promoter and subsequent expression and activity of the upper pathway for conversion of m-xylene to 3MB, and (ii) activation of the Ps1 promoter and overexpression of xylS, which brings about a species of this factor (hyperproduced XylS, named XylSh) that is able to activate by itself the lower, meta-operon. As the process goes on, 3MB appears in the system as a product of m-xylene conversion by the upper pathway. This aromatic compound can now bind what remains available of inactive XylSi and switch this regulator into a form (XylSa) that, similarly to XylSh, is able to activate Pm (and thus the lower pathway) as well. 
Once both the upper and the lower genetic/biochemical pathways are in operation, the head substrate m-xylene is eventually converted into central metabolites (TCA). The result of the signal propagation cycle that starts with m-xylene as input is therefore the production of pyruvate and acetaldehyde as outputs of the whole process. The catabolic capacity of the system is fixed not only by substrate concentrations, but also by a large number of physiological control mechanisms (e.g. catabolic repression, growth phase control and others; Holtel et al., 1994; Cases et al., 1999; del Castillo and Ramos, 2007) that adjust the outputs of each of the steps to the growth or stress conditions of the cells. These signals are entered through numerous host factors and endogenous signal molecules: IHF, HU, TurA, PprA, Crc, ppGpp, sigma factor competition, Entner–Doudoroff metabolites and perhaps several others (de Lorenzo et al., 1991; Perez-Martin and de Lorenzo, 1995b; Gallegos et al., 1996; Carmona et al., 2000; Rescalli et al., 2004; Aranda-Olmedo et al., 2005; Dominguez-Cuevas et al., 2005; Vitale et al., 2008).
The regulatory narrative just spelled out (Fig. 2) is largely based on quantitative measurements of lacZ (β-galactosidase) fusions to each of the promoters at stake. Alas, the lack of standardization of the procedures, let alone the absence of any formal parameterization of the transfer functions between one step of the process and the other (Endler et al., 2009), prevents any systems-level comprehension of the circuit as a whole. However, the information available on different regulatory and metabolic parts is sufficient to assign given on/off states to each of the nodes at various stages of signal propagation. In this context, the sections below account for the translation of each of the four regulatory knots that operate on the TOL circuit into a formal description of the corresponding biological functions using the tools of Boolean analyses (Fig. 3A).
De-construction of the Ps–Pr regulatory node into three autonomous logic units
As mentioned above, the Pr–Ps region is the one where the regulatory programme of the TOL system intensifies, as it encompasses the genes of the two regulators of the system (xylR, xylS) connected through two sets of overlapping divergent promoters. This region is the only one thus far amenable to dynamic modelling (Koutinas et al., 2010; 2011). As shown in Fig. 2, tandem σ70 promoters Pr1 and Pr2 (Inouye et al., 1985) express the master regulatory xylR gene, the product of which (XylRi) represses its own synthesis (Bertoni et al., 1997). For the sake of this abstraction, both Pr1 and Pr2 are considered a single σ70 promoter, Pr. The binding of m-xylene (input) to XylRi (input) for production of XylRa (output) can be formalized as an AND gate, the product of which can also repress xylR (Fig. 3B). This can be represented as separate negative feedback loops that shape a NOR gate, the output of which is the input for another AND gate with σ70 as the second input for Pr. On the other hand, expression of xylS involves two transcription-promoting devices, one of them low-constitutive (Ps2) and the other (Ps1) dependent on σ54 and inducible by XylRa. The unsettled controversy on whether the low constitutive expression of Ps is due to a bona fide σ70 promoter (Gallegos et al., 1996) or is a residual activity of a XylRa-dependent σ54 promoter (Perez-Martin and de Lorenzo, 1995a) makes no difference for our Boolean analyses. Ps1 and Ps2 are the operative names given in any case to each of the two devices that promote xylS expression either in an inducible or in a constitutive manner respectively. Unlike the case of Pr, the two Ps promoters have to be formalized separately, because they originate distinct outputs. As shown in Fig. 3C, the only input of Ps2 is the housekeeping σ70, and its only output is the inactive XylSi, an occurrence that can be represented as a YES gate. 
In turn, XylSi and 3MB (the product of the upper pathway; Ramos et al., 1997) form an AND gate that originates active XylSa as its output. Moreover, XylRa activates Ps1 with the assistance of σ54 and HU. The fact that this action requires three necessary inputs (XylRa, σ54, HU) is not an obstacle for our Boolean analysis, because we can decompose a three-input AND gate into two connected binary gates. The output of this action is hyper-expressed XylSh, a form of the factor that is able to activate Pm in the absence of 3MB (see above; Mermod et al., 1987). Note that for this analysis, we consider XylSh and XylSa as separate TFs, equally competent for activating Pm. While this distinction may not be mechanistically accurate (Dominguez-Cuevas et al., 2005), it allows us to separate the activation of Pm resulting from the master regulator XylR from that brought about by the formation of 3MB after the action of the upper pathway enzymes on m-xylene (see below). It has been possible to merge Boolean approaches with dynamic modelling for representing the Pr/Ps node, as the lack of information about the TFs makes it necessary to use (if nothing else) a 0/1 approach to describe the interaction between, e.g. σ70, σ54 and HU and the target promoters (Koutinas et al., 2010; 2011).
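The three logic units of the Pr–Ps node described above can be written down as plain Boolean functions. This is a minimal sketch under the stated abstractions: all function and variable names are illustrative, each gate is evaluated once for a given set of inputs, and no dynamics (such as the time course of the negative feedback on Pr) are represented.

```python
def AND(a: int, b: int) -> int:
    return a & b


def NOR(a: int, b: int) -> int:
    return 1 - (a | b)


def YES(a: int) -> int:
    return a


def pr_ps_node(m_xylene: int, mb3: int, xylRi: int,
               sigma70: int, sigma54: int, hu: int) -> dict:
    """Evaluate the gates of the Pr-Ps region for one set of binary inputs."""
    # Unit 1: m-xylene AND XylRi give XylRa; either form of XylR
    # represses Pr (NOR), and the result is ANDed with sigma70.
    xylRa = AND(m_xylene, xylRi)
    pr = AND(NOR(xylRi, xylRa), sigma70)
    # Unit 2: Ps2 is a YES gate on sigma70 producing inactive XylSi,
    # which forms an AND gate with 3MB to give active XylSa.
    xylSi = YES(sigma70)
    xylSa = AND(xylSi, mb3)
    # Unit 3: Ps1 needs XylRa, sigma54 and HU; the three-input AND
    # is built from two connected binary AND gates.
    xylSh = AND(AND(xylRa, sigma54), hu)
    return {"XylRa": xylRa, "Pr": pr, "XylSi": xylSi,
            "XylSa": xylSa, "XylSh": xylSh}


# Cells exposed to m-xylene, before any 3MB has been formed.
induced = pr_ps_node(m_xylene=1, mb3=0, xylRi=1, sigma70=1, sigma54=1, hu=1)
# No XylR protein yet and no inducer: Pr is free to fire.
naive = pr_ps_node(m_xylene=0, mb3=0, xylRi=0, sigma70=1, sigma54=1, hu=1)
print(induced)
print(naive)
```

Keeping XylSa and XylSh as separate outputs, as the text does, makes it possible to tell apart activation of Pm that stems from XylR from activation that stems from 3MB.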
Formalization of regulatory events at the upper and lower TOL operons
As shown in Fig. 2, the upper TOL operon encodes three enzymatic activities necessary for the conversion of m-xylene into 3MB: xylene monooxygenase (xylAM), benzyl alcohol dehydrogenase (xylB) and benzaldehyde dehydrogenase (xylC). The upper pathway also bears other genes with uncertain roles (Greated et al., 2002). The operon is expressed from the σ54-dependent promoter Pu, which requires XylRa and IHF for transcription initiation (Calb et al., 1996; Bertoni et al., 1998). For the model, we consider the Pu promoter to act as a combination of two AND gates (Fig. 4B) in which XylRa, σ54 and IHF are the inputs and the entire catalytic complement of the upper pathway is abstracted as a single output. The second metabolic operon (encoding the lower TOL pathway) encodes a set of enzymatic activities able to convert 3MB into 3-methyl catechol (xylXYZL), followed by meta-cleavage of the di-hydroxylated ring for formation of a semialdehyde (xylE) and eventual routing of this product into the TCA cycle (xylFJK). As was the case with the upper pathway, these core activities go together with other genes of unknown function (Greated et al., 2002). As before, they are considered model-wise as the single output of the Pm promoter (Fig. 3B). Transcription initiation in this case can be elicited by any of the active forms of XylS (XylSa or XylSh; Inouye et al., 1981; Mermod et al., 1987). Moreover, the core RNAP that activates Pm can employ any of the three sigma factors of the σ70 family, i.e. the housekeeping σ70 (RpoD), the stationary-phase σ38 (RpoS) and the heat shock σ32 (RpoH; Gallegos et al., 1996; Marques et al., 1999; Dominguez-Cuevas et al., 2005). This scenario for Pm regulation can be depicted as a gate with five possible inputs and one single output, the binary computation of which can be broken down into discrete OR and AND gates as shown in Fig. 4C.
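The different stringency of the two promoters becomes apparent when their gates are enumerated over all input combinations. The following sketch assumes the decomposition described in the text (two chained AND gates for Pu; OR gates over the XylS forms and over the sigma factors, joined by an AND, for Pm); the function names are illustrative.

```python
from itertools import product


def pu(xylRa: int, sigma54: int, ihf: int) -> int:
    """Pu: two connected AND gates, so all three inputs are required."""
    return (xylRa & sigma54) & ihf


def pm(xylSa: int, xylSh: int, s70: int, s38: int, s32: int) -> int:
    """Pm: any active XylS form AND any of the three sigma factors."""
    any_xylS = xylSa | xylSh
    any_sigma = (s70 | s38) | s32
    return any_xylS & any_sigma


# Count the input combinations that switch each promoter on.
pu_on = sum(pu(*bits) for bits in product((0, 1), repeat=3))
pm_on = sum(pm(*bits) for bits in product((0, 1), repeat=5))
print(f"Pu active in {pu_on} of {2 ** 3} input states")
print(f"Pm active in {pm_on} of {2 ** 5} input states")
```

Pu fires in a single input state, whereas Pm tolerates many alternative combinations, which is the qualitative difference that the OR gates encode.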
3MB is the endogenous signal carrier through the domains of the TOL network
The primary input to the TOL system is m-xylene, which once computed by the logic gates of the Pr–Ps region (see above) generates two active TFs as distinct outputs, XylRa and XylSh. These can then act as inputs of the Pu and the Pm promoter respectively. Up to that point, all actions are regulatory. However, the outputs of both the upper and the lower operons are enzymatic activities, not regulators. We can also translate such activities into useful inputs in the network. The upper pathway and m-xylene are inputs of an AND gate which has 3MB as an output signal (Fig. 4D). In turn, 3MB can form AND gates with both XylSi (to become XylSa, see above) and the lower TOL pathway (to produce TCA cycle intermediates). Note that under this scheme, the endogenously produced 3MB becomes the key signal carrier molecule that can be read by both regulatory and metabolic nodes of the entire TOL network, thereby connecting the activities of the upper and lower domains of the system. Finally, the eventual output of the whole metabolic and regulatory system is the complete degradation of 3MB into pyruvate and acetaldehyde (Burlage et al., 1989; Assinder and Williams, 1990; Ramos et al., 1997), which are channelled into the central metabolism, e.g. the TCA.
The TOL logicome
The result of connecting all nodes of the TOL system in the shape of binary Boolean gates was a body of logic operations (a logicome) able to compute defined environmental signals into fixed biological responses by means of both regulatory and enzymatic actions (Istrail and Davidson, 2005; Istrail et al., 2007). The logicome (Fig. 5) includes one exogenous (m-xylene) and six endogenous inputs (IHF, HU, σ70, σ54, σ38 and σ32), one inborn signal carrier molecule (3MB) and one single outward output (TCA). For the sake of simplicity, we have not considered as inputs other nutritional signals (presence of alternative C sources) or environmental circumstances (temperature, growth phase) known to affect the performance of the system (Ruiz et al., 2004). We thus assume that cells bearing such a logicome experience favourable growth conditions for optimal expression of TOL activities.
The resulting logic representation of the TOL network encompassing all known molecular interactions provides a low-resolution but still quantitative view of the system. Circuit-like formalisms of this sort have a number of intrinsic advantages over the customary arrows/hammers often used to sketch biological networks. First, logic circuits underscore differences in the signal propagation itineraries between components of the network. Both Pu and Pm promoters are activated by a number of factors (three and five respectively) but while Pu needs all elements present at the same time to become activated, Pm needs only two TFs out of the set of five. This is denoted by the OR gates in the construction of the Pm node. In contrast, Pu is represented only by AND gates (Fig. 5). Second, binary circuits simplify the build-up of network complexity, as new molecular interactions (global regulators, alternative sigma factors, transporters) are incorporated by just adding more logic gates. Third, interactions not expected to influence the dynamics of the system under specific conditions can be maintained in the graph but given a constant default value. For example, cells in stationary phase contain saturating intracellular concentrations of IHF and σ38 (RpoS) and therefore these can be given a constant value ‘1’, while the other variables are changed one at a time. Finally, after assembly of the logic circuit model, the resulting network can be unequivocally translated into a set of equations using one of the aforementioned methods in order to perform the corresponding simulation. Such simulations can easily reveal incoherencies in the structure of the circuit and thus raise new hypotheses about the system under study. These are questions that, in the case of the TOL logicome, await future work.
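The third point (holding some inputs at a constant ‘1’ while changing the others one at a time) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the helper `scan`, the dictionary-based gate functions `pu` and `pm`, the baseline states and the grouping of the Pm inputs are hypothetical and are not taken from the published model.

```python
def pu(v):
    # Pu: AND gates only.
    return v["XylRa"] and v["sigma54"] and v["IHF"]

def pm(v):
    # Pm (assumed grouping): any XylS form AND any sigma factor.
    return (v["XylSa"] or v["XylSh"]) and (
        v["sigma70"] or v["sigma38"] or v["sigma32"])

def scan(gate, baseline, clamped):
    """Flip each non-clamped input in turn and return the names of the
    inputs whose flip changes the gate output relative to the baseline."""
    reference = gate(baseline)
    sensitive = []
    for name in baseline:
        if name in clamped:
            continue  # clamped inputs keep their constant default value
        flipped = dict(baseline, **{name: not baseline[name]})
        if gate(flipped) != reference:
            sensitive.append(name)
    return sensitive

# Stationary-phase assumption from the text: IHF and sigma38 held at '1'.
CLAMPED = {"IHF", "sigma38"}
pu_base = {"XylRa": True, "sigma54": True, "IHF": True}
pm_base = {"XylSa": True, "XylSh": False,
           "sigma70": True, "sigma38": True, "sigma32": False}

print("Pu responds to:", scan(pu, pu_base, CLAMPED))
print("Pm responds to:", scan(pm, pm_base, CLAMPED))
```

With these baselines every free input of Pu changes the output when flipped, whereas for Pm only XylSa does, since σ38 clamped at ‘1’ masks the other sigma factors. A scan of this kind is one simple way to see which interactions can be left at a default value without affecting the outcome.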
Adoption of a limited number of judicious abstractions and reasonable formalisms allows description, simulation and prediction of the functioning of biological networks, whether metabolic, regulatory or both. A separate issue is the interplay between the circuits and the genetic and biochemical cell-wide chassis into which they are integrated in a hierarchical fashion (Nogales et al., 2008; Puchalka et al., 2008). Since the complexity of any system increases non-linearly with the number of components, it is hardly possible to determine experimentally all relevant parameters that influence the behaviour of each node of even relatively simple control circuits that operate in bacteria. Fortunately, formal languages and methods are increasingly available to handle biological systems with data barely beyond the list of known components, the sign of their connections and some scattered wet-lab information on some of them. For genome-scale metabolic networks, constraint-based formalisms provide valuable scaffolds to account for interacting species, to ascertain the emergent properties arising from these interactions and to explore the metabolic space and capabilities of the organisms involved. For those scarce systems where kinetic and molecular information is abundant, both deterministic differential equations and stochastic master equations become the tools of choice for robust modelling. But for the rest of the systems, where data are sparse, originate in various laboratories and lack a coherent format, one has to rely on minimal formalisms that capture the gross raison d'être of extant network architectures while lacking the details. Logic networks of the type discussed above share many of the qualities of electronic processors, e.g. they execute computation of signals into responses (Rodrigo and Jaramillo, 2007; Rodrigo et al., 2008; Marchisio and Stelling, 2009).
Yet, in electronic circuits the only signal carrier is the electron, while the material nature of the input in biological circuits is often different from that of the output. Engineering signal-propagation cascades in biological systems thus requires that any given upstream input to one of the nodes results in an output that can be understood by the next node of the signal progression chain. The chances of designing biological circuits with the same ease as electronic counterparts depend on the availability of (naturally occurring or artificial) connectable logic gates, a currently fertile field of research (Silva-Rocha and de Lorenzo, 2008; Hunziker et al., 2010; Zhan et al., 2010). As discussed elsewhere (de Lorenzo, 2008), such approaches offer a second opportunity for attempting the design of microorganisms for environmental release as vectors of biodegradative activities for bioremediation and sensing. The same is true for effectively and efficiently re-programming microorganisms for the production à la carte of bulk and high-added value chemicals. None of this could be achieved easily before, owing to the lack of sufficient systems understanding of the corresponding microbial agents and of their underlying networks and interactions (Cases and de Lorenzo, 2005). The technological advances of the past decade and conceptual developments such as those discussed herein pave the road to the eventual achievement of these long-sought goals.
The work in the authors' Laboratory is supported by generous research grants of the Spanish Ministry of Science and Innovation (CONSOLIDER), by Funds of the Autonomous Community of Madrid and by contracts of the Framework Program of the EU (MICROME, BACSINE). We thank Carolyn Lam and Miguel Godinho at the Helmholtz Centre for Infection Research, Braunschweig, and Hidde de Jong (INRIA, Grenoble) for helpful discussions and comments.