The logicome of environmental bacteria: merging catabolic and regulatory events with Boolean formalisms

Authors


E-mail vdlorenzo@cnb.csic.es; Tel. (+34) 91 585 45 36; Fax (+34) 91 585 45 06.

Summary

The regulatory and metabolic networks that rule biodegradation of pollutants by environmental bacteria are wired to the rest of the cellular physiology through both transcriptional factors and intermediary signal molecules. In this review, we examine some formalisms for describing catalytic/regulatory circuits of this sort and advocate the adoption of Boolean logic for combining transcriptional and enzymatic occurrences in the same biological system. As an example, we show how known regulatory and metabolic actions that bring about biodegradation of m-xylene by Pseudomonas putida mt-2 can be represented as clusters of binary operations and then reconstructed as a digital network. Despite the many simplifications, Boolean tools still capture the gross behaviour of the system even in the absence of experimentally determined kinetic constants. On this basis, we argue that, even with a limited volume of data, binary formalisms allow us to penetrate the raison d'être of extant regulatory and metabolic architectures.

Introduction

In their natural settings, changes in environmental conditions often compromise survival of any given microorganism unless it responds and adapts to shifting physicochemical and nutritional scenarios (McAdams et al., 2004). Soil bacteria constitute a remarkable example of environmental adaptation as they are able to colonize a large number of niches and to deal with many variable conditions (Cases et al., 2003). The reasons for this extraordinary flexibility reside not only in the catalogue of regulatory and structural genes encoded in their genomes but also, more decisively, in the way regulatory networks sense external conditions and adjust cell physiology to changing circumstances. Such a sensorial ability is reflected in the repertoire of transcriptional factors (TFs) available in the genomes of archetypical soil bacteria (e.g. Pseudomonads) for controlling expression of both metabolic and stress–response functions. In this respect, the genomic complement of generalist microorganisms that thrive in natural environments encodes a much larger share of regulatory genes than that of counterparts that inhabit stable niches (e.g. endosymbionts; Cases et al., 2003; Dos Santos et al., 2004; Konstantinidis and Tiedje, 2004). Needless to say, TFs do not act in isolation but are hierarchically connected (Shen-Orr et al., 2002), allowing the cell to integrate different stimuli and build proper responses for prevailing under new settings. The overall flow of signal propagation through regulatory networks is sketched in Fig. 1. Note that for the rest of the article we refer to such networks as biological devices composed of connected nodes in each of which given inputs are converted into distinct outputs, which then become the inputs of other downstream nodes (de Las Heras et al., 2010). 
In this respect (and for the sake of illustration of this concept) the material nature of such inputs/outputs is not important, provided that they can be computed at the corresponding nodes (Istrail et al., 2007). As discussed below, this simplification is key for merging transcriptional and metabolic events with the same formalisms.

Figure 1.

The flow of signals through typical regulatory networks. The figure represents one network controlling a metabolic pathway. (I) In the metabolic part of the system, compound c1 is converted to c3 by the action of several independent enzymes (A, B, C, D and E) through the intermediate species c2. Both c1 and c2 are signal molecules which trigger the expression of the cognate pathways. (II) At the transcriptional factor level, signals c1 and c2 are sensed by specific TFs (TF1 and TF4 respectively) which themselves can integrate inputs from global regulators (TF2, TF3 and TF5) in order to trigger the expression of the two operons sketched. (III) Finally, at the gene expression level, the binding/unbinding of the different TFs determines the production of enzymes which, in turn, may feed back into the first level (metabolism), closing the loop of signal propagation.

Although the main function of regulatory networks is signal integration, the specific architectures of their constituents endow the corresponding systems with distinct dynamic properties that make a difference in the final response. The structure of network motifs (for example feedback and feed forward loops, multi-input modules and many others; Silva-Rocha and de Lorenzo, 2010) not only enables them to process predetermined inputs into equally preset outputs, but also determines important properties such as response time, shape of the response generated and pulsing or monotonic outcome, independently of specific parameters (Mangan and Alon, 2003; Mangan et al., 2003). Intricate integration phenomena encompassing extracellular compounds, intracellular metabolic sensors and signal propagation by small molecules often appear in the regulatory networks that control biodegradative pathways for recalcitrant compounds (Shingler, 2003). This makes sense, as bacteria that inhabit polluted sites have to make decisions between different nutrients-to-be on the basis of many endogenous and exogenous factors (e.g. compound availability, physiological state of the cells, the flux of carbon through the metabolism, final electron acceptors, physicochemical circumstances, etc.; Rojo, 2010).

The best-characterized cell-wide regulatory network is that of the central metabolism of carbon in the model organism Escherichia coli (Kotte et al., 2010). In this case, the hierarchy of global and specific TFs that control expression of the genes encoding large metabolic blocks typically gives rise to network topologies that are optimal for processing the signals mentioned before and bring about the most advantageous physiological result (Balazsi et al., 2005; Kotte et al., 2010). The bottom line is, in any case, that easy-to-degrade carbon sources are consumed before compounds that are more difficult to metabolize (Bruckner and Titgemeyer, 2002). In contrast, nutrient choices in environmental bacteria are not, for example, between a palatable glucose and a less edible glycerol, but between carbon and nitrogen compounds with unusual molecular structures that often act themselves as chemical stressors (Velazquez et al., 2005). In these cases, it cannot come as a surprise that the corresponding regulatory architectures become more intricate. The set of TFs that control biodegradative pathways has not only to recognize a xenobiotic or recalcitrant compound as a nutrient-to-be, but also to ensure that the trade-off between metabolic gain and stress endurance is not detrimental to the general cell physiology (Shingler, 2003; Dominguez-Cuevas et al., 2006).

Building models

A growing approach to address the functions and properties of regulatory networks involves the formulation of models that translate the molecular interactions known for a given system into a set of equations that afford a simulation of the entire lot of physical and functional interplays (Herrgard et al., 2004; Karlebach and Shamir, 2008). During the last few years, a number of methods have been implemented and validated for building and simulating such models (Karlebach and Shamir, 2008). The initial step for analysing cellular networks involves the elaboration of a relational diagram comprising all components of the system and as much data as possible on the interactions among them, whether physical or functional (Fig. 1). In typical metabolic networks, the nodes are the substrates and products on which the enzymes act, while the edges represent the enzymatic activities themselves. Merely descriptive models with nodes (components) and edges (connections between nodes) can then be enriched with stoichiometric coefficients, kinetic constants and thermodynamic information (e.g. reversibility). The list of enzymatic activities can be grossly derived from genomic annotations based on homology search (Notebaart et al., 2006; DeJongh et al., 2007; Thiele and Palsson, 2010). Such lists can then be refined by manual curation, for instance by looking for missing activities in the corresponding metabolic map (Reed et al., 2006). A large number of additional problems can appear at every step of genome-based metabolic reconstructions (Nogales et al., 2008; Puchalka et al., 2008; see Feist et al., 2009 for a compendium). Experimental kinetic and thermodynamic data have to be determined either by direct biochemical analyses or indirectly by means of parameterization algorithms that convert omics data into apparent kinetic constants (Jaqaman and Danuser, 2006; Breitling et al., 2008). 
The volume and quality of such wet information is highly variable depending on the organism, and in many cases it is not available at all. Besides, this information often pertains to in vitro results, which poorly reflect true intracellular behaviour. How, then, can reasonable metabolic models be made in view of the frequent dearth of data? To overcome this difficulty, genome-scale metabolic reconstruction and simulations can be made through constraint-based approaches (like flux-balance analysis), which do not need such information (Feist et al., 2009; Thiele and Palsson, 2010). Data from different experimental omics approaches, such as metabolomics, fluxomics, transcriptomics and proteomics, provide additional information with which to constrain the models and thus to reduce the solution space and increase accuracy (Herrgard et al., 2006; Feist et al., 2009).

In contrast to metabolic counterparts, regulatory networks capture information on the influence that particular genes (i.e. those encoding TFs) exert on the expression or activity of others. As a consequence, the nodes in this case are composed of genes (often assimilated to their encoded proteins), while the edges express regulatory connections, e.g. either activation or inhibition (Fig. 1). Regulatory diagrams of this sort facilitate the visualization of coordinated regulatory effects, which appear as sets of TFs acting on the same target (Schlitt and Brazma, 2007). Unlike the metabolic scenarios discussed above, the reconstruction of the regulatory network of a given organism on the mere basis of its genomic sequence is a very challenging task (Goelzer et al., 2008). While metabolism (especially the core of central enzymatic reactions) is relatively conserved in many different organisms, similar regulatory outcomes can originate from unrelated TFs and regulatory modules (Price et al., 2007). Beyond a few model organisms, information on regulatory interactions is very scarce, making it very difficult to translate the knowledge obtained for one bacterium into others, even in the case of close species. Bona fide orthologues of typical E. coli TFs such as the catabolite regulatory protein (CRP) or the integration host factor (IHF) have been found to govern entirely different sets of functions in Pseudomonas putida through an evolutionary exaptation process (Milanesio et al., 2011). The targets and functions of orthologous TFs are often not the same in different species (Hale et al., 2007), thus making it necessary to examine regulatory interactions experimentally on a case-by-case basis.

Translating networks into logic circuits

A wide variety of mathematical approaches have been developed for simulating genetic regulatory systems of the sort just discussed (Gillespie, 1977; van Kampen, 1992; Mendes, 1993; McAdams and Arkin, 1999; Friedman et al., 2000; Hoffmann et al., 2002; de Jong, 2002; Shlomi et al., 2007; Raj and van Oudenaarden, 2008; Polynikis et al., 2009; Morris et al., 2010). Yet, the lack of benchmarking for measuring transfer functions in various laboratories (de Las Heras et al., 2010) reduces the actual data that can be used to feed simulations. Typically, one group will employ lacZ as reporter to measure promoter activity, another will prefer GFP and still others may rely on DNA array technology or RT-PCR. The lack of anything comparable to a conversion table between measurement units in vivo very often leads to soft consensus descriptions of regulatory circuits that for the most part employ arrows (→) or hammers (⫞) for expressing either positive or negative interactions between system components. Assigning numbers to the arrows (Ronen et al., 2002; Ashyraliyev et al., 2009) is in fact one of the challenges of contemporary Biology.

A simple alternative to display and simulate genetic circuits when little or no information on transfer functions between nodes is accessible involves the adoption of Boolean concepts. Binary logic is in this case sufficiently instrumental for describing the states of the components of the biological system under scrutiny. This type of logic analysis is possible because signal propagation through a biological network (and the resulting output) is dependent on the way the molecular components are connected (Faure and Thieffry, 2009; Morris et al., 2010). This is the same principle as that of electronic logic devices, the calculation ability of which depends on the way the transistors are joined. In Boolean networks, the status of any specified gene is characterized by only two possible values (true or false: 1 or 0) that reflect whether that node of the circuit is active or not. Regulatory interactions can then be accurately entered in the network as logic gates that execute Boolean functions such as AND, OR, NOR, etc. (Silva-Rocha and de Lorenzo, 2008). This makes it possible to describe expression of any gene as a result of the presence or absence of other genes and small molecules that act as regulators (Buchler et al., 2003). Such a descriptive language enables the layout of dynamic and deterministic models in which known inputs are processed into equally discrete outputs (Hasty et al., 2002). In this respect, the sole architecture and hierarchy of network components endows the system with intrinsic signal computation capacities at the nodes of the circuit and fixes a signal propagation itinerary through the entire set-up. These features are ultimately shaped by connectivity and the sign of the interplay between the interacting components of the network. 
For instance, in a feed forward loop module (FFL; Shen-Orr et al., 2002), in which two TFs activate directly and indirectly the same target gene, the final shape of the response curve is very different depending on whether both regulators are equally efficient in the activation of the third gene or they have to cooperate for generating the output (Mangan and Alon, 2003). These alternative FFL scenarios can be easily translated into different classes of Boolean operators. Specifically, the first instance is equivalent to an OR gate acting on both TFs (i.e. the presence of just one TF is sufficient to bring about the final effect), while the second corresponds to an AND operator (both TFs are necessary to activate the target gene). In this way, logic operators (gates) describe rigorously the sign of the relationship between interacting components of the regulatory system and fix the outcome resulting from these interactions. Values of 0 or 1 can be assigned to both the inputs and outputs of the circuit, which becomes a signal computation device reminiscent of those made with transistors (Silva-Rocha and de Lorenzo, 2008). This is not yet a quantitative description of the system, but it allows us to penetrate its inner logic and to move beyond the typical arrows/hammers depiction of regulatory networks.
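The two FFL scenarios can be illustrated with a minimal Python sketch. It is not part of the original analysis: it assumes a synchronous update in which regulator X activates regulator Y and both act on the target Z, and the function name simulate_ffl and the input pulse are hypothetical choices made only for illustration.

```python
def simulate_ffl(x_signal, gate):
    """Synchronously update a feed forward loop: X -> Y, and (X, Y) -> Z.

    x_signal is the 0/1 time course imposed on regulator X (an assumed input).
    gate is "AND" (both TFs required) or "OR" (either TF suffices).
    Returns the 0/1 time course of the target Z after each update.
    """
    y, z = 0, 0
    trace = []
    for x in x_signal:
        # Both nodes are computed from the state of the previous step.
        new_y = x
        new_z = (x and y) if gate == "AND" else (x or y)
        y, z = new_y, new_z
        trace.append(z)
    return trace


pulse = [0, 1, 1, 1, 0, 0, 0]
print(simulate_ffl(pulse, "AND"))  # [0, 0, 1, 1, 0, 0, 0]
print(simulate_ffl(pulse, "OR"))   # [0, 1, 1, 1, 1, 0, 0]
```

Under these assumptions the AND version switches Z on one step after X appears, whereas the OR version keeps Z on for one step after X disappears. This is only a binary caricature of the different response shapes mentioned above, not a quantitative prediction.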

Integration of regulatory and metabolic logic in the same Boolean circuit

Two extreme abstractions have to be adopted for conversion of any regulatory circuit into a logic network. First, the components can only hold either of two state values (0 and 1). Second, the material nature of the same components is entirely disregarded as long as they do their job in computing set inputs into determined outputs. Although not always explicit, the near exclusive components of networks of this sort consist of TFs and inducing signals (whether exogenous molecules or physicochemical conditions). In real cells, however, regulatory devices operate on the background of an active metabolism. Such biochemical activity is not only controlled by dedicated genetic circuits, but also the enzymes and substrates/products can physically or functionally interact with TFs, creating regulatory interplays between the transcriptome and the metabolome (Kotte et al., 2010). Importantly, the organization of an enzymatic network can also be formalized as a whole of logic gates (i.e. biochemical computing) in which both the inputs and the outputs consist of enzymes and metabolites rather than TFs and inducers (Niazov et al., 2006; Pita et al., 2009; Katz and Privman, 2010). Since the activity states of the components of such enzymatic systems are equally abstracted to binary values 1 and 0, it is then perfectly feasible to merge regulatory and biochemical networks in the same logic circuit and examine its structure as a unique biological object (Johnson et al., 2004). An example is described in more detail below regarding the logic architecture of the entire regulatory/enzymatic network for environmental m-xylene biodegradation. Note that dual inputs that can be processed by a node/gate in a merged network of this sort may include (i) two TFs, (ii) one TF and one metabolite and (iii) one enzymatic activity and one substrate, for instance a nutrient or a metabolite. 
By the same token, outputs might consist of proteins (whether TFs or enzymes) and small molecules (reaction products, intermediate metabolites and signalling chemicals, e.g. autoinducers). Abstractions of this type allow us to uncover emergent properties of biological networks which are not noticeable if the regulatory and the enzymatic connections of the same system are addressed separately, let alone if the properties of each of the components are examined out of their context.
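As a minimal sketch of such a merged circuit, the Python fragment below combines the three kinds of dual input listed above in one toy network. The node names (c1, c2, TF1, TF2, enzymes_1) loosely follow Fig. 1, but the wiring, the rule table RULES and the helper functions are hypothetical and chosen only for illustration; the substrate c1 is assumed not to be depleted.

```python
# Each rule computes the next 0/1 state of a node from the current state.
RULES = {
    # (ii) one TF and one metabolite: the inducer c1 activates TF1
    "TF1_active": lambda s: s["c1"] and s["TF1"],
    # (i) two TFs: active TF1 plus a global regulator TF2 drive the operon
    "enzymes_1": lambda s: s["TF1_active"] and s["TF2"],
    # (iii) one enzymatic activity and one substrate yield the product c2
    "c2": lambda s: s["enzymes_1"] and s["c1"],
}


def step(state):
    """Synchronous update: every rule reads the same (old) state."""
    new_state = dict(state)
    for node, rule in RULES.items():
        new_state[node] = int(bool(rule(state)))
    return new_state


def run_to_fixed_point(state, max_steps=20):
    """Iterate until the state no longer changes."""
    for _ in range(max_steps):
        nxt = step(state)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no fixed point reached")


start = {"c1": 1, "TF1": 1, "TF2": 1, "TF1_active": 0, "enzymes_1": 0, "c2": 0}
print(run_to_fixed_point(start)["c2"])  # 1
```

In this toy layout the metabolite c2 appears only when the inducer, the specific TF and the global regulator are all present, so a transcriptional input and a metabolic output are computed within the same circuit.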

From digital networks to workable models

While elaboration of a logic map of a natural regulatory network of the sort just described might be the first step in the global analysis of a whole system, the final objective of any modelling is the formulation of equations that represent all key interactions between components. Boolean approaches can be enriched with methods for incorporating stochastic effects (Shmulevich et al., 2002), but digital genetic/metabolic networks and their cognate simulations ultimately ignore kinetic parameters. As a result, all components of the simulated system update their state in a synchronous fashion, which is evidently far from biological reality. Fortunately, more general logical networks have been proposed that are at the same time asynchronous and multi-level (Faure and Thieffry, 2009). A simplified strategy to this end involves the use of piecewise-linear (PL) approximations (de Jong et al., 2003). The underlying concept in this case is that the switch-like behaviour of gene regulation makes it possible to grossly match the non-linear functions of ordinary differential equations (ODEs) to step functions that reflect downstream gene expression at a given concentration of the corresponding upstream regulator:

$$f_i(x) = k_i \, s^{+}(x_j, \theta_j) \qquad (1)$$

This equation indicates that the synthesis of product i is a function of the presence of the effector j. In this equation, s+ is a step function, a Boolean operator that equals one if the concentration of j (xj) is at or above a particular threshold (θj), and zero if not. The synthesis of i proceeds at a rate given by ki when xj ≥ θj, and does not take place otherwise. In this case, j is an activator of the synthesis of i. A repressor can be expressed by a negative step function:

$$s^{-}(x_j, \theta_j) = 1 - s^{+}(x_j, \theta_j) \qquad (2)$$

which equals one when xj < θj, and zero when xj ≥ θj. The synthesis of a particular compound results from the combination of different regulation functions, each involving a different effector, for instance:

$$f_i(x) = k_i \, s^{+}(x_j, \theta_j) \, s^{-}(x_k, \theta_k) \qquad (3)$$

where the production of i is regulated by the activator j and the repressor k. In this way, it is possible to model all possible transcriptional and metabolic interactions in the system, determining the production of particular compounds as a function of the presence or absence of some others. It is possible to set different thresholds to the concentration of a particular compound, if it regulates different reactions at different concentrations. For instance, if effector j regulates two reactions, we can set the constraint θj1 < θj2, to indicate that reaction 1 is regulated by a lower concentration of j than reaction 2. These constraints are known as threshold inequalities. The levels/activities of each of the molecular species (genes, proteins, RNAs, etc.) that participate in the network are represented by continuous time variables. Given that they represent concentrations, the variables cannot take negative values. Therefore, the concentration of a particular compound is determined by its production rate (expressed by ki), and its degradation rate (gi), which is a strictly positive function. With the combination of positive and negative step functions, it is possible to model all possible regulatory interactions in the system, no matter their complexity. This approach allows the simulation to proceed without any knowledge of kinetic constants between network nodes, and affords description of the Boolean networks described above as sets of PL equations that are reminiscent of actual ODEs. Furthermore, recent improvements in modelling logic circuits using PL approximations (Baldazzi et al., 2010) allow entering differences in the timescale of the relevant molecular events based on judicious biological reasoning. For instance, the time necessary for a metabolite to bind or be released from a cognate TF has to be inevitably shorter than the time it takes to transcribe an entire gene (Alon, 2006; Mayo et al., 2006). 
The result of formalisms of this sort is that the itinerary of inputs and outputs through the network can be displayed as a coarse continuous flow rather than a series of discrete jumps between binary states. Indeed, step functions can approximate sigmoids in various instances (de Jong et al., 2003; Baldazzi et al., 2010), thus providing a most useful simplification with little loss of accuracy in the eventual simulation of the corresponding network. PL models thus keep the expressive power of logic networks, but at the same time they are well grounded in the classical modelling framework based on differential equations.
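The behaviour of a single PL equation can be sketched numerically. The Python fragment below assumes the usual PL form with first-order decay, dxi/dt = ki s+(xj, θj) s−(xk, θk) − gi xi, with one activator j and one repressor k held at fixed concentrations. All numerical values, the function names and the simple Euler integration are illustrative assumptions, not parameters of any real system.

```python
def s_plus(x, theta):
    """Positive step function: 1 when x is at or above the threshold."""
    return 1.0 if x >= theta else 0.0


def s_minus(x, theta):
    """Negative step function: 1 when x is below the threshold."""
    return 1.0 - s_plus(x, theta)


def simulate(x_j, x_k, k_i=1.0, g_i=0.5, theta_j=0.5, theta_k=0.5,
             dt=0.01, n_steps=2000):
    """Euler integration of dx_i/dt = k_i*s+(x_j)*s-(x_k) - g_i*x_i.

    x_j (activator) and x_k (repressor) are held constant; x_i starts at 0.
    Returns the concentration of x_i after n_steps of size dt.
    """
    x_i = 0.0
    for _ in range(n_steps):
        synthesis = k_i * s_plus(x_j, theta_j) * s_minus(x_k, theta_k)
        x_i += dt * (synthesis - g_i * x_i)
    return x_i


print(simulate(x_j=1.0, x_k=0.0))  # approaches k_i / g_i = 2.0
print(simulate(x_j=1.0, x_k=1.0))  # 0.0: the repressor blocks synthesis
```

With the activator above its threshold and the repressor below its own, xi relaxes smoothly towards ki/gi instead of jumping between 0 and 1, which is the coarse continuous flow referred to above.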

Boolean description of m-xylene biodegradation by P. putida mt-2: the TOL logicome

Bacteria that colonize sites polluted by recalcitrant and xenobiotic chemicals offer a repertoire of regulatory and catabolic devices (Tropel and Van Der Meer, 2004; Phale et al., 2007; Carmona et al., 2009). Comparative studies of the transcriptional networks that control expression of pathways for degradation of recalcitrant chemicals reveal an extraordinary – and largely inexplicable – diversity of regulatory architectures (Tropel and Van Der Meer, 2004). One of the most conspicuous cases appears in the so-called TOL network, which regulates a complex pathway for the degradation of toluene and m-xylene in the soil bacterium P. putida mt-2 (Ramos et al., 1997). The TOL pathway is encoded by a mobile, self-transmissible plasmid called pWW0, which encodes the enzymes necessary for conversion of m-xylene into pyruvate and acetaldehyde, i.e. transformation of otherwise recalcitrant substrates into central metabolites. As shown in Fig. 2, the TOL network includes two transcriptional regulators (XylR and XylS) that control expression of two cognate operons. These determine subsequent steps of the transformation of m-xylene (m-xyl) to 3-methylbenzoate (3MB, upper pathway) and from 3MB to intermediates of the tricarboxylic acid cycle (TCA, lower pathway). The TOL network is wired to the rest of the cell by a number of chromosomally encoded factors, including the histone-like proteins IHF (Holtel et al., 1990; Abril et al., 1991; de Lorenzo et al., 1991) and HU (Perez-Martin and de Lorenzo, 1995b; 1997) as well as four sigma factors (σ70, σ54, σ38 and σ32) and additional regulatory proteins TurA and PprA (Rescalli et al., 2004; Vitale et al., 2008). All these connect expression of the TOL genes to both internal signals (growth phase, energy charge, ppGpp) and external stimuli (alternative C sources, temperature, N compounds). 
The whole of enzymes and regulators encoded in the plasmid forms an autonomous molecular network of a relatively small dimension that is suitable for the type of Boolean formalisms mentioned above.

Figure 2.

Components and regulatory interactions of the TOL genetic network. The basic TOL circuit of plasmid pWW0 of Pseudomonas putida mt-2 shown is composed of the transcriptional regulators XylR and XylS, the host factors IHF and HU, and the RNAP with the sigma factors σ70, σ54, σ38 and σ32. The outcome of the network is the production of the enzymes for complete biodegradation of m-xylene (m-xyl) into TCA intermediates. The enzymes encoded by the upper operon convert m-xyl into 3-methylbenzoate (3MB) in a process brought about by enzymes xylene monooxygenase (xylAM), benzaldehyde dehydrogenase (xylB) and benzyl alcohol dehydrogenase (xylC) plus perhaps others. 3MB is then metabolized by the lower pathway enzymes in six steps, which eventually produce the central metabolites pyruvate and acetaldehyde, which are further channelled into the TCA cycle. The head substrates of the upper and lower pathways are inducers of the cognate regulators XylR and XylS respectively. The organization of each transcriptional unit and the connections between them are described in the text. Abbreviations: XylRi, inactive XylR; XylRa, active XylR; XylSi, inactive XylS; XylSa, active XylS; XylSh, hyper-expressed XylS; m-xyl, m-xylene; 3MBA, 3-methylbenzyl alcohol; 3MBD, 3-methylbenzaldehyde; 3MB, 3-methylbenzoate; DMDC, 1,2-dihydroxy-3-methylcyclohexa-3,5-dienecarboxylate; 3MC, 3-methylcatechol; 2HOD, cis,cis-2-hydroxy-6-oxohept-2,4-dienoate; 2HD, cis-2-hydroxypenta-2,4-dienoate; 4HO, 4-hydroxy-2-oxovalerate.

Soft description of the TOL regulatory circuit

The TOL system is an entity biochemically separated from the rest of the host's metabolism that encompasses the whole of metabolic and regulatory genes required for complete degradation of m-xylene into intermediaries of the central pathways. Figure 2 summarizes virtually all known facts about the regulation of the system. In the absence of the substrate of the pathway, expression of both the upper and the lower pathways is entirely shut down due to the inactivity of their cognate promoters Pu and Pm respectively. The corresponding activators, XylR and XylS, are present by virtue of their expression through their divergent promoters Pr and Ps, but in an inactive form (XylRi, XylSi). Transcription of xylR is maintained approximately constant through a typical negative feedback loop, while that of xylS is kept constitutively low through a weak housekeeping promoter. The situation changes drastically as soon as cells are exposed to m-xylene. This inducer initiates a stepwise sequence of regulatory and metabolic events that starts at the Ps–Pr region (where the maximum concentration of regulatory elements occurs) and is propagated through the entire circuit. The process starts with the binding of m-xylene to XylRi for production of an active form, XylRa. This causes two effects: (i) activation of the σ54-dependent Pu promoter and subsequent expression and activity of the upper pathway for conversion of m-xylene to 3MB, and (ii) activation of the Ps1 promoter and overexpression of xylS, which brings about a species of this factor (hyperproduced XylS, named XylSh), which is able to activate by itself the lower, meta-operon. As the process goes on, 3MB appears in the system as a product of m-xylene conversion by the upper pathway. This aromatic compound can now bind what remains available of inactive XylSi and switches this regulator into a form (XylSa) that – similarly to XylSh – is able to activate Pm (and thus the lower pathway) as well. 
Once both the upper and the lower genetic/biochemical pathways are in operation, the head substrate m-xylene is eventually converted into central metabolites (TCA). The result of the signal propagation cycle that starts with m-xylene as input is therefore the production of pyruvate and acetaldehyde as outputs of the whole process. The catabolic capacity of the system is fixed not only by substrate concentrations, but also by a large number of physiological control mechanisms (e.g. catabolic repression, growth phase control and others; Holtel et al., 1994; Cases et al., 1999; del Castillo and Ramos, 2007) that adjust the outputs of each of the steps to the growth or stress conditions of the cells. These signals are entered through numerous host factors and endogenous signal molecules: IHF, HU, TurA, PprA, Crc, ppGpp, sigma factor competition, Entner–Doudoroff metabolites and perhaps several others (de Lorenzo et al., 1991; Perez-Martin and de Lorenzo, 1995b; Gallegos et al., 1996; Carmona et al., 2000; Rescalli et al., 2004; Aranda-Olmedo et al., 2005; Dominguez-Cuevas et al., 2005; Vitale et al., 2008).

The regulatory narrative just spelled out (Fig. 2) is largely based on quantitative measurements of lacZ (β-galactosidase) fusions to each of the promoters at stake. Alas, the lack of standardization of the procedures, let alone the absence of any formal parameterization of the transfer functions between one step of the process and the other (Endler et al., 2009), prevents any systems-level comprehension of the circuit as a whole. However, the information available on different regulatory and metabolic parts is sufficient to assign given on/off states to each of the nodes at various stages of signal propagation. In this context, the sections below account for the translation of each of the four regulatory knots that operate on the TOL circuit into a formal description of the corresponding biological functions using the tools of Boolean analyses (Fig. 3A).

Figure 3.

Formalization of the Pr–Ps regulatory node of the TOL network as a set of logic operations. A. Logic gate types along with their corresponding truth tables. B. Expression of xylR gene from tandem σ70 promoters Pr1 and Pr2 and activation of the XylR protein by m-xylene. A NOR gate represents the autorepression of Pr promoters brought about by either XylRi (not bound by m-xyl) or XylRa (active, bound to m-xyl). The formation of XylRi is limited by the action of such autorepression on Pr and is represented by one AND gate that takes the result of the NOR gate as one input and σ70 as the second input. In turn, XylRi and m-xyl form another AND gate that has XylRa as its output. C. Expression of xylS gene from tandem σ54 promoter Ps1 and σ70 promoter Ps2. The action of these promoters results in two types of XylS, which are thus formalized as separate logic clusters. In one case, formation of active XylSh from Ps1 is represented as two connected AND gates which compute three inputs: σ54, HU and XylRa. In the other case, XylSi is produced from constitutive Ps2 through a YES gate with σ70. The output is then converted into its active form (XylSa) by means of an AND gate that has XylSi and 3MB as inputs.

De-construction of the Ps–Pr regulatory node into three autonomous logic units

As mentioned above, the Pr–Ps region is the one where the regulatory programme of the TOL system intensifies, as it encompasses the genes of the two regulators of the system (xylR, xylS) connected through two sets of overlapping divergent promoters. This region is the only one thus far amenable to dynamic modelling (Koutinas et al., 2010; 2011). As shown in Fig. 2, tandem σ70 promoters Pr1 and Pr2 (Inouye et al., 1985) express the master regulatory xylR gene, the product of which (XylRi) represses its own synthesis (Bertoni et al., 1997). For the sake of this abstraction, both Pr1 and Pr2 are considered a single σ70 promoter, Pr. The binding of m-xylene (input) to XylRi (input) for production of XylRa (output) can be formalized as an AND gate, the product of which can also repress xylR (Fig. 3B). This can be represented as separate negative feedback loops that shape a NOR gate, the output of which is the input for another AND gate with σ70 as the second input for Pr. The other side of the region, i.e. expression of xylS, involves two transcription-promoting devices, one of them low-constitutive (Ps2) and the other dependent on σ54 and inducible by XylRa. The unsettled controversy on whether the low-constitutive expression of Ps is due to a bona fide σ70 promoter (Gallegos et al., 1996) or is a residual activity of a XylRa-dependent σ54 promoter (Perez-Martin and de Lorenzo, 1995a) makes no difference for our Boolean analyses. Ps1 and Ps2 are the operative names given in any case to each of the two devices that promote xylS expression either in an inducible or in a constitutive manner respectively. Unlike the case of Pr, the two Ps promoters have to be formalized separately, because they originate distinct outputs. As shown in Fig. 3C, the only input of Ps2 is the housekeeping σ70, and its only output is the inactive XylSi, an occurrence that can be represented as a YES gate. 
In turn, XylSi and 3MB (the product of the upper pathway; Ramos et al., 1997) form an AND gate that originates active XylSa as its output. Moreover, XylRa activates Ps1 with the participation of σ54 and HU. The fact that this action requires three necessary inputs (XylRa, σ54, HU) is not an obstacle for our Boolean analysis, because we can decompose a three-input AND gate into two connected binary gates. The output of this action is hyper-expressed XylSh, a form of the factor that is able to activate Pm in the absence of 3MB (see above; Mermod et al., 1987). Note that for this analysis, we consider XylSh and XylSa as separate TFs, equally competent for activating Pm. While this distinction may not be mechanistically accurate (Dominguez-Cuevas et al., 2005), it allows us to separate the activation of Pm resulting from the master regulator XylR from that brought about by the formation of 3MB after the action of the upper pathway enzymes on m-xylene (see below). It has been possible to merge Boolean approaches with dynamic modelling for representing the Pr/Ps node, as the lack of information about the TFs makes it necessary to use (if nothing else) a 0/1 approach to describe the interaction between, e.g. σ70, σ54 and HU, and the target promoters (Koutinas et al., 2010; 2011).
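The three logic units described above can be written down as plain Boolean functions. The following is only a minimal sketch: the function names (`pr_unit`, `xylr_activation`, `ps1_unit`, `ps2_unit`) are hypothetical labels chosen for illustration, and each function is a purely combinational gate that ignores time delays and concentrations.

```python
def pr_unit(xylr_i: bool, xylr_a: bool, sigma70: bool) -> bool:
    """Pr: NOR of the two autorepressing XylR forms, ANDed with sigma70.
    Returns True when new XylRi is being produced."""
    not_repressed = not (xylr_i or xylr_a)   # NOR gate
    return not_repressed and sigma70         # AND gate


def xylr_activation(xylr_i: bool, m_xyl: bool) -> bool:
    """Effector binding: XylRi AND m-xylene -> XylRa."""
    return xylr_i and m_xyl


def ps1_unit(xylr_a: bool, sigma54: bool, hu: bool) -> bool:
    """Ps1: a three-input AND written as two connected binary ANDs -> XylSh."""
    return (xylr_a and sigma54) and hu


def ps2_unit(sigma70: bool, three_mb: bool) -> tuple:
    """Ps2: YES gate on sigma70 -> XylSi, then XylSi AND 3MB -> XylSa.
    Returns the pair (XylSi, XylSa)."""
    xyls_i = sigma70                 # YES gate
    xyls_a = xyls_i and three_mb     # AND gate
    return xyls_i, xyls_a
```

Written this way, the separation between the two routes to an active XylS becomes explicit: `ps1_unit` never reads 3MB, whereas `ps2_unit` never reads XylRa.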

Formalization of regulatory events at the upper and lower TOL operons

As shown in Fig. 2, the upper TOL operon encodes three enzymatic activities necessary for the conversion of m-xylene into 3MB: xylene monooxygenase (xylAM), benzyl alcohol dehydrogenase (xylB) and benzaldehyde dehydrogenase (xylC). The upper pathway also bears other genes with uncertain roles (Greated et al., 2002). The operon is expressed from the σ54-dependent promoter Pu, which requires XylRa and IHF for transcription initiation (Calb et al., 1996; Bertoni et al., 1998). For the model, we consider the Pu promoter to act as a combination of two AND gates (Fig. 4B) in which XylRa, σ54 and IHF are the inputs and the entire catalytic complement of the upper pathway is abstracted as a single output. The second metabolic operon (encoding the lower TOL pathway) encodes a set of enzymatic activities able to convert 3MB into 3-methyl catechol (xylXYZL), followed by meta-cleavage of the di-hydroxylated ring for formation of a semialdehyde (xylE) and eventual routing of this product into the TCA cycle (xylFJK). As was the case with the upper pathway, these core activities go together with other genes of unknown function (Greated et al., 2002). As before, they are considered model-wise as the single output of the Pm promoter (Fig. 3B). Transcription initiation in this case can be elicited by any of the active forms of XylS (XylSa or XylSh; Inouye et al., 1981; Mermod et al., 1987). Moreover, the core RNAP that activates Pm can employ any of the three sigma factors of the σ70 family, i.e. the housekeeping σ70 (RpoD), the stationary-phase σ38 (RpoS) and the heat shock σ32 (RpoH; Gallegos et al., 1996; Marques et al., 1999; Dominguez-Cuevas et al., 2005). This scenario for Pm regulation can be depicted as a gate with five possible inputs and one single output, the binary computation of which can be broken down to discrete OR and AND gates as shown in Fig. 4C.
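To make the contrast between the two promoters concrete, the gate decompositions described above can be sketched as Boolean functions and their truth tables enumerated. This is an illustrative sketch only: the names `pu`, `pm` and `count_on` are hypothetical, and the gates carry no kinetic information.

```python
from itertools import product


def pu(xylr_a: bool, sigma54: bool, ihf: bool) -> bool:
    """Pu: two connected AND gates; all three inputs are required."""
    return (xylr_a and sigma54) and ihf


def pm(xyls_a: bool, xyls_h: bool,
       sigma70: bool, sigma38: bool, sigma32: bool) -> bool:
    """Pm: a cascade of OR gates for the sigma factors, one OR gate for
    the active XylS forms, and a final AND gate joining both branches."""
    any_sigma = (sigma70 or sigma38) or sigma32
    any_xyls = xyls_a or xyls_h
    return any_sigma and any_xyls


def count_on(gate, n_inputs: int) -> int:
    """Number of input combinations for which the gate output is 1."""
    return sum(gate(*bits) for bits in product((False, True), repeat=n_inputs))


print(count_on(pu, 3), "of", 2 ** 3, "input combinations switch Pu on")
print(count_on(pm, 5), "of", 2 ** 5, "input combinations switch Pm on")
```

Enumerating the tables shows that Pu is on for a single combination of its three inputs, whereas Pm is on for 21 of its 32 combinations, which is the Boolean signature of a promoter that accepts alternative sigma factors and alternative XylS forms.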

Figure 4.

Logic organization of the metabolic reactions of the TOL plasmid. A. The action of the Pu promoter is represented as two combined AND gates with XylRa, IHF and σ54 as inputs, and having the upper pathway as their output. B. Expression of the lower pathway corresponds to four logic gates which compute the presence of five different inputs. At least one sigma factor (out of the three possible) is necessary for expression of the Pm promoter, along with either XylSh or XylSa. This is signified in the figure as one cascade of OR gates for the sigmas and another OR gate for the XylS forms, which eventually converge in an AND gate that has the lower TOL pathway as its output. C. Formalization of metabolic processes. 3MB is produced from m-xyl by means of the upper pathway enzymes, while TCA is produced from 3MB metabolism through the lower pathway. m-xyl and the upper route thus form the AND gate that originates 3MB. By the same token, TCA is the output of the AND gate that has 3MB and the lower pathway as inputs. Note the central position of 3MB as the signal carrier between the upper and the lower metabolic blocks.

3MB is the endogenous signal carrier through the domains of the TOL network

The primary input to the TOL system is m-xylene, which once computed by the logic gates of the Pr–Ps region (see above) generates two active TFs as distinct outputs, XylRa and XylSh. These can then act as inputs of the Pu and the Pm promoter respectively. Up to that point, all actions are regulatory. However, the outputs of both the upper and the lower operons are enzymatic activities, not regulators. Such activities can nevertheless be treated as inputs in the network. The upper pathway and m-xylene are inputs of an AND gate which has 3MB as an output signal (Fig. 4D). In turn, 3MB can form AND gates with both XylSi (to become XylSa, see above) and the lower TOL pathway (to produce TCA cycle intermediates). Note that under this scheme, the endogenously produced 3MB becomes the key signal carrier molecule that can be read by both regulatory and metabolic nodes of the entire TOL network, thereby connecting the activities of the upper and lower domains of the system. Finally, the eventual output of the whole metabolic and regulatory system is the complete degradation of 3MB into pyruvate and acetaldehyde (Burlage et al., 1989; Assinder and Williams, 1990; Ramos et al., 1997), which are channelled into the central metabolism, e.g. the TCA.
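The dual role of 3MB can be illustrated with a short sketch in which a single intermediate value feeds both a regulatory gate and a metabolic gate. The function name `metabolic_layer` and the dictionary keys are hypothetical, and the presence of each pathway is reduced to a single Boolean, as in the abstraction above.

```python
def metabolic_layer(m_xyl: bool, upper_pathway: bool,
                    lower_pathway: bool, xyls_i: bool) -> dict:
    """Metabolic gates of the TOL network with 3MB as shared signal."""
    three_mb = upper_pathway and m_xyl    # AND gate: substrate plus enzymes
    xyls_a = xyls_i and three_mb          # regulatory reading of 3MB
    tca = lower_pathway and three_mb      # metabolic reading of 3MB
    return {"3MB": three_mb, "XylSa": xyls_a, "TCA": tca}


# Without the upper pathway no 3MB is formed, so both readers stay off.
print(metabolic_layer(m_xyl=True, upper_pathway=False,
                      lower_pathway=True, xyls_i=True))
```

Because both `xyls_a` and `tca` are computed from the same `three_mb` value, switching off its production silences the regulatory and the metabolic branch at once.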

The TOL logicome

The result of connecting all nodes of the TOL system in the shape of binary Boolean gates is a body of logic operations (a logicome) able to compute defined environmental signals into fixed biological responses by means of both regulatory and enzymatic actions (Istrail and Davidson, 2005; Istrail et al., 2007). The logicome (Fig. 5) includes one exogenous (m-xylene) and six endogenous inputs (IHF, HU, σ70, σ54, σ38 and σ32), one inborn signal carrier molecule (3MB) and one single outward output (TCA). For the sake of simplicity, we have not considered as inputs other nutritional signals (presence of alternative C sources) or environmental circumstances (temperature, growth phase) known to affect the performance of the system (Ruiz et al., 2004). We thus assume that cells that bear such a logicome undergo favourable growth conditions for optimal expression of TOL activities.

Figure 5.

The logicome of the TOL network. The circuit shown represents the entire computation performed by the TOL network of Fig. 2. The computation itself is made by the set of logic operators inside the shadowed section, while the inputs and final output have been placed outside. The one exogenous input of the logicome is m-xyl, which triggers the activation of the entire regulatory and metabolic programme that eventually produces TCA as the ultimate output. Gates inside the grey boxes include the molecular operations performed by the six promoters of the network (Pr1, Pr2, Ps1, Ps2, Pu and Pm), while the others represent biochemical events (i.e. XylR and XylS binding to their respective effectors and metabolic conversions). The simplest computation is performed by Ps2, in which an amplifier gate takes a single input (σ70) and converts it into a single output (XylSi). The most complex signal processing is performed by Pm.

The resulting logic representation of the TOL network encompassing all known molecular interactions provides a low-resolution but still quantitative view of the system. Circuit-like formalisms of this sort have a number of intrinsic advantages over the customary arrows/hammers often used to sketch biological networks. First, logic circuits underscore differences in the signal propagation itineraries between components of the network. Both Pu and Pm promoters are activated by a number of factors (three and five respectively) but while Pu needs all elements present at the same time to become activated, Pm needs only two factors out of the set of five (one sigma factor and one active form of XylS). This is denoted by the OR gates in the construction of the Pm node. In contrast, Pu is represented only by AND gates (Fig. 5). Second, binary circuits simplify the build-up of network complexity by just adding more logic gates representing new molecular interactions (global regulators, alternative sigma factors, transporters). Third, interactions not expected to influence the dynamics of the system under specific conditions can be maintained in the graph but given a constant default value. For example, cells in stationary phase contain saturating intracellular concentrations of IHF and σ38 (RpoS) and therefore they can be given a constant value ‘1’, while other variables can be changed one at a time. Finally, after assembly of the logic circuit model, the resulting network can be unequivocally translated into a set of equations using one of the aforementioned methods in order to perform the corresponding simulation. Such simulations can easily reveal incoherencies in the structure of the circuit and thus raise new hypotheses about the system under study. These are questions that, in the case of the TOL logicome, await future work.
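As an illustration of how such a circuit might be turned into an executable set of Boolean equations, the sketch below wires the gates described in this article into a single update function. Several choices are assumptions made only for this example and are not part of the article: the synchronous update scheme (every gate reads the state of the previous step), the all-off initial state, and the names `step` and `tca_ever_on`.

```python
STATE = ("XylRi", "XylRa", "XylSi", "XylSa", "XylSh",
         "upper", "lower", "3MB", "TCA")


def step(s: dict, i: dict) -> dict:
    """One synchronous update of the TOL logicome (assumed scheme)."""
    return {
        # Pr: NOR of both XylR forms, ANDed with sigma70
        "XylRi": (not (s["XylRi"] or s["XylRa"])) and i["s70"],
        # Effector binding: XylRi AND m-xylene
        "XylRa": s["XylRi"] and i["m_xyl"],
        # Ps2: YES gate on sigma70
        "XylSi": i["s70"],
        # Ps1: (XylRa AND sigma54) AND HU
        "XylSh": (s["XylRa"] and i["s54"]) and i["HU"],
        # XylSi AND 3MB
        "XylSa": s["XylSi"] and s["3MB"],
        # Pu: (XylRa AND sigma54) AND IHF
        "upper": (s["XylRa"] and i["s54"]) and i["IHF"],
        # Pm: any sigma factor AND any active XylS form
        "lower": ((i["s70"] or i["s38"]) or i["s32"])
                 and (s["XylSa"] or s["XylSh"]),
        # Metabolic gates
        "3MB": s["upper"] and i["m_xyl"],
        "TCA": s["lower"] and s["3MB"],
    }


def tca_ever_on(inputs: dict, n_steps: int = 12) -> bool:
    """True if the TCA output switches on within n_steps updates."""
    state = {name: False for name in STATE}
    for _ in range(n_steps):
        state = step(state, inputs)
        if state["TCA"]:
            return True
    return False


# Endogenous inputs held at a constant value '1'; only m-xylene is varied.
constant = {"IHF": True, "HU": True, "s70": True,
            "s54": True, "s38": True, "s32": True}
print(tca_ever_on({**constant, "m_xyl": True}))
print(tca_ever_on({**constant, "m_xyl": False}))
```

Under these assumptions the negative feedback at Pr makes the trajectory cycle rather than settle into a fixed state, which is why the helper asks whether TCA is ever reached instead of looking for a steady state. Whether such oscillations say anything about the biology or are merely a by-product of the synchronous scheme is exactly the kind of question a simulation of this type brings to the surface.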

Conclusion

Adoption of a limited number of judicious abstractions and reasonable formalisms allows description, simulation and prediction of the functioning of biological networks, whether metabolic, regulatory or both. A separate issue is the interplay between the circuits and the genetic and biochemical cell-wide chassis into which they are integrated in a hierarchical fashion (Nogales et al., 2008; Puchalka et al., 2008). Since the complexity of any system increases non-linearly with the number of components, it is hardly possible to generate experimentally all relevant parameters that influence the behaviour of each node of even relatively simple control circuits that operate in bacteria. Fortunately, formal languages and methods are increasingly available to handle biological systems with data barely beyond the list of known components, the sign of their connections and some scattered wet-lab information on some of them. For genome-scale metabolic networks, constraint-based formalisms provide valuable scaffolds to account for interacting species, to ascertain the emergent properties arising from these interactions and to explore the metabolic space and capabilities of the organisms involved. For those scarce systems where kinetic and molecular information is abundant, both deterministic differential equations and stochastic master equations become the tools of choice for robust modelling. But for the rest of the systems, where data are sparse, originate in various laboratories and lack a coherent format, one has to rely on minimal formalisms that capture the gross raison d'être of extant network architectures while lacking the details. Logic networks of the type discussed above share many of the qualities of electronic processors, e.g. they execute computation of signals into responses (Rodrigo and Jaramillo, 2007; Rodrigo et al., 2008; Marchisio and Stelling, 2009). 
Yet, in electronic circuits the signal carrier is only electrons while the material nature of the input in biological circuits is often different from that of the output. Engineering signal-propagation cascades in biological systems thus requires that any given upstream input to one of the nodes results in an output that can be understood by the next node of the signal progression chain. The chances of designing biological circuits with the same ease as electronic counterparts depend on the availability of (naturally occurring or artificial) connectable logic gates, a currently fertile field of research (Silva-Rocha and de Lorenzo, 2008; Hunziker et al., 2010; Zhan et al., 2010). As discussed elsewhere (de Lorenzo, 2008) such approaches offer a second opportunity for attempting the design of microorganisms for environmental release as vectors of biodegradative activities for bioremediation and sensing. The same is true for effectively and efficiently re-programming microorganisms for the production à la carte of bulk and high-added value chemicals. All of this could not be achieved easily before due to the lack of sufficient systems understanding of the corresponding microbial agents and of their underlying networks and interactions (Cases and de Lorenzo, 2005). The sheer technological advances in the past decade and conceptual developments such as those discussed herein pave the road to the eventual achievement of these long-sought goals.

Acknowledgements

The work in the authors' Laboratory is supported by generous research grants of the Spanish Ministry of Science and Innovation (CONSOLIDER), by Funds of the Autonomous Community of Madrid and by contracts of the Framework Program of the EU (MICROME, BACSINE). We thank Carolyn Lam and Miguel Godinho at the Helmholtz Centre for Infection Research, Braunschweig, and Hidde de Jong (INRIA, Grenoble) for helpful discussions and comments.
