GeBiX Colombian Center for Genomics and Bioinformatics of Extreme Environment and Research Group Microbial Ecology: Metabolism, Genomics and Evolution of Communities of Environmental Microorganisms, CorpoGen, Carrera 5 # 66A-35, Bogotá, Colombia.
Microbial degradation is the main mechanism responsible for the recovery of contaminated sites. A huge body of investigations is available, most of which concentrate on single isolates from soils capable of mineralizing pollutants. The rapid development of molecular techniques in recent years allows immense insights into the processes in situ, including identification of organisms active in target sites, community member interactions and catabolic gene structures. Only a detailed understanding of the functioning and interactions within microbial communities will allow their rational manipulation for the purpose of optimizing bioremediation efforts. We present the status of the current capabilities to assess and predict the catabolic potential of environmental sites by applying gene fingerprinting, catabolome arrays, metagenomics and complementary ‘omics’ technologies. Collectively, this will allow tracking of regulation and evolution within microbial communities, ultimately aiming to understand the mechanisms taking place in large-scale bioremediation treatments for aromatic decontamination.
Given the widespread contamination with aromatic and aliphatic pollutants, it is a long-held desire to treat organic and inorganic waste more efficiently and remediate polluted environments via controllable and amenable microbial activities. However, despite their promising performance in the laboratory, the application of pollutant-degrading bacteria in microcosms or near-field situations has mostly ended in disappointment (El Fantroussi and Agathos, 2005; Thompson et al., 2005). Therefore, more optimal and rational use of the extremely high potential of catalytic activities in the environment has been proposed for more successful pollution treatment (Watanabe et al., 2002). Presently, this potential cannot be sufficiently exploited because of the lack of knowledge on the desired catabolic activity and ecological behaviour of the microbial community (Paerl and Steppe, 2003). Pollutant degradation in contaminated environments is in many cases carried out by microbial food webs rather than single species (de Lorenzo, 2008), where key species and catabolic genes are often not identical to those that have been isolated and described in the laboratory (Jeon et al., 2003; Witzig et al., 2006). We now know that microbial diversity in these environments is orders of magnitude higher than assumed from previous cultivation efforts (Leigh et al., 2007). A particularly large number of novel techniques have been developed, which now allow the determination of microbial diversity and activity in situ at the polluted site, straightforward screenings for particular gene diversity, gene quantification, and sequencing of the whole genomes of bacterial isolates as well as of DNA and mRNA from total communities.
More knowledge on the potential of indigenous microbial metabolism of pollutants, on the processes involved and on the diversity and ecology of the organisms would permit us to more precisely understand the long-term fate of pollutants and to better direct our efforts to sustainable decontamination/detoxification of polluted environments.
Basic knowledge on key reactions for aerobic degradation of aromatics and alkanes
Already a century ago, bacterial isolates had been reported to have the ability to use aliphatic and aromatic hydrocarbons as sole carbon and energy sources (Söhngen, 1913). Since then, numerous aerobic (and also anaerobic) bacterial isolates have been studied in order to understand the mechanisms, which allow them to degrade specific members of the highly diverse range of aromatic compounds. Degradation by such isolates has been investigated thoroughly and is typically initiated by members of one of three superfamilies (see Fig. 1): the Rieske non-haem iron oxygenases usually catalysing the incorporation of two oxygen atoms (although some members of this superfamily also catalyse monooxygenations) (Gibson and Parales, 2000), the flavoprotein monooxygenases (van Berkel et al., 2006) and the soluble diiron multicomponent monooxygenases (Leahy et al., 2003). Further metabolism is achieved through di- or trihydroxylated aromatic intermediates. Alternatively, activation is mediated by CoA ligases where the formed CoA derivatives are subjected to selective hydroxylations (Fig. 1).
The further aerobic degradation of di- or trihydroxylated intermediates can be catalysed by either intradiol or extradiol dioxygenases (Fig. 1). While all intradiol dioxygenases described thus far belong to the same superfamily, members of at least three different families are reported to be involved in the extradiol ring cleavage of hydroxylated aromatics. Type I extradiol dioxygenases (e.g. catechol 2,3-dioxygenases) belong to the vicinal oxygen chelate superfamily of enzymes (Gerlt and Babbitt, 2001), the type II or LigB superfamily of extradiol dioxygenases comprises, among others, protocatechuate 4,5-dioxygenases (Sugimoto et al., 1999) and the type III enzymes, such as gentisate dioxygenases, belong to the cupin superfamily (Dunwell et al., 2000). However, members of novel superfamilies that carry out crucial steps in aromatic metabolic pathways are still being discovered. As an example, the ring cleavage of benzoquinol formed during 4-hydroxyacetophenone degradation by Pseudomonas fluorescens ACB is catalysed by an enzyme of a novel class of Fe2+-dependent dioxygenases (Moonen et al., 2008; Perez-Pantoja et al., 2009), and the degradation of 4-chlorosalicylate involves a novel type of dienelactone hydrolase, which is the first member of a family of putative metal-dependent hydrolases for which an actual physiological function has been described (Camara et al., 2008). Many of the previously mentioned enzymes active in aromatic degradation pathways are, from their protein phylogeny, not strictly linked to the taxonomical affiliation of the bacteria hosting them (Perez-Pantoja et al., 2009), indicating that the genes encoding those catabolic enzymes are involved in very dynamic events of genetic material exchange.
Therefore, the microbial community structure of a particular sample, expressed as the taxonomical composition, indicates the fitness of some bacterial phylogenetic groups in the sample analysed, but the catabolic gene potential has to be analysed independently in the same sample, as any presumption based only on taxonomy of the genomes present will result in circumstantial and unresolved associations.
Aerobic aromatic degradation
Having gained knowledge on metabolic properties of isolates, it is a logical step forward to define those genes that may serve as markers to assess biodegradation potential at a given site. Typically, with respect to aromatic metabolism, these studies use primers designed based on conserved gene regions and focus on Rieske non-haem iron oxygenases (Taylor et al., 2002; Witzig et al., 2006) or soluble diiron monooxygenases (Hendrickx et al., 2006) as targets for activities initiating degradation or on extradiol dioxygenases cleaving the aromatic ring (Chandler and Brockman, 1996; Junca and Pieper, 2004). These studies range from those searching for a narrow window of genes similar or identical to those observed in type strains using non-degenerate primers (Salminen et al., 2008) to those targeting subfamilies of homologous genes using degenerate primers (Witzig et al., 2006). However, due to the immense diversity (Perez-Pantoja et al., 2009), there will never be a pair of primers that will reliably cover the huge diversity of a catabolic gene family. Thus, the decision on which gene family to target is the foremost prerequisite for obtaining meaningful results.
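The trade-off between narrow and degenerate primers can be illustrated with a minimal sketch. It expands the standard IUPAC ambiguity codes to count how many distinct oligonucleotides a degenerate primer represents and to check whether a given target site is covered. The primer `ACNGGYTA` and the target sequences are hypothetical and chosen only for illustration; real primer design also has to consider mismatches, melting temperature and the position of degenerate bases, none of which is modelled here.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n

def expand(primer):
    """List every non-degenerate sequence encoded by the primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]

def covers(primer, target):
    """True if the target site is a perfect match to the degenerate primer."""
    primer, target = primer.upper(), target.upper()
    if len(primer) != len(target):
        return False
    return all(t in IUPAC[p] for p, t in zip(primer, target))

# Hypothetical primer: N (any base) and Y (C or T) give 4 * 2 = 8 variants.
primer = "ACNGGYTA"
print(degeneracy(primer))
print(covers(primer, "ACTGGCTA"))  # matches: N=T, Y=C
print(covers(primer, "ACTGGGTA"))  # no match: G is not allowed at the Y position
```

Higher degeneracy widens coverage of a gene family, but the number of primer variants grows multiplicatively with each ambiguous position, which is one reason why a single primer pair cannot span a highly diverse family.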
As an example, studies on extradiol dioxygenases typically concentrate on enzymes of the vicinal chelate superfamily (Fig. 2), which can be roughly divided into those that act on monocyclic aromatics (subfamily I.2) and those that act on bicyclic aromatics (subfamily I.3) (Eltis and Bolin, 1996). Early studies using molecular surveys of contaminated sites concentrated largely on the xylE gene as a molecular marker (Sotsky et al., 1994). However, the mere detection of such an extradiol dioxygenase of the subfamily I.2.A is not indicative of the potential to degrade pollutants such as naphthalene. But can it reveal something regarding toluene degradation? The archetype strain for toluene degradation is Pseudomonas putida mt2 harbouring the TOL plasmid (Greated et al., 2002), which disassembles toluene via monooxygenation of the side-chain through a monooxygenase that shares similarity with alkane monooxygenases (Suzuki et al., 1991). The central metabolites benzoate (formed from toluene) or 3-methylbenzoate (formed from m-xylene) are further disassembled through the respective catechols, where ring cleavage typically involves the subfamily I.2.A xylE-encoded catechol 2,3-dioxygenase. However, toluene can also be disassembled via successive monooxygenations catalysed by soluble diiron monooxygenases with methylphenols and methylcatechols as intermediates (Leahy et al., 2003) or through the action of a Rieske non-haem iron oxygenase of the toluene/isopropylbenzene/biphenyl subfamily followed by dehydrogenation with toluene dihydrodiol and 3-methylcatechol as intermediates (Gibson and Parales, 2000). Genes encoding these previously mentioned enzymes of archetype strains are typically clustered with genes encoding a broad substrate specificity extradiol dioxygenase of subfamily I.3.B (Beil et al., 1999) or an extradiol dioxygenase of subfamily I.3.A (Eltis et al., 1992) (Fig. 2).
Thus, it may seem that analysing the abundance and diversity of respective genes is appropriate for characterizing the potential of a given soil to degrade toluene and related compounds such as benzene via a dioxygenolytic route. However, the respective gene clusters typically comprise only one of the two known ‘branches of the meta-cleavage pathway’ for further disassembling of the ring-cleavage product. The so-called hydrolytic branch encoded by the respective clusters is necessary for the degradation of substituted catechols such as 3-methylcatechol or 2,3-dihydroxybiphenyl, where the ring-cleavage product is a ketone, which, in the case of 3-methylcatechol degradation, is hydrolysed to 2-hydroxypenta-2,4-dienoate and acetate. Benzene degradation, in contrast, necessitates the oxalocrotonate branch (Harayama et al., 1987), whereby the intermediate 2-hydroxymuconic semialdehyde (generated from catechol) is subject to oxidation by 2-hydroxymuconic semialdehyde dehydrogenase. In accordance, it has previously been reported that benzene-degrading isolates from a contaminated site recruit a pathway comprising a subfamily I.2.A extradiol dioxygenase, which is typically clustered with such a branch, and that subfamily I.2.A extradiol dioxygenases are predominant at the respective site (Witzig et al., 2006). Surveys that characterize the catabolic potential for biodegradation thus have to take into consideration the broad diversity of catabolic routes evolved by microorganisms.
However, this not only holds for the diversity of pathways that can be recruited, but also for the diversity of enzymes of a given gene family or even between gene families. Even though most biphenyl-degrading Actinobacteria and Proteobacteria employ an enzyme of the subfamily I.3.A or I.3.B, the ring cleavage of 2,3-dihydroxybiphenyl may be catalysed by quite distinct enzymes belonging to different branches of the vicinal chelate superfamily (Taguchi et al., 2004), which may even be crucial for degradation (Hatta et al., 2003). Also, the only distantly related so-called one-domain extradiol dioxygenases such as BphC2 and BphC3 from Rhodococcus globerulus P6 have reported activity against 2,3-dihydroxybiphenyl (Asturias and Timmis, 1993) [subfamily I.1 as defined by Eltis and Bolin (Eltis and Bolin, 1996)] and may support the metabolism of chlorinated biphenyl congeners (McKay et al., 2003; Fortin et al., 2005) (Fig. 2). Even beyond the well-documented vicinal chelate superfamily, 2,3-dihydroxybiphenyl dioxygenases have been documented. As an example, BphC6 of Rhodococcus jostii RHA1 (ABO34703) and BphC3 of Rhodococcus rhodochrous K37 (Taguchi et al., 2004) belong to the so-called LigB family (Sugimoto et al., 1999), members of which are well recognized as being responsible for the degradation of protocatechuate via the protocatechuate 4,5-dioxygenase pathway or for cleaving 2,3-dihydroxyphenylpropionate (Spence et al., 1996) or 2-aminophenol (Takenaka et al., 2000). Additional LigB type enzymes have been described to be involved in the degradation of bi- and polycyclic aromatics (Laurie and Lloyd-Jones, 1999; Gibbs et al., 2003); however, respective genes are not typically targeted in environmental surveys. In contrast, catechol 1,2-dioxygenases have been proposed as markers for aromatic degradative potential (Cavalca et al., 2004).
Although this seems logical to some extent, it must be considered that genome sequencing projects are revealing that respective genes belong to the core genome of Burkholderia as well as a large subset of Pseudomonas species (Perez-Pantoja et al., 2009) and may indicate the fitness of the respective hosts rather than selection of respective catabolic genes.
Aerobic alkane degradation
The degradation of alkanes has been for a long time associated with the presence of an AlkB integral-membrane non-haem diiron monooxygenase as is the case for P. putida GPo1 (van Beilen et al., 1994). Since then, alkane monooxygenases have been observed in various Proteobacteria and in Actinomycetales (van Beilen and Funhoff, 2007) and the growing collection of alkane hydroxylase gene sequences has allowed the analysis of their diversity and abundance in different environmental systems (Hamamura et al., 2008; Wasmund et al., 2009). The quantity of alkB genes has been found to be correlated with n-alkane concentrations in petroleum contaminated soils (Powell et al., 2006). However, recent reports show that the terminal oxidation of alkanes can also be catalysed by completely distinct enzyme systems. In 2001, the first bacterial cytochrome P450-dependent alkane monooxygenase was described from Acinetobacter sp. EB104 and termed Cyp153A1 (Maier et al., 2001). In the meantime, genes encoding cytochrome P450 CYP153 family proteins have been detected in a broad set of bacterial genera such as Mycobacterium (Funhoff et al., 2006) and Alcanivorax previously described to harbour AlkB encoding genes (van Beilen et al., 2004) as well as in genera not previously reported to be oil degraders such as Idiomarina or Erythrobacter (Wang et al., 2010) and are specifically common in alkane-degrading eubacteria lacking AlkB encoding genes (van Beilen et al., 2006). While their environmental importance has yet to be assessed in detail, some CYP153-encoding gene fragments have already been isolated from different environments and chimeric genes encoding functional proteins could successfully be created (Kubota et al., 2005).
Until recently, very limited information was available on the degradation of long-chain alkanes. In Acinetobacter sp. DSM 17874, able to grow on alkanes with chain lengths of up to 40 C atoms, a flavin-binding monooxygenase encoded by almA was identified as being involved in the metabolism of long-chain alkanes (Throne-Holst et al., 2007). Even though homologues were identified in various Acinetobacter strains, including Acinetobacter sp. M-1, where such activity was observed for the first time (Maeng et al., 1996), nothing is known about the environmental distribution of this gene type in contaminated sites. The same holds for LadA proteins. LadA is a flavoprotein monooxygenase that initiates the degradation of C15–C36 alkanes in Geobacillus thermodenitrificans NG80-2 (Feng et al., 2007) and has recently been shown to be a member of the Ssu subfamily of the bacterial luciferase family (Li et al., 2008). Clearly, attempts to characterize the catabolic diversity and functions involved in alkane degradation in contaminated environments have to take into consideration the high diversity of enzymes capable of initiating such metabolism.
The high diversity of enzymes and catabolic routes crucial for bacterial metabolism of pollutants is not the only challenge we face when performing molecular diagnostics of polluted environments. It is well documented that single amino acid differences may have drastic influences on enzyme properties. As an example, the Rieske non-haem iron oxygenases are a large superfamily and have been further classified into subfamilies where, typically, members of a subfamily share similarities in substrate specificity (Gibson and Parales, 2000). However, single amino acid differences may influence the regioselectivity and enantioselectivity of hydroxylation, as exemplified by naphthalene dioxygenase mediated attack on biphenyl or phenanthrene (Parales et al., 2000). Depending on the mode of hydroxylation, the substrate may be channelled into a productive route resulting in mineralization or the substrate may be co-metabolized, resulting in the formation of dead-end products or intermediates that may further be catabolized by other community members present at the contaminated site. Such misrouting is most evident when comparing metabolic routes for, e.g. biphenyl and aromatic biarylethers such as dibenzofuran. While biphenyl is typically mineralized after 1,2-dioxygenation (so-called lateral dioxygenation) (Pieper and Seeger, 2008), dibenzofuran, which may be regarded as a doubly ortho-substituted biphenyl, requires attack at the ‘quasi ortho’ carbon (the angular position) and its neighbour (Fig. 3) to cleave the ether-bond (Armengaud et al., 1998), and lateral dioxygenation results in the formation of dead-end products. Single crucial amino acid differences were also reported to significantly change the substrate range: substitution of a methionine by alanine in toluene dioxygenase enabled the enzyme to transform tetrachlorobenzene, probably by facilitating access of the voluminous substrate tetrachlorobenzene to the active-site iron (Beil et al., 1998).
Thus, to obtain an overview of the catabolic potential of contaminated sites, it is important not only to analyse the relative quantities of catabolic gene groups but also their diversity. As an example, a survey of a benzene contaminated site targeting the toluene/isopropylbenzene/biphenyl subfamily of Rieske non-haem iron oxygenases revealed the predominance of gene fragments, which are similar to those encoding isopropylbenzene dioxygenases. However, modelling of the active site and analysis of isolates harbouring respective genes revealed one of the predominant genes to harbour voluminous methionine residues at the active site, which has been proposed to prevent access of toluene (and isopropylbenzene) to the active site and thus the failure of respective isolates to grow on toluene (and also on isopropylbenzene) (Witzig et al., 2006).
Tools to analyse catabolic gene diversity
The detection of functional genes is usually performed through the analysis of clone libraries of gene fragments amplified using primers targeting a given gene family or through different DNA fingerprinting methods such as terminal restriction fragment length polymorphism (T-RFLP) (Sipila et al., 2008), denaturing or temperature gradient gel electrophoresis (DGGE/TGGE) (Gomes et al., 2007) or single strand conformation polymorphism (SSCP) (Junca and Pieper, 2004), with the last two mentioned methods giving direct access to sequence information. To achieve the simultaneous detection of multiple genes, microarrays consisting of probes of PCR fragments derived from reference genes or oligonucleotides and designed to anneal to sequences representing different catabolic gene families have been developed in the last decade. The advantage of such array systems is the number of different sequences that can be detected in a single assay, in contrast to PCR primer-based detections, where usually only a subset of a catabolic gene family can be targeted with a single primer set. However, arrays require time for careful design, are relatively costly and require detailed processing of information. The obtained results also require validation to confirm the correctness of signals.
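The principle behind a T-RFLP fingerprint can be sketched in silico: each amplicon carries a label at its 5' end, is cut by a restriction enzyme, and only the labelled terminal fragment is detected, so the community profile is a distribution of terminal fragment lengths. The sketch below is a simplification that assumes complete digestion and exact site matching on a single strand. It uses the HaeIII recognition site GGCC, which is cut between GG and CC; the amplicon sequences are hypothetical and far shorter than real PCR products.

```python
from collections import Counter

def terminal_fragment_length(seq, site="GGCC", cut_offset=2):
    """Length of the 5'-labelled terminal fragment after digestion.

    The enzyme cuts `cut_offset` bases into the first occurrence of `site`;
    an amplicon without the site stays uncut and is reported at full length.
    """
    pos = seq.upper().find(site)
    if pos == -1:
        return len(seq)
    return pos + cut_offset

def trf_profile(amplicons, site="GGCC", cut_offset=2):
    """Count how many amplicons yield each terminal fragment length."""
    return Counter(
        terminal_fragment_length(s, site, cut_offset) for s in amplicons
    )

# Hypothetical amplicons standing in for a mixed PCR product.
amplicons = ["ATGGCCTTAA", "ATGGCCAAAA", "AAAAGGCCTT", "ATATATATAT"]
profile = trf_profile(amplicons)
for length in sorted(profile):
    print(length, profile[length])
```

Distinct gene variants that share the position of the first restriction site collapse into the same peak, which is why such fingerprints give an overview of diversity but no direct sequence information.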
An oligoarray to detect hundreds of functions related to bacterial degradation of pollutants, including catabolic, regulatory, resistance and stress genes, has been reported (Rhee et al., 2004) and evolved into the so-called GeoChip (He et al., 2007). There are some additional interesting approaches in the field of microarrays to detect catabolic functions related to aerobic aromatic biodegradation, such as the oligoarrays that specifically target Rieske non-haem iron oxygenases or monooxygenases (Iwai et al., 2008). However, at present, optimizing functional gene arrays is still necessary, as appropriate standards for data comparison and normalization are lacking and comparisons between microarray data across different sites, experiments and time periods are difficult (Liang et al., 2010).
New high-throughput sequencing technologies such as the 454 GS FLX (Roche), or the Genome Analyser (Illumina) will also change approaches for assessing catabolic gene diversity as in theory, a high number of PCR amplification products can be directly subject to sequencing. Even though such approaches are so far typically used to analyse community structure by sequencing of 16S rDNA amplicons (Liu et al., 2007; Roesch et al., 2007; Lazarevic et al., 2009), amplicon pyrosequencing has already been employed to target the diversity of biphenyl dioxygenases of the Rieske non-haem iron oxygenase superfamily (Iwai et al., 2010).
Function-based screening for novel activities
As stated above, new metabolic and enzymatic mechanisms involved in pollutant degradation are still being discovered. Even primer-based approaches, designed based on known metabolic diversity and on described mechanisms, are uncovering a broader diversity of enzymes than previously thought. The real microbial catabolic diversity of the environment is still waiting to be deciphered.
Recent progress has revealed that the capture of genetic resources of complex microbial communities in metagenome libraries allows the discovery of a richness of new genetic diversity that had not previously been imagined (Ferrer et al., 2005; Beloqui et al., 2006). However, only a few reports clearly attempted to identify catabolic genes directly from environmental DNA by a metagenomic approach. Using the yellow coloration of catechol ring-cleavage products as functional screen, Brennerova and colleagues (2009) targeted a BTEX-contaminated environment and could identify one catechol extradiol dioxygenase activity to be encoded per 3.6 Mb of DNA screened from a fosmid library constructed in Escherichia coli, indicating a massively high abundance of these genes at the site. Interestingly, only one-fourth of the observed extradiol dioxygenases belonged to subfamily I.3.A or I.3.B (see Fig. 2) that would be expected as predominant taking into consideration the knowledge gained from isolates. Genes of subfamily I.2.A were absent, but a high abundance of genes with similarity to DbtC of Burkholderia sp. DBT1 (Di Gregorio et al., 2004) was observed. Based on specificity constants of enzymes expressed from the fosmids, a task-sharing between different extradiol dioxygenases in the community of the contaminated site can be supposed, attaining a complementary and community-balanced catalytic power against diverse catecholic derivatives, as necessary for effective degradation of mixtures of aromatics.
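A hit frequency such as one activity per 3.6 Mb follows from simple library arithmetic: the total amount of cloned DNA screened divided by the number of positive clones. The sketch below shows that calculation. The library size, mean insert length and number of positives are hypothetical values chosen only so that the result matches the order of magnitude quoted above; they are not the figures of the cited study.

```python
def mb_per_hit(clones_screened, mean_insert_kb, positive_clones):
    """Megabases of cloned DNA screened per positive clone."""
    if positive_clones <= 0:
        raise ValueError("at least one positive clone is required")
    total_mb = clones_screened * mean_insert_kb / 1000.0
    return total_mb / positive_clones

# Hypothetical library: 10,000 fosmid clones, 36 kb mean insert, 100 positives.
print(mb_per_hit(10_000, 36, 100))
```

Expressing screening results per megabase rather than per clone makes libraries with different insert sizes comparable.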
Also Suenaga and colleagues (2007) used a function-driven metagenomic approach to screen environmental DNA prepared from an active sludge used to treat coke plant wastewater. Even though extradiol dioxygenases typically observed in Proteobacteria such as enzymes of subfamily I.2.A were observed, the library was dominated by clones harbouring extradiol dioxygenases with homology to the manganese dependent 2,3-dihydroxybiphenyl dioxygenase of Bacillus sp. JF8 (Hatta et al., 2003) (Fig. 2), which, however, preferred catechol over 2,3-dihydroxybiphenyl as a substrate (Suenaga et al., 2009a). In addition, the library contained clones with extradiol dioxygenases having homology to BphC 2,3-dihydroxybiphenyl dioxygenase of Terrabacter sp. DPO360 (Schmid et al., 1997) (Fig. 2) indicating Firmicutes and Actinobacteria to be important for biodegradation by the sludge. Sequencing revealed that only a subset of clones contained complete degradation pathways whereas the majority of clones contained a subset of pathway genes in novel gene rearrangements (Suenaga et al., 2009b). However, the fact that aromatic compounds in the environment may be degraded through the concerted action of various fragmented pathways has also been supported by the study on isolates. As an example P. putida GJ31 degrades chlorobenzene by activities recruited from four different pathway modules (Kunze et al., 2009). Importantly, even though genes encoding a multicomponent phenol hydroxylase (Shingler et al., 1989), typically used as a target to characterize the potential and diversity for phenol degradation (Watanabe et al., 1998), have been observed on one fosmid, a gene encoding a single component phenol hydroxylase with similarity to that identified from Geobacillus stearothermophilus BR219 (Kim and Oriel, 1995) was observed in high abundance, indicating bacilli to be important for phenol degradation in the sludge.
It is known that several types of oxygenases, when expressed in E. coli, are able to produce the blue pigment indigo via the oxidation of indole, which is formed from tryptophan by E. coli tryptophanase. Indigo formation has been used to functionally screen a metagenomic library, resulting in the discovery of a styrene oxidase only distantly related to those that have been previously characterized (van Hellemond et al., 2007). However, there are many biotransformation processes of interest that do not produce metabolites that can be easily detected by simple activity tests (such as reaction colour). Moreover, metagenomic libraries usually contain too many clones for each to be subjected to individual chemical analysis. Thus, alternative high-throughput methods to screen such libraries are needed. One approach to overcome these limitations uses a transcriptional regulator that is blind to the reaction substrate but responds to the reaction product, and as a result activates a promoter fused to a reporter gene (Galvao and de Lorenzo, 2006). Respective regulators may be searched for in natural regulatory circuits, but can also be engineered in order to recognize the product of the desired activity. Bacteria containing such a regulator/promoter/reporter system may then be used as recipients of a metagenomic library, and only the clones hosting a metagenomic insert encoding an enzyme capable of catalysing the desired reaction should activate the reporter gene. Respective ‘genetic traps’ have recently been established for translating the transformation of gamma-hexachlorocyclohexane (HCH) into detectable signals by using a regulator responsive to 1,2,4-trichlorobenzene, a major product of HCH dehydrochlorination (Mohn et al., 2006). Another approach is based on the knowledge that catabolic gene expression is typically induced by relevant substrates and, in many cases, controlled by regulatory elements situated in proximity to catabolic genes.
Random cloning of environmental DNA in front of a promoterless green fluorescent protein (GFP) reporter followed by fluorescence-activated cell sorting enrichment of the expression pool in the presence of the target substrates benzoate and naphthalene was then used to select for clones that bear catalytic activities related to the substrate (Uchiyama et al., 2005). In fact, benzoate catabolic genes could be observed by this approach. However, it was also discussed that this approach is not without problems as transcriptional regulators might be activated by effectors that are not substrates of the pathways they regulate and may, thus, endow the system with considerable noise of false positives (Galvao and de Lorenzo, 2006).
Mining bacterial genomes
The molecular techniques described above enable us to extract and express novel information directly from contaminated sites, irrespective of whether the hosts are cultivable or not. However, not only metagenomic but also genomic analyses of single strains constitute an immense source for discovering and exploiting novel biocatalysts. In general, genomic information of sequenced microorganisms can be used on at least two levels: on the one hand to elucidate genes where the function of encoded enzymes is unknown and on the other to better understand the metabolic network of strains endowed with a broad catabolic diversity. At present, 1247 bacterial genome sequences are listed at http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi and 907 finished and 838 draft sequences at http://img.jgi.doe.gov/cgi-bin/pub/main.cgi, comprising the complete genomes of biodegrading bacteria such as Burkholderia xenovorans LB400 (Chain et al., 2006), R. jostii RHA1 (McLeod et al., 2006), Cupriavidus necator JMP134 (Perez-Pantoja et al., 2008; Lykidis et al., 2010), P. putida KT2440 (Nelson et al., 2002) or Mycobacterium vanbaalenii PYR-1 (Kim et al., 2008), and the genomic backgrounds for their abilities to utilize certain pollutants have been revealed. A metabolic reconstruction has been performed using C. necator JMP134 to develop a detailed overview of its metabolism from an analysis of the genome sequence (Perez-Pantoja et al., 2008) and to link the catabolic abilities predicted in silico with the range of compounds that support growth of this bacterium. Of the 140 aromatic compounds tested, 60 serve as a sole carbon and energy source for this strain, strongly correlating with those catabolic abilities predicted from genomic data. However, the more interesting cases are those where in silico predictions and experimental results do not fit.
At the gene level, information is available on the degradation of 4-hydroxyphenylacetate through monooxygenation by a two-component 4-hydroxyphenylacetate hydroxylase and homoprotocatechuate as central intermediate (Prieto and Garcia, 1994). However, JMP134 does not harbour genes encoding respective activities (Perez-Pantoja et al., 2008). In contrast, 4-hydroxyphenylacetate is likely to be metabolized by the homogentisate pathway in C. necator, thus involving hydroxylation of the aromatic ring at C-1 with a concomitant migration of the carboxymethyl side chain to C-2 (the ‘NIH’ shift reaction), catalysed by a NADH-dependent 4-hydroxyphenylacetate-1-hydroxylase, for which, however, no sequence data are available (Hareland et al., 1975) (see Fig. 1).
In the course of a genomic in silico search for novel aromatic ring-cleavage dioxygenases in P. putida KT2440, a gene could be identified, the product of which showed significant similarity to protocatechuate 4,5-dioxygenases, suggesting that it could be involved in the meta cleavage of a catecholic compound, a type of reaction that had not been reported yet in P. putida KT2440 (Nogales et al., 2005). Substrate screening of the overexpressed extradiol dioxygenase identified it as a gallate dioxygenase (Fig. 1) being the prototype of a new subgroup of type II extradiol dioxygenases that shares a common ancestor with protocatechuate 4,5-dioxygenases and whose two-domain architecture might have evolved from the fusion of the large and small subunits of the latter. Gallate dioxygenases were recently identified in 22 out of 822 genomes analysed for the distribution of aromatic catabolic properties, being as abundantly distributed as protocatechuate 4,5-dioxygenases (Perez-Pantoja et al., 2009). However, the respective genes are typically annotated as protocatechuate 4,5-dioxygenases, even though at least the P. putida KT2440 gene product does not exhibit such activity.
Genome in silico analysis also led to the identification of a gene cluster involved in nicotinic acid degradation in P. putida KT2440 (Jimenez et al., 2008), the first complete set of genes identified that encodes degradation of this compound. Novel knowledge on the degradation of gentisate, a key intermediate in the degradation of many aromatic compounds such as salicylate or 3-hydroxybenzoate, could also be generated through genome mining. In the gentisate pathway, gentisate 1,2-dioxygenase, a member of the cupin superfamily, cleaves the aromatic ring between the carboxyl substituent and the proximal hydroxyl group to yield maleylpyruvate (Crawford et al., 1975). Isomerization of maleylpyruvate to fumarylpyruvate is catalysed by either a glutathione (GSH)-dependent maleylpyruvate isomerase almost exclusively found in Gram-negative bacteria (Crawford et al., 1975), or a GSH-independent maleylpyruvate isomerase that has been characterized in various Gram-positive bacteria (Crawford and Frick, 1977). Mining the genome of Corynebacterium glutamicum resulted in the first identification of genes involved in the GSH-independent pathway, which were observed to be encoded in the same catabolic gene cluster as gentisate dioxygenase (Shen et al., 2005). Genome mining to discover and exploit novel enzymes has also targeted, among others, Baeyer–Villiger monooxygenases (BVMOs), leading to the discovery of the first thermostable enzyme of this group (Fraaije et al., 2005). Interestingly, the sequenced genome of R. jostii RHA1 encodes 23 putative BVMOs, 13 of which could be heterologously expressed and showed a remarkable diversity of both regio- and enantioselectivity (Szolkowy et al., 2009).
Insights into the metabolism at the organism-wide level
Genomic studies can allow a reconstruction of metabolic pathways relevant for biodegradation of xenobiotics, providing a holistic (or systems) view of the metabolic network of a particular organism. Importantly, among the current bottlenecks in genome analysis, the lack of knowledge of and insufficient effort in enzymology, together with the amplification of annotation mistakes in databases, are the greatest hindrances to functional reconstruction.
Overall, it is evident that a large proportion of the ORFs of newly sequenced genomes have little sequence homology with known enzymes, so their potential activities remain hidden. There is, however, an increasing number of methods for predicting protein function from sequence or structural data (for a recent review, see Lee et al., 2007). Even though annotation strategies have become more sophisticated in recent years (Rentzsch and Orengo, 2009), it needs to be noted that the majority of protein sequences in public databases have not been experimentally characterized, and the most common approach in use today continues to be the assignment of molecular function from the inference of homology followed by annotation transfer (Schnoes et al., 2009). Recently, the misannotation levels for molecular function in public protein sequence databases were investigated for a model set of 37 enzyme families for which extensive experimental information was available (Schnoes et al., 2009). The authors observed surprisingly high levels of misannotation, exceeding 80% for some of the subfamilies studied, mainly associated with ‘overprediction’ of molecular function, and an increase in misannotations from 1993 to 2005. Thus, they stated that misannotation in enzyme superfamilies containing multiple families that catalyse different reactions is a larger problem than has been recognized. The same problem also holds when considering aromatic degradation reactions. As an example, the gallate dioxygenases mentioned above, observed in 22 out of > 800 genome sequencing projects, are typically annotated as protocatechuate 4,5-dioxygenases.
A phylogenomic approach was recently used to analyse the presence of aromatic degradative pathways in sequenced genomes (Perez-Pantoja et al., 2009). This analysis provides clues on the distribution of catabolic properties among bacterial phyla and on the ecological functions of specific bacterial groups, defines underscored research objectives and gives a better overview of the genetic basis of bacterial catabolism of aromatics. The approach was based on the selection of sequences of key catabolic functions, derived from both biochemically and genetically well-studied systems, to fish in the sequenced genome databases, followed by refinement of the positive scores; it identified a huge set of misannotations in public databases.
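The refinement of positive scores can be pictured as reassigning each database hit to the experimentally validated reference family it resembles most, and flagging cases where this disagrees with the existing database label. The following Python sketch rests on stated assumptions: the locus tags, similarity scores and the `min_score` threshold are invented placeholders, and the actual procedure used by Perez-Pantoja et al. (2009) may well differ in its details.

```python
def refine_annotation(scores, min_score=0.5):
    """Return the reference family with the highest similarity score,
    or None if no family reaches min_score (a hypothetical threshold)."""
    family, best = max(scores.items(), key=lambda item: item[1])
    return family if best >= min_score else None


def find_misannotations(hits):
    """Yield (locus, database_label, refined_label) where the labels disagree."""
    for locus, (db_label, scores) in hits.items():
        refined = refine_annotation(scores)
        if refined is not None and refined != db_label:
            yield locus, db_label, refined


# Hypothetical hits: locus tag -> (database annotation, similarity scores
# against curated reference families). All values are placeholders.
hits = {
    "locus_001": ("protocatechuate 4,5-dioxygenase",
                  {"protocatechuate 4,5-dioxygenase": 0.41,
                   "gallate dioxygenase": 0.83}),
    "locus_002": ("protocatechuate 4,5-dioxygenase",
                  {"protocatechuate 4,5-dioxygenase": 0.90,
                   "gallate dioxygenase": 0.38}),
    "locus_003": ("gallate dioxygenase",
                  {"protocatechuate 4,5-dioxygenase": 0.20,
                   "gallate dioxygenase": 0.30}),
}

misannotated = list(find_misannotations(hits))
for locus, old, new in misannotated:
    print(f"{locus}: '{old}' -> '{new}'")
```

In this toy data set only `locus_001` is flagged; `locus_003` scores below the threshold against every reference family and is therefore left unassigned rather than relabelled.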
Whole-genome sequences have enabled us not only to identify genes and to predict (and possibly confirm) their biological roles through homology comparisons, but also to identify novel biocatalysts. In concert with proteomic and transcriptomic information, new insights into the metabolism at the organism-wide level could be obtained. As an example, metabolic, genomic and proteomic approaches were used to construct a complete and integrated pathway for pyrene degradation in M. vanbaalenii PYR-1 (Kim et al., 2007). Not only could the metabolic pathway be defined and the proteins involved identified, but differences in the metabolism of different polycyclic aromatic hydrocarbons (PAHs) were also revealed, leading to suggestions for the involvement of additional candidate genes in the complex network of PAH metabolism (Kim et al., 2008). Moreover, a large number of determinants associated with protection against PAH substrates and metabolites were observed, comprising, for example, a catechol-O-methyltransferase (Kim et al., 2008). Putative detoxification mechanisms were also revealed during analysis of the transcriptome of B. xenovorans LB400 (Parnell et al., 2006). Transcriptomic and proteomic studies also provided insights into the role of benzoate-catabolic pathway redundancy: the so-called box pathway, responsible for degradation of benzoate via benzoyl-CoA, is preferentially expressed under reduced oxygen concentrations, which relates this redundancy to possible adaptations to different environmental conditions (Denef et al., 2005; 2006). Such studies also shed light on the capabilities of bacteria to deal with oxidative stress generated during the metabolism of aromatics (Agullo et al., 2007). Progress has also been made in unravelling and understanding full bacterial genome regulatory networks and pollutant physiology under conditions of environmental stress, in order to suggest experimental ways of limiting stress effects while maintaining bacterial catabolic efficiency.
As an example, fluctuation in water availability is a fundamental stress challenging soil-residing microorganisms, and desiccation tolerance is a key adaptation of many such organisms. Factors contributing to the desiccation resistance in the versatile biodegrader R. jostii RHA1 were recently identified, comprising the biosynthetic pathway of a compatible solute (LeBlanc et al., 2008). Synthesis of compatible solutes, protection from oxidative damage, transcriptional regulation and cell envelope modification seem to be common mechanisms to deal with desiccation stress (Katoh et al., 2004; Cytryn et al., 2007).
To fully understand how bacteria respond to their environment, it is clearly essential to assess genome-wide transcriptional activity. New high-throughput sequencing technologies such as the 454 GS FLX (Roche) or the Genome Analyzer (Illumina) make it possible to query the transcriptome of an organism in an efficient and unbiased manner (Sorek and Cossart, 2010). This method, termed RNA-Seq (RNA sequencing or, more precisely, sequencing of cDNA fragments), was initially applied to the analysis of eukaryotic transcriptomes (Wang et al., 2009). mRNA enrichment is more challenging in prokaryotes, as prokaryotic mRNAs lack the 3′-end poly(A) tail of eukaryotic mRNAs and as the majority of cellular RNA is composed of ribosomal RNA and tRNA, such that transcriptome sequencing of non-enriched total RNA would yield mostly non-mRNA sequences (Sorek and Cossart, 2010). Recently, with the application of methods such as the artificial polyadenylation of mRNA (Frias-Lopez et al., 2008) and the depletion of processed RNA (rRNA and tRNA), RNA-Seq has been extended to the study of microbes (Yoder-Himes et al., 2009; Filiatrault et al., 2010). Importantly, all of these studies show that the bacterial transcriptome is significantly more complex than previously thought and reveal the presence of a huge set of non-coding RNAs (ncRNAs), novel untranslated regulatory elements and alternative operon structures (Sorek and Cossart, 2010).
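Why depletion of rRNA and tRNA matters for sequencing depth can be made concrete with a small calculation. The Python sketch below is a back-of-the-envelope illustration, not a protocol: the starting mRNA share of 5% and the depletion efficiency of 90% are assumed example values that do not come from the studies cited above, and the function name is our own.

```python
import math


def mrna_fraction_after_depletion(mrna_fraction, depletion_efficiency):
    """Share of mRNA in a sample after removing part of the non-mRNA pool.

    mrna_fraction:        initial mRNA share of total RNA (0 to 1)
    depletion_efficiency: share of rRNA/tRNA that is removed (0 to 1)
    """
    if not 0.0 <= mrna_fraction <= 1.0:
        raise ValueError("mrna_fraction must lie between 0 and 1")
    if not 0.0 <= depletion_efficiency <= 1.0:
        raise ValueError("depletion_efficiency must lie between 0 and 1")
    # Non-mRNA that survives the depletion step.
    remaining_other = (1.0 - mrna_fraction) * (1.0 - depletion_efficiency)
    total = mrna_fraction + remaining_other
    if total == 0.0:
        raise ValueError("no RNA left after depletion")
    return mrna_fraction / total


# Assumed example values (illustrative only).
before = 0.05
after = mrna_fraction_after_depletion(before, depletion_efficiency=0.9)
print(f"mRNA share before: {before:.1%}, after depletion: {after:.1%}")
```

Under these assumed values, the share of reads expected to derive from mRNA rises roughly sevenfold, which illustrates why sequencing non-enriched total RNA is inefficient.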
On-site catabolic gene expression
Although the detection of functional genes provides information on the presence of organisms harbouring the respective genes at a site, and possibly on a selective advantage for the host to harbour such catabolic genes, functional gene abundance does not directly reflect metabolic activity. To document expression of specific genes, analysis of mRNA is applied. As with pure culture studies, initial studies typically concentrated on documenting transcription of specific target genes such as naphthalene dioxygenase-encoding genes (Wilson et al., 1999; Yagi and Madsen, 2009). Until recently, the mRNA approach was hampered by the low yields of mRNA retrieved from environmental samples and by its rapid decay. However, novel methodological developments such as those described above for pure culture studies now make metatranscriptomic studies feasible (Frias-Lopez et al., 2008). This applies specifically to T7 RNA polymerase-based RNA amplification (originally introduced by Van Gelder et al., 1990), in which A-tailed RNA is reverse-transcribed using an oligo(dT) primer containing a T7 promoter sequence, allowing a 1000-fold unbiased amplification. Optimized mRNA extraction, purification and amplification protocols were recently used to analyse a small library of cDNA clones of a crude oil-degrading marine microbial community, making evident not only the expected expression of genes related to the biodegradation of fatty acids but also of those involved in the biosynthesis of glycolipids probably involved in emulsification of crude oil (Kato and Watanabe, 2009). They were also used in concert with pyrosequencing to analyse complex microbial communities such as those of the ocean water column, revealing, among others, an impressive array of novel ncRNAs, some of which were suggested to be regulators of carbon metabolism and energy production (Shi et al., 2009).
Clearly, metatranscriptomic analyses will improve our knowledge of the expressed subset of metagenomic DNA and of the functioning of and interactions among members of microbial communities. New sequencing technologies, such as the upcoming true single-molecule sequencing of nucleic acids, especially non-fluorescent or Raman-based methods (Treffer and Deckert, 2010), will, without doubt, not only allow the analysis of larger fractions of the metagenome and metatranscriptome but also, for example in parallel with microarray analyses or gene family-targeted mRNA pyrosequencing, provide a better understanding of the functional interactions in biodegradation and bioremediation.
Any rational effort to interfere with microbial processes in order to optimize metabolic performance on site has to deal with the enormous complexity of the system. Fortunately, new technological developments and conceptual frameworks provide new approaches to explore complex biological settings, allowing us to move towards a picture of the complete catalytic potential and the metabolic net of the bacterial communities that thrive in polluted sites.
The speed and depth with which ecosystem functioning can be described are heavily influenced by new technical developments. Specifically in DNA sequencing technologies, the impact of 454, Illumina and the forthcoming single-molecule sequencing platforms will change once more the scale and depth of explorations of microbial communities. Some approaches that were previously technically impossible are now plausible, such as obtaining the complete genome sequence of a single bacterial cell by using cell separation methods (Vives-Rego et al., 2000) and isothermal amplification of the genomic DNA contained in one single cell (Woyke et al., 2009). Single bacterial cell analysis is also being developed for other cellular components such as metabolites and proteins (Burg et al., 2007; Borland et al., 2008), and we can foresee its application to analyse the catabolic potential or activity against aromatics of specific cell groups from microbial communities in bioremediation treatments. Technical advances in metabolomics (Hirai et al., 2004; Giavalisco et al., 2008; Iijima et al., 2008) allow the correlation of metabolite data with expression profiles and, of course, genomic content (Hirai et al., 2005). The potential of these metabolomic analyses can be foreseen when they are adapted to assess microbial community biodegradation performance, integrating more and more biological processes: for example, analyses of metabolite fluxes and of pathway bottlenecks, and determination of how communities cope with stress and how they adapt to changing environments.
To achieve a closer description of the catabolic network and its components, and to gain the potential for modelling the environmental selectors in order to predict ecosystem behaviour, it is necessary to integrate experimental information in a high-throughput manner (Trigo et al., 2009). In fact, the huge set of information collected from the analysis of different descriptors of microbial community functioning will require novel ways to organize data and extract meaningful conclusions. There is an urgent need to build expanded custom metabolic networks covering all described pathways for target pollutants. This will require a carefully curated framework to define catabolic genes in a much more precise way than the current automatic genomic annotation. Community descriptions will also require the use of tools from the nascent systems biology field (Fisher and Piterman, 2010; Gehlenborg et al., 2010; Liu et al., 2010), which rely on computational biology and visualization tools to define the phylogenetic composition and its shifts, the functions selected or expressed, their association with certain metabolic steps, the metabolite fluxes and the comparison of the observed patterns between samples.
Hydrocarbon contamination in environmental setups may be regarded as a large evolutionary metabolic model suited to study the effects of strong selectors on complex microbial populations and the catabolic landscape (de Lorenzo, 2008). We do not yet have enough experiments comparing, under controlled conditions, the catabolic/taxonomic or network responses of samples from diverse biogeographic origins challenged by the same pollution or selector. By analysing the ecology of biodegradation, we may add experimental information on how the apparently extremely high diversity of bacterial community composition on Earth (Sogin et al., 2006) developed from the proposed homogeneous and ubiquitous presence of all bacterial types (De Wit and Bouvier, 2006). Such analysis will help us to understand basic aspects of functional selection and microbial diversity, and how predictable such behaviour is, depending on the origin of the site to be bioremediated and on the abiotic factors quantified. Such systems understanding will open new ways to improve sustainable use of our environment.
We would like to thank former and current members of the Microbial Interactions and Processes Research Group, HZI – Helmholtz Centre for Infection Research (previously known as AG Biodegradation at the German Research Centre for Biotechnology, Braunschweig) for all their help and support during the last years, in our quest to improve the understanding of the ecology of microbial aromatic biodegradation. Research within the authors' laboratories was funded by the projects BIOTOOL, BACSIN and MAGICPAH from the European Commission.