Correspondence: Robert M. Kelly, Department of Chemical and Biomolecular Engineering, North Carolina State University, EB-1, 911 Partners Way, Raleigh, NC 27695-7905, USA. Tel.: 919 515 6396; fax: +919 515 3465; e-mail: firstname.lastname@example.org
High-throughput sequencing of microbial genomes has allowed the application of functional genomics methods to species lacking well-developed genetic systems. For the model hyperthermophile Thermotoga maritima, microarrays have been used in comparative genomic hybridization studies to investigate diversity among Thermotoga species. Transcriptional data have assisted in prediction of pathways for carbohydrate utilization, iron–sulfur cluster synthesis and repair, exopolysaccharide formation, and quorum sensing. Structural genomics efforts aimed at the T. maritima proteome have yielded hundreds of high-resolution datasets and predicted functions for uncharacterized proteins. The information gained from genomics studies will be particularly useful for developing new biotechnology applications for T. maritima enzymes.
One recent estimate places the number of partially sequenced microbial genomes at over 760, with more than 260 currently completed (source: http://www.GenomesOnline.org). As genome sequencing becomes more accessible, and metagenomics approaches reveal the genetic content of uncultured microbial communities, a variety of tools and approaches will be needed to explore the biochemical, metabolic, physiological, ecological and biotechnological content of microbial genomes. To this end, this review discusses insights into the diversity within the genus Thermotoga obtained through comparative genomics, developments to date in structural genomics of T. maritima, and physiological information gained through functional genomics studies (Table 1).
Table 1. Expression-based functional genomics studies completed or in progress for Thermotoga maritima
Batch growth, maltose and cellobiose, with and without sulfur; continuous culture in a chemostat, cellobiose, maltose, and cellobiose-maltose mixture
Batch growth, at baseline and five subsequent time points (0, 5, 30, 60, 90)
Batch growth, chloramphenicol-resistant mutant during growth phases; continuous culture of mutant compared to wild type
Genomics of Thermotoga species
Insights from the genome sequence of the Thermotoga model strain: T. maritima MSB8
The genome of the hyperthermophilic bacterium T. maritima strain MSB8 was sequenced in 1999 (Nelson et al., 1999). At the time, the 1 860 725 bp circular chromosome (G+C content: 46%) was predicted to contain 1877 coding regions, 1014 (54%) of which were given functional assignments and 863 (46%) of which were of unknown function (Nelson et al., 1999). All but 5% of the 1.86 Mb genome was predicted to be covered by ORFs.
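The annotation figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the derived quantities (approximate coding bases and average spacing between coding regions) are rough illustrations computed here, not values reported by Nelson et al. (1999).

```python
# Figures quoted in the text (Nelson et al., 1999).
GENOME_BP = 1_860_725
CODING_REGIONS = 1877
ASSIGNED = 1014            # coding regions given functional assignments
UNKNOWN = 863              # coding regions of unknown function
NONCODING_FRACTION = 0.05  # "all but 5%" of the genome is covered by ORFs


def annotation_summary():
    """Recompute the quoted percentages and two illustrative derived values."""
    # The two annotation classes should account for every coding region.
    assert ASSIGNED + UNKNOWN == CODING_REGIONS
    return {
        "assigned_pct": round(100 * ASSIGNED / CODING_REGIONS),
        "unknown_pct": round(100 * UNKNOWN / CODING_REGIONS),
        # Derived here for illustration only; not reported in the source.
        "approx_coding_bp": round(GENOME_BP * (1 - NONCODING_FRACTION)),
        "avg_bp_per_coding_region": round(GENOME_BP / CODING_REGIONS),
    }


summary = annotation_summary()
print(summary)
```

The average spacing of roughly one coding region per kilobase is consistent with the compact, gene-dense organization implied by the 95% ORF coverage.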
There were many reasons to sequence the genome of T. maritima MSB8. Among them was the ability of this organism to grow at an optimum temperature of 80°C and to metabolize many simple and complex carbohydrates, including glucose, sucrose, starch, cellulose and xylan (Huber et al., 1986). Both cellulose and xylan, through conversion to fuels (such as hydrogen), have great potential as renewable carbon and energy sources (Van Ooteghem et al., 2002). Clues to understanding the unique metabolic properties of this organism were revealed by the genome sequence; c. 7% of the T. maritima genes were predicted to be involved in carbohydrate utilization, breakdown and metabolism, consistent with observed growth of T. maritima on various sugars (Nelson et al., 1999).
The T. maritima genome sequencing project was also motivated by the evolutionary relationship of this hyperthermophile with other microbial species. Comparative genomic analysis provided extensive evidence for lateral gene transfer (LGT) events, with c. 24% of the T. maritima ORFs having their best matches to genes from archaeal species. Many of these genes had atypical nucleotide composition and were often found in clusters (termed ‘archaeal islands’): 81 archaeal-like genes were found to be clustered in 15 chromosomal regions, with sizes ranging from 4 to 20 kb. Furthermore, conservation of gene order in many of the clustered regions suggested that LGT had occurred between T. maritima and archaeal species. Possibly due to the extensive LGT with archaeal species, the MSB8 genome sequence did not unequivocally clarify evolutionary relationships of Thermotoga species within the microbial world. Examination of a subset of 33 genes conserved among all known microbial species did not support the phylogenetic position of T. maritima as proposed by 16S rRNA gene phylogeny. Additional support for massive LGT came from the periodicity of the genome, compared to other bacteria and archaea (Worning et al., 2000).
The T. maritima genome sequence did not reveal any obvious mechanisms for LGT, such as mobile elements in the form of intact transposons or phages that may have conveyed ‘archaeal-like’ sequences. Competence has not been demonstrated in T. maritima, but various type II secretion pathway proteins and type IV pilin-related proteins that function in natural competence in other bacterial species could be identified in the genome (Nelson et al., 1999). In addition, homologs of various competence genes were evident, including dprA, comM, and comE of Haemophilus influenzae, suggesting that there may be an inherent (but yet to be identified) system for the uptake of exogenous DNA by T. maritima, thereby facilitating exchange of DNA with other organisms. Because competence has not been demonstrated experimentally, effective tools, such as vectors and knock-out systems, needed to manipulate the genetics of Thermotoga species are currently lacking. A plasmid vector has been developed for Thermotoga based on a cryptic miniplasmid fused with the Escherichia coli plasmid pBluescript and the heat-stable Staphylococcus aureus chloramphenicol acetyltransferase (Harriott et al., 1994; Yu et al., 2001). Antimetabolite-resistant and auxotrophic mutants of Thermotoga neapolitana have been reported, as has a defined minimal medium that will support growth of this organism (Noll & Vargas, 1997). However, a versatile and robust genetic system remains an important need for studies of Thermotoga biology.
The original T. maritima genome annotation was subsequently re-evaluated (Kyrpides et al., 2000), indicating agreement with the original annotation in 895 cases, 29 discordant annotations, and 863 remaining hypothetical proteins. Examination of the conflicting annotation predictions using tools such as the PSI-BLAST algorithm (http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-2.html), PROSITE patterns (http://au.expasy.org/prosite/), and matches to the protein family databases Pfam (http://www.sanger.ac.uk/Software/Pfam/) and COGs (http://www.ncbi.nlm.nih.gov/COG/) revealed 163 new functional assignments and 29 amendments to functions predicted in the original genome sequence report (Nelson et al., 1999). Of note was the prediction of a rhamnose-rhamnulose utilization cluster (TM1071–TM1073) and confirmation of additional homologs for large protein families, including ABC transporters, as well as identification of proteins with cystathionine β-synthase (CBS), transmembrane and zinc-finger domains. Also identified were several instances of fusion proteins, e.g. the annotated chorismate mutase from the tyrosine and phenylalanine biosynthesis pathway was a fusion protein consisting of both chorismate mutase and phospho-2-dehydro-3-deoxyheptonate-aldolase domains. In addition to bioinformatics reanalysis, biochemical studies of a number of enzymes from T. maritima have been undertaken since the publication of the MSB8 genome sequence, clarifying or confirming predicted functions (Table 2). Functional and structural genomics studies currently underway will no doubt further improve T. maritima genome annotation and, perhaps, provide information useful for annotating other microbial genomes.
Table 2. Selected Thermotoga maritima genes and proteins characterized by functional genomics or biochemistry (1999 to January 2006)
Genetic and phylogenetic studies in the Thermotoga lineage
LGT in the Thermotoga lineage has also been investigated in a series of studies by Nesbo and colleagues (Nesbo et al., 2001, 2002, 2006; Nesbo & Doolittle, 2003). In the first of these studies, the patterns of acquisition of two predicted ‘archaeal-like’ genes, glutamate synthase large subunit (gltB, TM0397) and myo-inositol-1-P synthase (inol1, TM1419), were investigated (Nesbo et al., 2001). Amplification of inol1 and gltB was attempted in 15 Thermotogales species, but was only successful for species within the Thermotoga lineage. Phylogenetic analysis of the Thermotoga homologs with those of other species suggested the possibility of several independent transfers of the archaeal gltB genes into bacterial species. The presence in Thermotoga of three ORFs with regions of similarity to bacterial gltB, a feature characteristic of Archaea, supports this hypothesis. The gene inol1, which is found only in hyperthermophiles, is known to be used in the production of the osmolyte and supposed thermoprotectant DIP (di-myo-inositol-1,1-P). The pattern of amplification of inol1 among the species of the Thermotogales examined was consistent with the distribution of DIP among Thermotoga species, as previously determined (Martins et al., 1996). Here, intraspecific intragenic recombination was proposed to explain the mosaic nature of the inol1 homolog of Thermotoga sp. strain RQ2. Phylogenetic analyses also suggested that all inol1 homologs present in bacteria likely resulted from LGT with archaea.
In a broader study, Nesbo and coworkers used suppressive subtractive hybridization (SSH) to compare the gene content of T. maritima MSB8 to that of Thermotoga sp. strain RQ2, which was isolated from the geothermally heated sea floor in Ribeira Quente, the Azores, and whose 16S rRNA gene sequence is almost identical (∼99.7%) to that of strain MSB8 (Nesbo et al., 2002). Over 300 unique RQ2-specific sequences were obtained from clones generated by SSH, and it was estimated that 20% of the RQ2 genome was not present in the genome of strain MSB8. Genes with highest similarity to archaeal genes as well as genes most closely related to homologs in distant bacterial genomes were found, suggesting LGT with species from both domains of life. Consistent with this hypothesis, RQ2-specific genes appeared in gene clusters missing from strain MSB8. Ratios of synonymous to nonsynonymous mutations in the divergent genes did not suggest positive selection but rather LGT followed by random mutations. Homology searches among RQ2-specific clones revealed numerous genes related to carbohydrate processing and ABC transport. These included carbohydrate active enzymes (e.g. arabinosidases, xylosidases) as well as ABC transporter subunits. Numerous clones encoding sugar ABC transporter subunits identified in strain RQ2 by SSH were found to be related to, but divergent from, homologs in strain MSB8 (<85% identity). It was suggested that the expansion of sugar transporters in strain RQ2 may represent lineage-specific gene expansion, possibly used as a strategy for environmental adaptation. Variation in genes related to surface structure formation was observed between the two strains, including the acquisition of a set of rhamnose biosynthesis proteins by RQ2. Based on Southern blots, it was proposed that other Thermotoga strains shared scattered RQ2-specific sequences in patterns, again suggestive of LGT.
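The "<85% identity" criterion used above to flag related but divergent transporter homologs can be made concrete with a small calculation. The following is a minimal sketch, assuming the two sequences have already been aligned (the alignment step itself is omitted) and that identity is counted over ungapped columns only; other conventions for the denominator exist. The two toy sequences are invented for illustration and are not Thermotoga sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def is_divergent_homolog(aligned_a: str, aligned_b: str, threshold: float = 85.0) -> bool:
    """True when identity falls below the threshold (85% is the figure quoted in the text)."""
    return percent_identity(aligned_a, aligned_b) < threshold


# Hypothetical pre-aligned fragments ("-" marks an alignment gap).
toy_msb8 = "MKKLLVALSV-LAGCS"
toy_rq2 = "MKRLLIALSVALSGCT"

identity = percent_identity(toy_msb8, toy_rq2)
print(f"{identity:.1f}% identity; divergent: {is_divergent_homolog(toy_msb8, toy_rq2)}")
```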
RQ2-specific probes, derived from the SSH studies described above, were then used by Nesbo & Doolittle (2003) to screen lambda libraries created from five RQ2 regions that are absent from the MSB8 genome. Among the gene clusters found to be unique to strain RQ2 were an archaeal-type ATPase, the rhamnose biosynthesis operon mentioned above, and an arabinosidase island. Again, both the phylogenetic patterns and G+C content suggested the acquisition of these RQ2-specific genes through LGT.
In a recent study, Nesbo et al. (2006) detected recombination events between sequences from nine Thermotoga species, in some cases between Thermotoga species distantly related on the Thermotoga rRNA tree. By examining DNA sequences generated from rRNA-containing fosmid clones and available genomic sequences, a number of highly identical regions were detected indicating recent recombination events. The authors note that the finding of identical or nearly identical DNA sequences within distantly related Thermotoga species blurs the genetic definition of a species. Indeed, as the result of such transfers, different parts of one biological species' genome may be capable of recombining with sequences from distantly related lineages. For example, a likely transfer of a large genomic region from a T. maritima-related strain to the T. neapolitana lineage could permit further recombination between these lineages within the shared region, although lower average nucleotide identity of other regions of the genome might prevent recombination. The authors suggest that due to the extent of genetic exchange within the Thermotogales, traditional definitions of species may be problematic, and therefore questions about the genetic adaptation of Thermotoga species to different habitats may best be explored with global genomic approaches, such as metagenomics of a particular environment or comparative genomics of multiple strains isolated from different locales, rather than by examination of individual species.
Genetic diversity among the Thermotogales
After the completion of the strain MSB8 genome, additional sequences from Thermotoga species have been made available in GenBank, including a whole-genome shotgun draft sequence from T. neapolitana strain NS-E (GenBank accession NC_06811; Nelson et al., in preparation), and selected sequences from Thermotoga naphthophila, Thermotoga petrophila, and Thermotoga species KOL6, RQ2, RQ7, SL7, and Thermotoga maritima FjSS3-B.1. Many of the deposited sequences from other Thermotoga species are homologous to MSB8 genes, including enzymes active against various carbohydrates (Simpson et al., 1991; Ruttersmith & Daniel, 1993; Saul et al., 1995). Plasmid sequences from T. petrophila and sequences generated by SSH studies with Thermotoga strains MSB8 and RQ2 (Nesbo et al., 2002) are among those sequences not shared with the MSB8 genome.
Using comparative genomic hybridization (CGH), Mongodin et al. (2005) examined the genetic differences among nine Thermotoga species, including strain RQ2, in comparison to the sequenced T. maritima strain MSB8 (Fig. 1). Patterns of shared and species-specific sequences between the different species were not restricted by 16S rRNA gene sequence relationships. Subsequent DNA sequencing of the RQ2-specific islands was performed in order to differentiate between divergent sequences and absence of genes in one strain compared to the reference, two situations that could not be distinguished by CGH using the MSB8 array. Large acquisitions, not found in strain RQ2, were apparent in strain MSB8, including an apparent rhamnose utilization locus (TM1063–TM1071) (Conners et al., 2005), putative myo-inositol utilization genes (TM0411–TM0423) including a substrate binding protein recently shown to bind myo-inositol (Nanavati et al., 2006), and a region containing phosphate transport genes (TM1261–TM1271). Sequencing of other regions, which displayed poor hybridization with T. maritima probes, revealed recombination events within genes. In some cases, these resulted in homologous genes retaining conserved N- and C-termini separated by strain-specific intervening protein sequences. Additionally, some genes, which were not detected by CGH, were indeed present but displayed high divergence compared to probe sequences.
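The ambiguity noted above, that a weak CGH signal may reflect either a missing gene or a present but divergent one, can be illustrated with a toy classifier. This is a sketch under stated assumptions only: the log2-ratio cutoff, the signal values and the probe names are hypothetical and are not taken from Mongodin et al. (2005).

```python
import math


def classify_probe(test_signal: float, ref_signal: float, log2_cutoff: float = -1.0) -> str:
    """Classify one array probe from test-strain and reference-strain signals.

    The cutoff is a hypothetical value chosen for illustration.
    """
    if test_signal <= 0 or ref_signal <= 0:
        raise ValueError("signals must be positive")
    log2_ratio = math.log2(test_signal / ref_signal)
    if log2_ratio >= log2_cutoff:
        return "shared"
    # Hybridization alone cannot separate these two cases;
    # sequencing of the region is needed to tell them apart.
    return "absent or divergent"


# Hypothetical probes: (name, test-strain signal, reference signal).
toy_probes = [("probe_A", 950.0, 1000.0), ("probe_B", 90.0, 1000.0)]
calls = {name: classify_probe(test, ref) for name, test, ref in toy_probes}
print(calls)
```

The second category is deliberately left unresolved, which mirrors why follow-up sequencing of the poorly hybridizing regions was required.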
Many of the genes found to be variable in their presence in the different Thermotoga species from the CGH study of Mongodin et al. (2005) seem to be related to substrate utilization, and reflect the importance of LGT events in enabling rapid evolution and adaptation to specific habitats. For example, a large number of MSB8 carbohydrate utilization loci were found to be lacking in Thermotoga species PB1platt, isolated from an oil field where plant polymers are scarce. Multiple paralogous sugar transport systems and carbohydrate active enzymes have been identified in RQ2 by SSH with MSB8 (Nesbo & Doolittle, 2003). As a result of these findings, it is clear that the details of carbohydrate utilization patterns in MSB8 that are gleaned from functional studies cannot be automatically extended to all Thermotoga species and strains. However, these initial studies have been beneficial in elucidating the pathways by which T. maritima MSB8 hydrolyzes, transports, and utilizes a variety of substrates.
CRISPR elements in the Thermotogales
Analysis of the T. maritima genome revealed the presence of clustered regularly interspaced short palindromic repeats (CRISPRs) in eight distinct loci on the chromosome. CRISPR elements have a remarkable structure that consists of a 30 bp repeat element interspersed with a variable and non-repetitive 39 to 40 bp sequence called the ‘spacer’. CRISPRs are thought to increase in size by duplicating the repeat sequence and adding new spacer sequences by a mechanism that is still not known. Although the origin of the spacers remains elusive (Mojica et al., 2005), they can be used to identify relatedness between strains (DeBoy et al., 2006). CRISPR elements have been identified in a broad range of microbial species, including Salmonella enterica serovar Typhimurium, Streptococcus pyogenes, Mycobacterium tuberculosis and Campylobacter jejuni (Mojica et al., 2000; Jansen et al., 2002; Schouls et al., 2003; Pourcel et al., 2005). The unique structure of CRISPRs and their association with a group of conserved genes (called cas genes, for ‘CRISPR-associated sequences’), which are potentially involved in DNA recombination and repair (Makarova et al., 2002), provide additional clues for an active role of CRISPR elements in the mobilization of DNA, and suggest that these elements could contribute to LGT in T. maritima. In a recent study by Mongodin and colleagues, CRISPR regions were found to vary in length and in number of repeats between different Thermotoga species. CRISPR repeats have also been found flanking inversion sites detected by comparison of the whole genome sequences of T. maritima MSB8 and T. neapolitana NS-E (K.E. Nelson et al., manuscript in preparation; DeBoy et al., 2006).
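The repeat-spacer layout described above (a 30 bp repeat alternating with 39 to 40 bp spacers) can be sketched in a few lines. The sequences below are artificial placeholders, not real T. maritima CRISPR sequences, and the exact-match splitting is a simplification: real CRISPR detection tools generally have to tolerate mismatches between repeat copies.

```python
def build_toy_array(repeat: str, spacers: list) -> str:
    """Assemble repeat-spacer-repeat-...-repeat, the layout described in the text."""
    return repeat + "".join(spacer + repeat for spacer in spacers)


def extract_spacers(sequence: str, repeat: str) -> list:
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    pieces = sequence.split(repeat)
    # The first and last pieces are flanking sequence outside the array.
    return pieces[1:-1]


# Artificial sequences: the repeat uses only G/C and the spacers only A/T,
# so the repeat cannot occur by accident inside or across a spacer.
toy_repeat = "GC" * 15                            # 30 bp
toy_spacers = ["AT" * 20, "A" * 39, "TA" * 20]    # 40, 39 and 40 bp

toy_array = "TTT" + build_toy_array(toy_repeat, toy_spacers) + "AAA"
recovered = extract_spacers(toy_array, toy_repeat)
print([len(spacer) for spacer in recovered])
```

Comparing recovered spacer lists between strains is one simple way to picture how spacer content could be used to assess relatedness, as mentioned above.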
Insights into Thermotoga physiology revealed by biochemical and functional genomic efforts
Carbohydrate utilization pathways of Thermotoga species
Important functional information about sugar utilization pathways used by T. maritima has been gained from both biochemical characterization of carbohydrate active enzymes and expression-based work. By combining data from these studies with comparative genomics analyses, many details of T. maritima carbohydrate processing pathways can be predicted. However, there is still much to learn about the local and global mechanisms of regulation of these pathways. The functions and specificities of members of the LacI and XylR families of carbohydrate-responsive regulators in T. maritima have yet to be determined biochemically. No mechanism of carbon catabolite repression (CCR) has yet been defined for T. maritima, although catabolite repression of lactose utilization genes has been demonstrated in the presence of glucose in T. neapolitana (Vargas & Noll, 1996). However, glucose has not been shown to be a preferred substrate for any Thermotoga species, perhaps due to its thermolability at high temperatures. In fact, growth of T. maritima on glucose has been observed to be slower than growth on other polysaccharides (Chhabra et al., 2003). Neither the general (HPr, EI) nor sugar-specific (EII) components of a phosphotransferase system (PTS) are apparent in T. maritima or T. neapolitana species (Galperin et al., 1996; Nelson et al., 1999). Rather, uptake of most carbohydrates, and many other substances, is likely accomplished mainly via binding protein dependent ABC transporters (Nelson et al., 1999; Nanavati et al., 2002, 2005, 2006) (Table 3). Multiple homologs from known families of bacterial sugar transporters (e.g. CUT1 and CUT2) (Schneider, 2001) have been identified in Thermotoga species (Nelson et al., 1999, 2002). A number of oligopeptide/dipeptide family ABC transporters found in T. maritima are closely related to archaeal homologs shown to act as sugar transporters (Elferink et al., 2001; Koning et al., 2001), and have been shown to bind various sugar substrates (Nanavati et al., 2006), consistent with the proximity of many of these genes to glycoside hydrolases and members of carbohydrate-responsive regulator families.
Table 3. ABC transport systems of T. maritima
Family and predicted substrate
Source(s) or reference(s)
Permanent designation based on verified binding specificity of substrate binding protein in Nanavati et al. (2006).
Temporary designation pending biochemical verification; based on transcriptional data in Conners et al. (2005).
Permanent designation based on verified binding specificity of substrate binding protein in Nanavati et al. (2005).
In the absence of a PTS, T. maritima must perform specific hydrolysis, phosphorylation, and/or isomerization steps to use each imported sugar. Free phosphate (e.g. cellobiose phosphorylase) or polyphosphate (e.g. PPi-dependent PFK) could be used to generate phosphorylated sugars in some cases. In other cases ATP is required; for example, glucose is phosphorylated via ATP hydrolysis by an ATP-dependent glucokinase (Hansen & Schonheit, 2003). Since ATP is also used by different ABC transporters to import substrates of varying chain lengths, the number of sugar molecules acquired per molecule of ATP hydrolyzed may also vary by substrate. It remains to be seen how these differing ATP requirements affect carbohydrate metabolism and bioenergetics.
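The point that sugar yield per ATP depends on chain length can be illustrated with simple arithmetic. The sketch below assumes a fixed ATP cost per transport event that does not depend on the length of the imported oligosaccharide; the default of two ATP per import is an assumption made for illustration and is not a value measured for T. maritima transporters.

```python
from fractions import Fraction


def glucose_units_per_atp(chain_length: int, atp_per_import: int = 2) -> Fraction:
    """Glucose units imported per ATP hydrolyzed by the transporter.

    `atp_per_import` is a hypothetical, substrate-independent cost.
    """
    if chain_length < 1 or atp_per_import < 1:
        raise ValueError("chain_length and atp_per_import must be positive")
    return Fraction(chain_length, atp_per_import)


# Substrates named in the text, with their number of glucose units.
for name, units in [("glucose", 1), ("maltose", 2), ("maltotriose", 3)]:
    print(name, glucose_units_per_atp(units))
```

Under this assumption, importing longer oligosaccharides lowers the transport cost per glucose unit, which is one way the ATP requirements discussed above could differ between substrates.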
α-Linked oligo- and polysaccharides
The most exhaustively studied pathways for polysaccharide utilization in T. maritima involve the hydrolysis of α-linked polysaccharides and disaccharides. Genomic and biochemical data suggest that T. maritima can use the α-1,4 linked glucose disaccharide maltose as well as pullulan and starch, which contain mixed α-1,4 and α-1,6 linkages (Bibel et al., 1998; Chhabra et al., 2003). Reconstruction of the pathways and biochemical characterization of the enzymes used by T. maritima MSB8 to break down α-linked polysaccharides in some cases preceded the sequencing of its genome (Schumann et al., 1991; Liebl et al., 1997; Bibel et al., 1998). Using data from these functional studies, pathways for the uptake and utilization of various α-linked glucans can be inferred (Fig. 2).
In the cases of both pullulan and starch, extracellular enzymes perform the reactions necessary to break down the complex polysaccharides for transport into the cell. De-branching of α-1,6 linkages in pullulan and starch is likely accomplished by the type I pullulanase Pul13A (TM1845) (Kriegshauser & Liebl, 2000), which yields mainly maltotriose as a product (Bibel et al., 1998). Recent studies of this enzyme revealed a novel N-terminal domain (TmCBM41) which binds α-1,4 linked glucans and α-1,4 glucans with occasional α-1,6 linkages (Lammerts van Bueren et al., 2004). Amy13A (TM1840), a Ca2+-requiring membrane-bound α-amylase, hydrolyzes extracellular starch, amylose and amylopectin, but has lowered activity towards the more highly branched polysaccharides glycogen and pullulan (Schumann et al., 1991; Liebl et al., 1997).
Binding of hydrolyzed products of α-glucans and maltose for transport into T. maritima MSB8 cells apparently involves at least two different maltose ABC binding proteins. The maltose binding protein encoded by TM1839 (TMMBP, malE2) has been characterized both functionally and structurally, and binds maltose, maltotriose, and trehalose (Wassenberg et al., 2000; Nanavati et al., 2002, 2005). Transcripts of TM1839 were observed to be more than threefold higher during growth on maltose than glucose (Nguyen et al., 2004). TM1839 colocalizes in the genome with two ABC permease subunits (TM1836, TM1838), the family 4 α-glucosidase AglA (TM1834) (Raasch et al., 2000), and a cyclomaltodextrinase (TMG, TM1835) (Lee et al., 2002b). A gene string divergently transcribed from the maltose and trehalose transport locus encodes the lipoprotein α-amylase Amy13A, pullulanase Pul13A and hypothetical proteins which have sequence similarity to putative α-glucan processing enzymes (Nelson et al., 1999). The T. neapolitana locus is organized similarly, but several additional inserted genes are present, further underscoring the genome plasticity observed in Thermotoga species (Berezina et al., 2003).
The high identity between TMMBP and the protein encoded by TM1204 (79% id/391 aa) may suggest possible recent duplication in the Thermotoga lineage (Nanavati et al., 2005). While the second MBP homolog (TM1204, malE1) was not differentially expressed during growth on maltose as compared to glucose, large differences in transcript levels during growth on lactose as compared to glucose were observed (>5-fold) (Nguyen et al., 2004). In addition to binding maltose and maltotriose, TM1204 has been shown to bind β-1,4-mannotetraose, suggesting divergence of the two maltose transporters to fulfill different transport capabilities (Nanavati et al., 2005). Consistent with previous findings, we have observed no differential expression of the TM1202–TM1204 transporter genes during growth on starch (S.B. Conners and R.M. Kelly, unpublished data). The proximity of this transporter to lactose/galactose utilization genes (TM1190–TM1199) (Liebl et al., 1998b; Kim et al., 2004), a LacI family regulator (TM1200) and putative arabinogalactan endo-1,4-β-galactosidase (TM1201), suggests a possible role in the uptake of mixed sugar oligosaccharides.
Both sets of characterized maltose transport proteins from T. maritima lack ATP-binding subunits, as do several other sets of CUT1 transporters found in the genome. We have observed increased transcription of a MalK ATP-binding subunit (TM1276) during growth on maltose, and presume it may interact with both maltose transporters and, perhaps, other related sets of permeases and substrate binding proteins. Interaction of the same ATP-binding subunit with separate ABC permeases and substrate binding proteins for maltose and cellobiose has been observed in Streptomyces olivaceoviridis (Schlosser et al., 1997).
Following transport into the cell, maltose hydrolysis in T. maritima cells likely involves the α-glucosidase AglA (TM1834), which is active on maltose and maltotriose but not starch, amylopectin or amylose (Bibel et al., 1998). AglA requires reducing conditions, NAD+, and Mn2+ for activity (Raasch et al., 2000, 2002; Lodge et al., 2003). Two family 4 glycoside hydrolases, initially annotated as α-glucosidases, have been shown to be α-glucuronidases (Suresh et al., 2002, 2003). Longer malto-oligosaccharides may be hydrolyzed by a CDase (TM1835) active on malto-oligosaccharides of three to six glucose units and cyclodextrins, which yields mainly glucose and maltose as products (Lee et al., 2002a). The recently characterized intracellular α-amylase Amy57C appears to have an activity complementary to that of α-glucosidase, hydrolyzing amylose and soluble starch, but displaying poor activity towards glycogen and pullulan (Ballschmiter et al., 2006; Dickmanns et al., 2006). The hydrolysis of longer intracellular α-1,4 glucans likely also involves Amy13B (TM1650), an intracellular α-amylase active on starch (Schumann et al., 1991; Lim et al., 2003).
An additional pathway for the utilization of maltodextrins may exist in some Thermotoga species, although its operation in MSB8 has not been confirmed. This alternative pathway might use 4-α-glucanotransferase and maltodextrin phosphorylase activities to break down maltose or malto-oligosaccharides. Escherichia coli 4-α-glucanotransferase (MalQ) cannot use maltose as a sole substrate, but uses maltose and glucose as acceptors for endogenously produced maltotriose and higher maltodextrins (Pugsley & Dubreuil, 1988). Maltodextrin phosphorylase (MalP) then cleaves the resulting maltodextrins by phosphorolysis, yielding glucose-1-phosphate from free phosphate (Palm et al., 1987). Two intracellular T. maritima proteins with 4-α-glucanotransferase activity have been characterized. The 4-α-glucanotransferase activity identified by Liebl et al. (1992) can be attributed to TM0364 (MgtA). This enzyme transfers α-1,4 linked glucanosyl segments from starch, amylose, amylopectin, and malto-oligosaccharides M3 and longer, but cannot use maltose and maltotriose as sole substrates. TM0364 has been characterized structurally (Roujeinikova et al., 2001b, 2002). The family 13 maltosyltransferase (MTase; mmtA, TM0767) characterized by Meissner and Liebl transfers only maltose units from α-1,4 glucans to maltotriose or higher order malto-oligosaccharides (Meissner & Liebl, 1998). A double-displacement mechanism of maltose unit transfer and a structural basis for the maltose specificity of the enzyme have since been inferred from studies of the crystal structure of this protein in complex with maltose (Burke et al., 2000; Roujeinikova et al., 2001a). Although a functional maltodextrin phosphorylase (AgpA) was isolated from T. maritima MSB8 cells (Bibel et al., 1998), the sequence of the gene encoding this protein (TM1168) was found to be frame-shifted in the MSB8 genome (Nelson et al., 1999). A homologous sequence has been identified in Thermotoga sp. RQ2 (Mongodin et al., 2005) (GI:69954028).
Further work is needed to determine the biological significance of these apparent interstrain differences. Although synthesis of intracellular α-glucan storage polysaccharides via 4-α-glucanotransferase has not been demonstrated experimentally for T. maritima, other hyperthermophiles have been shown to synthesize and degrade such polymers (e.g., Thermococcus hydrothermalis) (Gruyer et al., 2002).
Questions remain about the regulatory mechanism of starch and maltose utilization genes in T. maritima, as no maltose regulator has yet been characterized in this species. Transcriptional regulators specific for maltose have been identified in many organisms, including LacI/GalR family repressor proteins in various bacteria (Reidl et al., 1989; Nieto et al., 1997; Goda et al., 1998; Andersson & Radstrom, 2002; Le Breton et al., 2005), the activator MalT in E. coli (Cole & Raibaud, 1986), a Crp homolog in Bacteroides thetaiotaomicron, and TrmB in the archaeal species Thermococcus litoralis (Lee et al., 2003) and Pyrococcus furiosus (Lee et al., 2005b). The gene encoding the LacI family protein TM1200 lies downstream of the maltose/β-mannotetraose transporter (Nanavati et al., 2005), and like the transporter, is differentially expressed on lactose as compared to glucose but not during growth on maltose (Nguyen et al., 2004). Clearly, further work beyond expression studies will be needed to determine the mechanism of regulation of maltose utilization genes.
β-Linked oligo- and polysaccharides
Processing of β-1,4-linked glucans (e.g. carboxymethylcellulose, β-1,4 linked degradation products from barley glucan, cellobiose) by T. maritima appears to involve hydrolysis of oligosaccharides by extracellular β-glucosidases, transport of small chains into the cell via Opp family transporter(s) and a phosphorylation and cleavage step mediated by a cellobiose phosphorylase. Cellulase I and cellulase II were first isolated from T. maritima by Bronnenmeier et al. (1995). This work and later work by Liebl et al. (1996) confirmed that cellulase I (Cel12A, TM1524) is an endo-β-1,4-glucanase, while cellulase II (Cel12B, TM1525), located downstream of Cel12A, has a signal peptide and is an exo-β-1,4 glucanase.
Three oligopeptide/dipeptide family ABC transporter substrate binding proteins encoded within the T. maritima genome (TM1223, TM1226, and TM0031) share similar levels of identity (∼60%) to the characterized P. furiosus substrate binding protein CbtA, which binds cellobiose through cellopentaose as well as β-1,3-linked glucan di- and tri-saccharides and sophorose (2-O-β-d-glucopyranosyl-d-glucose) (Koning et al., 2001). All three T. maritima cbtA homologs have been observed to be transcribed at high levels in the presence of carboxymethylcellulose and barley glucan (Chhabra et al., 2003), as well as cellobiose (C.I. Montero and R.M. Kelly, unpublished data), suggesting the involvement of one or more of the encoded proteins in the import of cello-oligosaccharides. In fact, biochemical characterization of these proteins recently showed that TM0031 (BglE) binds cellobiose and β-1,3 linked laminaribiose (see ‘Mannans and mannose’) with equally high affinity (Nanavati et al., 2006). On the other hand, TM1226 (ManD) binds cellobiose with lower affinity but also binds β-1,4 mannobiose through β-1,4 mannotetraose and galactosyl β-1,4 mannobiose, while TM1223 (ManE) binds β-1,4 mannobiose (Nanavati et al., 2006). Although it has not been demonstrated biochemically, the proximity of TM1223 and TM1226 within the T. maritima genome might indicate that both substrate binding proteins interact with the permease and ATP-binding subunits encoded by TM1219–TM1222 to import degradation products of hydrolyzed complex carbohydrates such as glucomannan (Conners et al., 2005) (see ‘Mannose incorporation’).
After import into T. maritima cells, cellobiose or cello-oligosaccharides must be broken down further for utilization. A homolog to the intracellular 1,4-β-d-glucan glucohydrolase A of T. neapolitana, also known as BglA (McCarthy et al., 2004), has not been identified in the genome sequence of T. maritima MSB8, but was found in MSB8 by Liebl et al. (1994). A transcript for BglA was also detected by Nguyen et al. (2004) using real time PCR. It remains unclear whether this protein is present in all isolates of T. maritima MSB8, but the characterized protein from T. neapolitana is active on cellotetraose, cellotriose, cellobiose and lactose (McCarthy et al., 2004). The cellobiose phosphorylases of T. neapolitana (CbpA) and T. maritima (CepA) have both been characterized (Yernool et al., 2000; Rajashekhara et al., 2002). In the absence of a cellobiohydrolase in these organisms, cellobiose phosphorylase performs phosphorolysis of cellobiose, releasing one glucose-1-phosphate and one glucose molecule per disaccharide. The T. maritima enzyme (corresponding to TM1848) has also been shown to perform synthetic reactions using various monosaccharides (d-glucose, d-mannose, d-xylose, d-glucosamine, 2- and 6-deoxy-d-glucose and methyl-β-d-glucoside) as glucosyl acceptors (Rajashekhara et al., 2002). It has been proposed that cellobiose phosphorylase is part of an ATP-conserving pathway of β-glucan hydrolysis in Thermotoga species which includes BglA in T. neapolitana (Yernool et al., 2000).
β-1,3 glucan and β-1,6 glucan
Laminarin, a β-1,3-linked glucose polymer, is a plant-derived polysaccharide used as a growth substrate by T. maritima and T. neapolitana (Bronnenmeier et al., 1995). Extracellular hydrolysis of laminarin by these species is likely accomplished by the laminarinase Lam16A (TM0024), versions of which have been characterized in both species (Bronnenmeier et al., 1995; Zverlov et al., 1997b, 2001; Boraston et al., 2001). Import of laminarin oligosaccharides likely proceeds via a nearby Opp family transporter, whose substrate binding protein (TM0031) is highly transcribed in the presence of laminarin and has been shown to bind laminaribiose and cellobiose (Conners et al., 2005; Nanavati et al., 2006). The high identity observed between TM0031, TM1223, TM1226, and P. furiosus CbtA suggests possible LGT followed by divergence of substrate and regulation. GC-rich inverted repeats which may be capable of forming hairpin structures are found downstream of TM1223 and TM0031, perhaps explaining the difference in transcript levels and changes observed for these substrate binding proteins as compared to other transporter subunits (Conners et al., 2005). Intracellular hydrolysis of laminarin disaccharides is likely performed by the T. maritima homolog (TM0025) of the characterized T. neapolitana laminaribiase BglB (Zverlov et al., 1997a). Incubation of laminarin with Lam16A and BglB has been shown to accomplish complete hydrolysis of laminarin to glucose. Although not yet verified experimentally, transcription of this operon may be under the control of the XylR family transcriptional regulator encoded by TM0032, which is found with the predicted laminarin transporter and laminarin hydrolases (Conners et al., 2005).
The transcriptional response of T. maritima to barley glucan, a mixed linkage β-1,3(4) glucose polysaccharide, incorporates elements of both the responses to β-1,4 glucans and β-1,3 glucans, including increased transcripts from the main operons responsive to cellobiose and laminarin. These findings are consistent with the presence in T. maritima Lam16A of two family four carbohydrate binding domains, one of which binds laminarin and mixed β-1,3/β-1,4 linkage polysaccharides, and the other of which binds pustulan, curdlan (β-1,3 glucan) and glucans derived from fungal cell walls, enhancing the hydrolysis of these substrates (Zverlov et al., 2001). Pustulan, a β-1,6 linked glucose polysaccharide, also triggers increased transcription of the T. maritima laminarin utilization genes, although binding of this substrate to the substrate binding protein BglE has not yet been tested. A possible pathway for the utilization of laminarin and pustulan by T. maritima is shown in Fig. 2.
Mannans and mannose
Thermotoga species produce a variety of extracellular enzymes to degrade mannan polysaccharides. Hydrolysis of the backbone of β-1,4 linked mannans in the extracellular environment may be accomplished by an endo-1,4-β-mannanase Man5/ManB (TM1227) (Duffaud et al., 1997; Parker et al., 2001), which is active on glucomannan and galactomannan and responds transcriptionally to the presence of both sugars (Chhabra et al., 2002, 2003; Conners et al., 2005). Endomannanase degradation products are likely imported via ABC transporters, followed by further processing within the cells. Mannan oligosaccharides may be hydrolyzed by the intracellular mannanase ManA/Man2 (TM1624) whose properties have been shown to be similar in T. maritima and T. neapolitana (Duffaud et al., 1997; Parker et al., 2001). Hydrolysis of intracellular glucomannan degradation products apparently involves two glucomannanases, Cel5A (TM1751) and Cel5B (TM1752) (Chhabra et al., 2002). The galactose side chains attached to galactomannan are likely cleaved by the intracellular α-1,6-galactosidase GalA/Gal36A (TM1192), first isolated from T. neapolitana (Duffaud et al., 1997) and later characterized in T. maritima (Liebl et al., 1998b). Incidentally, work in T. neapolitana has revealed the presence of an active transport system induced by lactose and galactose (Galperin et al., 1997), and subsequent microarray experiments have shown that an Opp family transporter system (TM1194, TM1196–TM1199) is up-regulated in the presence of lactose when compared to glucose (Nguyen et al., 2004).
A number of candidates for mannan degradation product uptake have been suggested by functional genomics or biochemical characterization of T. maritima transporters. Some components of the Opp family ABC transporter (TM1746–TM1750) upstream of the Cel5A and Cel5B intracellular glucomannanases are transcribed more highly on mannose and β-mannans than other carbohydrates (Chhabra et al., 2003; Conners et al., 2005), although the specificity of the substrate binding protein (TM1746) has not yet been determined. Increased transcription of the substrate binding proteins TM1223 and TM1226 is observed during growth on β-mannans and mannose (Conners et al., 2005), and both corresponding proteins have been shown to bind various β-1,4 manno-oligosaccharides (Nanavati et al., 2006). Biochemical characterization of the Opp family substrate binding protein encoded by TM1204 showed that despite a transcriptional response by the corresponding gene to lactose, the protein did not bind this sugar; rather, it binds maltose and β-mannotetraose, perhaps suggesting a role in the uptake of β-mannan degradation products (Nanavati et al., 2005).
Mixed carbohydrate α-linked mannans may also be imported for utilization by T. maritima cells. The CUT1 ABC transporter encoded by TM1853–TM1855 is found with an α-mannosidase (TMM, Man38A, TM1851) capable of hydrolyzing α-1,2, α-1,3, α-1,4, and α-1,6 mannobiose (Nakajima et al., 2003). Although the substrate binding protein TM1855 failed to bind α- and β-linked manno-oligosaccharides, there are indications that it might bind mannan-glucuronic acid oligosaccharides derived from xanthan gum (Nanavati et al., 2006). Similar LacI family sequence motifs are present upstream of the LacI regulator TM1856 and substrate binding protein TM1855. The function of another protein encoded within this locus (TM1852) remains unknown, but a homologous gene (TM1225) is highly expressed in the presence of β-mannans and colocalizes with Man5B. The TM1225 gene product has been crystallized and classified as a hypothetical protein with no apparent signal peptide, although it does display some similarity to glycoside hydrolases of family 32. It is also related (29% id/331 aa) to a protein of unknown function (Unk1) from a Cellvibrio mixtus galactomannan utilization locus (AAS19693). The conserved positioning of homologs to this gene in three separate mannan utilization loci suggests some undetermined function related to mannan utilization. A possible role as an α-mannosidase has been predicted for TM1231 (Man38B), and transcripts of this gene have been observed to be two- to fourfold higher on mannose and glucomannan than for other sugars (S.B. Conners and R.M. Kelly, unpublished data).
After transport and complete hydrolysis of galactomannan has been accomplished by T. maritima, galactose derived from side chains is likely phosphorylated by the putative galactokinase encoded by TM1190 (Liebl et al., 1998b). Cellobiose resulting from glucomannan breakdown is likely acted upon by cellobiose phosphorylase, yielding glucose-1-phosphate and glucose which can be phosphorylated by glucokinase. However, the pathway for incorporation of mannose into central metabolism is not clear. Without a PTS system, direct phosphorylation of mannose during transport does not occur. While the ATP-dependent glucokinases of some archaeal species (e.g. Aeropyrum pernix, Thermoproteus tenax) also accommodate mannose (Hansen et al., 2002; Dorr et al., 2003), the T. maritima ATP-dependent glucokinase is not active on mannose (Hansen & Schonheit, 2003). Other sugar kinases are found within the genome, including xylulokinase homologs and a predicted rhamnulokinase, leaving open the possibility that one of these may accomplish phosphorylation of mannose. It is also possible that an alternative pathway for the incorporation of mannose into central metabolism may be used by T. maritima.
Extremely low production of H2 relative to carbon dioxide (H2 1 : CO2 19) has been observed during batch growth on mannose, indicating that mannose metabolism differs substantially from glucose (H2 2 : CO2 1), maltose (4 : 3) and cellobiose (1 : 2) metabolism, among others (Woodward et al., 2002). In addition, several uncharacterized proteins with similarity to alcohol dehydrogenases have been observed to be up-regulated during growth on mannose, suggesting that mannose might be converted into one or more alcohols (e.g., mannitol, mannitol-1-phosphate) which could be isomerized into fructose or fructose-6-phosphate before incorporation into central metabolism. The demand of such reactions on the cellular NAD+/NADH pool might reduce the availability of NADH for use by T. maritima hydrogenases, explaining the reduced H2 production during growth on mannose. While the interconversion of mannose to mannitol with NAD+/NADH has been noted in celery plants as a result of enzymatic activity of mannitol : mannose 1-oxidoreductase (Stoop & Pharr, 1992), no characterized bacterial enzyme has been shown to catalyze this reversible reaction.
If the conversion of mannose into mannitol does take place in T. maritima cells as part of a mannose utilization pathway, candidates for possible mannitol dehydrogenases can be suggested by transcriptional data. For example, transcripts of TM0068 are detected at highest levels during growth on mannose and glucomannan (S.B. Conners and R.M. Kelly, unpublished observation). Although TM0068 is believed to play the role of d-mannonate oxidoreductase in the pathway leading from glucuronates to KDG (see next section), it belongs to a family of proteins (COG0246) which also includes NADH and NADPH-dependent d-mannitol-2-dehydrogenases (mannitol + NAD+ → fructose + NADH) from Lactobacillus and Pseudomonas species (Brunker et al., 1997; Saha, 2004; Liu et al., 2005c) and d-mannitol-1-phosphate 5-dehydrogenases (mannitol-1-phosphate + NAD+ → d-fructose-6-phosphate + NADH) (Lee & Saier, 1983; Watanabe et al., 2003).
A candidate locus for the metabolism of mannitol consists of a transaldolase-related protein (TM0295) with similarity to fructose-6-phosphate aldolases (PFAM0923) and a fructokinase (TM0296) divergently transcribed from two alcohol dehydrogenase-related proteins (TM0297 and TM0298). Both TM0295 and TM0297 were highly expressed during growth on arabinose, ribose, mannose and xylan (S.B. Conners and R.M. Kelly, unpublished data). A potential pathway for the conversion of mannose to mannitol might involve TM0297, a member of the FabG family of oxidoreductases and dehydrogenases. Surprisingly, its highest identity (50% id/255 aa) is to a domain within a hypothetical protein from Gallus gallus (COG1028). TM0298 is related to a Lactobacillus reuteri ATCC 53608 NADH-dependent mannitol-2-dehydrogenase (COG1063, mdh, 30% id/330 aa) (Sasaki et al., 2005), and might be used to convert mannitol into fructose. A fructokinase (TM0296) displaying similar levels of homology with a Clostridium homolog (40% id/318 aa) and an Arabidopsis protein (39% id/318 aa) is likely involved in the phosphorylation of fructose to fructose-6-P. Finally, TM0295 is a transaldolase-like fructose-6-phosphate aldolase (PFAM0923), which has been crystallized (1VPX). A possible pathway for the incorporation of mannose into central metabolism is proposed in Fig. 3.
Growth of T. maritima on the plant polymer polygalacturonic acid (pectin) has been demonstrated (Kluskens et al., 2003). Two main gene clusters have been implicated in the breakdown and catabolism of pectin. The first is likely involved in hydrolysis of the polysaccharide and includes a characterized family 1 extracellular pectinase (PelA, TM0433) (McDonough et al., 2002; Parisot et al., 2002; Kluskens et al., 2003) and a family 28 extracellular exo-poly-α-d-galacturonosidase (TM0437) (Parisot et al., 2003). Thermotoga maritima appears likely to utilize ABC transporters to take up pectin degradation products rather than a single protein transporter like ExuT (Rivolta et al., 1998). TM0430 (aguG) and TM0431 (aguF) encode ABC permease subunits showing 56% and 57% similarity, respectively, to the TogMN permeases utilized by Erwinia chrysanthemi 3937 for the uptake of oligogalacturonides (Hugouvieux-Cotte-Pattat & Reverchon, 2001; Hugouvieux-Cotte-Pattat et al., 2001). These genes and TM0432 (aguE) are highly transcribed in the presence of pectin (L.D. Kluskens and R.M. Kelly, unpublished observation), and have been hypothesized to function in the import of pectin degradation products (Nanavati et al., 2006).
Hydrolysis of pectin to individual sugar units is most likely followed by metabolism to glyceraldehyde-3-phosphate via enzymes of the pentose and glucuronate interconversion pathway, encoded in a second T. maritima locus. The composition of the uronic acids catabolic pathway of T. maritima, revealed by its genome sequence (Nelson et al., 1999), is most similar to that of Bacillus subtilis. The T. maritima uronate isomerase (TM0064) (Schwarzenbacher et al., 2003) is found divergently transcribed from other uronic acid utilization genes encoding a putative fructuronate reductase (uxuB, TM0068), d-mannonate hydrolase (uxuA, TM0069), 2-keto-3-deoxygluconate kinase (kdgK, TM0067), and 2-dehydro-3-deoxyphosphogluconate aldolase/4-hydroxy-2-oxoglutarate aldolase (kdgA, TM0066). Like B. subtilis, T. maritima appears to lack an uxaA ortholog and contains only one uxaB/uxuB homolog. This suggests that the same enzymes catabolize galacturonic and glucuronic acid. However, in most other microbial genomes, kdgA and kdgK are not usually found near the uxu/uxa genes, and these sets of genes are often regulated by different proteins whose identities vary by species. While B. subtilis uses two separate LacI family proteins (ExuR and KdgR) to regulate the two gene sets (Pujic et al., 1998; Mekjian et al., 1999), E. coli utilizes the FadR family repressor ExuR (Portalier et al., 1980). An IclR family protein acts as the pectin repressor KdgR in E. chrysanthemi and is released from its operator upon binding the uronic acid metabolite 2-keto-3-deoxygluconate (KDG) (Reverchon et al., 1991; Nasser et al., 1992). A gene encoding a putative transcriptional regulator of the IclR (isocitrate lyase regulator) family is found within the T. maritima uronic acids catabolic locus (TM0065), and the crystal structure of the corresponding protein has been determined (Zhang et al., 2002).
Based on this structure, it has been predicted that TM0065 likely binds DNA as a tetramer, perhaps interacting with a target sequence containing two closely spaced palindromic operator sites (Zhang et al., 2002). Although the sugar binding specificity of the T. maritima repressor has not yet been determined, two similar palindromic sites separated by a 7 bp spacer are found upstream of TM0065 (S.B. Conners and R.M. Kelly, unpublished observation).
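The kind of operator arrangement described above (two palindromic sites separated by a fixed spacer) can be searched for with a simple scan. The following is only a minimal sketch: the function names, the 6 bp site length and the test sequence are hypothetical and chosen for illustration, and the sequence is not taken from the T. maritima genome. A site is treated as palindromic when it equals its own reverse complement.

```python
# Hypothetical helper for illustration only; the sequence and site length are made up.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A DNA site is palindromic when it equals its own reverse complement."""
    return site == reverse_complement(site)


def find_paired_palindromes(seq: str, site_len: int, spacer: int):
    """Yield (start, site1, site2) for two palindromic sites of length
    site_len separated by exactly `spacer` bases."""
    seq = seq.upper()
    span = 2 * site_len + spacer
    for i in range(len(seq) - span + 1):
        first = seq[i:i + site_len]
        second = seq[i + site_len + spacer:i + span]
        if is_palindromic(first) and is_palindromic(second):
            yield i, first, second


# Made-up sequence: two 6 bp palindromes flanking a 7 bp spacer.
toy = "TTGAATTCACGTACGGGATCCAA"
hits = list(find_paired_palindromes(toy, site_len=6, spacer=7))
print(hits)
```

A real search would need to tolerate imperfect palindromes and would be run against the actual upstream region of TM0065; this exact-match version only shows the geometry of the two-site arrangement.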
Despite efforts to grow T. maritima on the β-1,4-linked polysaccharide chitin, which is an effective growth substrate for the hyperthermophilic archaea Thermococcus kodakarensis (Tanaka et al., 2004) and P. furiosus (Gao et al., 2003), growth of T. maritima on this polysaccharide was similar to that of control cultures lacking the substrate (D.A. Comfort and R.M. Kelly, unpublished data). Unlike these two archaeal species, T. maritima may not be capable of hydrolyzing chitin, consistent with an apparent lack of chitinases encoded in its genome. It remains unclear whether T. maritima is capable of using any N-acetylglucosamine-containing sugars. A putative glucosamine-6-phosphate deaminase (TM0813) and N-acetylglucosamine-6-phosphate deacetylase (TM0814) are located near subunits of a T. maritima sugar transporter (TM0810–TM0812); however, these genes did not respond significantly in the presence of chitin (Conners et al., 2005). Recent work has also shown that the substrate binding protein TM0810 does not bind N-acetylglucosamine, chitin, chitosan or any of a number of other tested monosaccharides, disaccharides or polysaccharides, leaving its true substrate unknown (Nanavati et al., 2006).
Both T. maritima xylanases are divergently transcribed from sets of Opp family ABC transporter subunits. However, this similar orientation does not result from duplication. The organization of the transporter subunits, different domain composition of the xylanases and a phylogeny of this family of transporters in T. maritima suggest that these operons arose from more distantly related sets of proteins (Conners et al., 2005). Expression studies have revealed that transcripts of components of both xylanase-containing divergons are higher in the presence of xylan and xylose than other sugars (Chhabra et al., 2003; Conners et al., 2005). Recent binding studies show that the substrate binding protein encoded by TM0071 (xloE) binds xylobiose and xylotriose (Nanavati et al., 2006); however, the affinity of TM0056 has not yet been confirmed. The confirmed binding affinity of TM0071 is consistent with the presence of a β-xylosidase within this locus (Xue & Shao, 2004), and also with recent findings that the colocated Xyn10B (TM0070) is activated in the presence of alcohols and capable of hydrolyzing xylan to xylosides while also producing alkyl β-xylosides via transglycosylation reactions with alcohols (Jiang et al., 2004).
Glucose transport in T. neapolitana apparently occurs via an ion-coupled system (Galperin et al., 1996) whose activity appears to be constitutive (Galperin et al., 1997). Activity of a periplasmic glucose-binding protein has also been detected in T. maritima (Nanavati et al., 2002), although the identity of the protein remains unknown. Transcriptional studies have been of limited use in identifying transporters that are transcriptionally regulated in the presence of glucose (Chhabra et al., 2003; Nguyen et al., 2004; Conners et al., 2005), which would be consistent with constitutive expression of these systems. A set of ABC transporter subunits encoded by TM1149–TM1153 is located downstream of a glucose-6-phosphate dehydrogenase (TM1155) and 6-phosphogluconolactonase (TM1154), but the putative substrate binding protein has not been shown to bind glucose or a number of tested carbohydrates (Nanavati et al., 2006).
Phosphorylation of imported glucose occurs via the ATP-dependent glucokinase TM1469, a member of the ROK family of sugar kinases, which is highly specific for glucose and 2-deoxyglucose (Hansen & Schonheit, 2003). The glucokinase apparently lacks the helix-turn-helix domain found in some related glucokinases [e.g. Streptomyces coelicolor A3(2) (Kwakman & Postma, 1994), Staphylococcus xylosus (Wagner et al., 1995)], which function both as glucokinases and transcriptional regulators. 13C-labeling experiments have determined that the bulk of glucose utilization in T. maritima (∼87%) proceeds via the Embden–Meyerhof–Parnas (EMP) pathway, while a smaller fraction (∼13%) of glucose is processed via the Entner–Doudoroff (ED) pathway (Selig et al., 1997). Certain EMP genes are regulated transcriptionally, including TM0208–TM0209, TM0273, TM0688–TM0689, TM0877 and TM1469; however, a mechanism for the regulation of these genes has not yet been determined. Thermotoga maritima is the only known prokaryote with both an ATP-dependent (TM0209) and a polyphosphate-dependent (TM0289) 6-phosphofructokinase (6-PFK) (Ding et al., 2001). The polyphosphate-dependent PFK uses triphosphate and polyphosphate preferentially to diphosphate, and pyrophosphate inhibits the ATP-PFK, perhaps representing an ATP-conserving mechanism.
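As a rough illustration of what the reported flux split could mean for substrate-level ATP yield, the estimate below weights the two pathways by their fractions. It assumes the textbook net yields to pyruvate of 2 ATP per glucose for the EMP pathway and 1 ATP per glucose for the ED pathway. These yields are assumptions rather than measurements for T. maritima, and the calculation ignores the polyphosphate-dependent PFK and any other ATP-conserving steps.

```latex
% Assumed net yields: Y_EMP = 2 ATP/glucose, Y_ED = 1 ATP/glucose (textbook values)
\[
Y_{\mathrm{ATP}} \approx f_{\mathrm{EMP}}\,Y_{\mathrm{EMP}} + f_{\mathrm{ED}}\,Y_{\mathrm{ED}}
= 0.87 \times 2 + 0.13 \times 1 = 1.87 \ \text{ATP per glucose}
\]
```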
A T. maritima rhamnose utilization locus has been predicted by sequence comparison (Kyrpides et al., 2000), and increased transcripts corresponding to genes within this locus are detected during growth in the presence of rhamnose (Conners et al., 2005). The locus is present in MSB8 but not in RQ2 (Mongodin et al., 2005), and contains a set of oligopeptide family transporter subunits divergently transcribed from rhamnose catabolic enzymes. A number of these genes, particularly an Opp-family substrate binding protein (TM1068), are highly transcribed in the presence of rhamnose, although binding of rhamnose or rhamno-oligosaccharides by this protein has not yet been verified. A putative regulatory protein of the DeoR family (TM1069) is also present within the locus, suggesting that it may act as a regulator of rhamnose catabolism for T. maritima. Recent work in Rhizobium leguminosarum bv. trifolii has revealed a rhamnose regulator of the DeoR family (Richardson et al., 2004), although its sequence identity to the T. maritima protein is only 24% over 254 residues. The R. leguminosarum rhaR gene is also oriented upstream of a gene for an ABC binding protein (rhaS), but rhaS encodes a sugar binding protein related to ribose and xylose binding proteins rather than oligopeptide binding proteins. However, the two gene sets may share a second mechanism for transcriptional modulation in the form of a hairpin structure found downstream of the ABC binding protein. In R. leguminosarum, a hairpin structure downstream of rhaS prevents transcription of other transporter subunits under noninducing conditions (Richardson et al., 2004). Similar hairpin structures are also found downstream of a number of Opp family ABC substrate binding proteins in other T. maritima ABC transport operons (Conners et al., 2005).
Xylose, arabinose and ribose
Growth of T. maritima on the simple sugars ribose, xylose and arabinose has been demonstrated (Conners et al., 2005). Unlike the polysaccharides described previously, hydrolysis steps prior to transport of these pentose sugars are unnecessary. Transport of ribose likely occurs via a CUT2 family ABC transporter (TM0955–TM0956, TM0958) which is homologous to RbsBAC ribose transporters from other species. Transcripts of these transporter genes are detected at high levels during growth on ribose, arabinose and xylose (Conners et al., 2005), although the substrate binding protein of the transport system (RbsB, TM0958) has only been shown to bind ribose (Nanavati et al., 2006). Also found within this locus are homologs to genes implicated in ribose utilization in other species, including a homolog (TM0959) of the cytoplasmic ribose binding protein RbsD (Kim et al., 2003) and a putative LacI family regulator (TM0949) which has been shown to cluster phylogenetically with RbsRs from other bacterial species (Fukami-Kobayashi et al., 2003). Future examination of hypothetical proteins within the T. maritima ribose utilization gene cluster may reveal insights into strategies used by other species to process this monosaccharide, as homologs to the hypothetical proteins TM0950 and TM0957 are found with a Lactobacillus johnsonii locus with a similar gene content (Pridmore et al., 2004).
The reason for the observed transcriptional response of the T. maritima ribose transporter subunits to arabinose is unclear, given that the substrate binding protein TM0958 was also tested for binding to both arabinose and xylose but apparently does not bind either (Nanavati et al., 2006). Perhaps interconversion of intracellular ribose to arabinose or xylose via enzymes of the pentose phosphate pathway could explain the observed cross-regulation. At present, the mechanism for arabinose import into T. maritima cells remains unclear. A second sugar-binding protein gene (TM0277) found in an arabinose utilization locus with a family 51 α-l-arabinofuranosidase (TM0281) (Miyazaki, 2005) and l-arabinose isomerase (TM0276) (Lee et al., 2004a, 2005a) contains a frameshift, calling into question its functionality.
Transport of xylose into T. maritima cells is likely accomplished by the CUT2 family ABC transporter encoded by TM0112 (xylF), TM0114 (xylE), and TM0115 (xylK) (Nanavati et al., 2006). Despite the demonstrated affinity of the substrate binding protein XylE for xylose (Nanavati et al., 2006), this locus is not strongly regulated at the transcriptional level by xylose or xylan (Conners et al., 2005). A putative extracellular xylanase (TM0113) and putative xylulokinase (TM0116) are also encoded within this locus.
Transcriptional studies of the heat shock response of T. maritima
Little was known about the transcriptional response of T. maritima to temperature stress prior to recent functional genomics studies (Pysz et al., 2004b). Genomic data suggested that T. maritima lacked an ortholog to rpoH/sigB encoding the σ factor σ32/σB, a major positive regulator of the heat shock response in E. coli, B. subtilis and other mesophiles (Grossman et al., 1984; Benson & Haldenwang, 1993). Genes encoding other known regulators of stress response (e.g. rpoS, ctsR) were also lacking. However, the T. maritima genome sequence contained a homolog to the E. coli extracytoplasmic stress σ factor σE and several important heat shock proteins, including HrcA, GroESL, DnaK, DnaJ and GrpE (Nelson et al., 1999). To gain insight into transcriptional mechanisms of heat shock response employed by this bacterium, a targeted microarray was used to examine transcriptional profiles of selected T. maritima genes before and at multiple time points after a temperature shift from 80° to 90°C (Pysz et al., 2004b).
In the absence of an rpoH/sigB ortholog, transcript levels of genes encoding σA and σE orthologs increased after heat shock. It may be that a heat-inducible vegetative σ factor is beneficial to T. maritima in environments characterized by frequent temperature fluctuations. Homologs to the heat shock regulator HrcA have been detected in more than 50 microbial genomes (Snel et al., 2000), including T. maritima (Nelson et al., 1999). In most species, HrcA acts as a thermosensor and repressor of heat shock chaperones, binding to the controlling inverted repeat of chaperone expression (CIRCE) element until deactivated by temperature stress (Zuber & Schumann, 1994). The T. maritima HrcA protein has been shown to repress transcription of a B. subtilis reporter construct containing the dnaK promoter-operator region (Wiegert et al., 2004), and transcriptional data coupled with sequence analysis strongly suggest that an HrcA-mediated heat shock response is present in T. maritima. Transcripts of T. maritima homologs to CIRCE regulon genes (e.g. hrcA, groESL, dnaK) were detected at much higher levels after heat shock, consistent with the presence of palindromic CIRCE sequences upstream of both hrcA and groEL. We have also observed that smaller temperature increases (e.g., 80–85°C) can increase transcript levels of these heat shock genes (Shockley et al., 2005). The recent determination of the structure of T. maritima HrcA represents the first structural information for this family of proteins (Liu et al., 2005a).
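Candidate CIRCE elements such as those noted upstream of hrcA and groEL are typically located by matching a consensus with some tolerance for mismatches. The sketch below assumes the commonly cited consensus TTAGCACTC-N9-GAGTGCTAA; the function names, the mismatch threshold and the test sequence are hypothetical, and actual T. maritima sites may deviate from this consensus.

```python
# Illustrative consensus scan; the consensus used here is an assumption
# and the test sequence is invented, not genomic.
CIRCE_LEFT = "TTAGCACTC"
CIRCE_RIGHT = "GAGTGCTAA"
SPACER = 9  # number of unconstrained bases between the two arms


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_circe(seq: str, max_mismatches: int = 2):
    """Return (start, mismatches) for every window whose two arms differ
    from the consensus arms by at most max_mismatches in total."""
    seq = seq.upper()
    arm = len(CIRCE_LEFT)
    total = 2 * arm + SPACER
    hits = []
    for i in range(len(seq) - total + 1):
        left = seq[i:i + arm]
        right = seq[i + arm + SPACER:i + total]
        mismatches = hamming(left, CIRCE_LEFT) + hamming(right, CIRCE_RIGHT)
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


# Invented upstream region containing one perfect consensus match.
toy_upstream = "ACGT" + "TTAGCACTC" + "AAATTTGGG" + "GAGTGCTAA" + "CC"
print(find_circe(toy_upstream))
```

Summing mismatches over both arms keeps the scan simple; a position weight matrix built from known sites would be a more sensitive alternative if enough characterized CIRCE elements were available.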
Recent work in B. subtilis and Caulobacter crescentus has demonstrated that HrcA is a target of the GroESL chaperone machinery (Mogk et al., 1997; Susin et al., 2004). A complex strategy for regulating transcript levels of the various components of the heat shock chaperone complex has been described in B. subtilis (Homuth et al., 1999). The T. maritima strategy for differential regulation of chaperone subunits has not yet been determined; however, some differential regulation of transcript levels may be accomplished by a genomic rearrangement which pairs DnaK (TM0373) with a small heat shock protein (TM0374) in a locus distant from the hrcA-grpE-dnaJ gene string, an uncommon arrangement also seen in Chlorobium tepidum (Eisen et al., 2002).
A major difference between T. maritima and model mesophiles is the apparent lack of regulation of most of its proteases in response to temperature stress. This may be because homologs to σ32 and the repressor CtsR, a major regulator of the mesophilic proteolytic response, are missing in T. maritima. Alternatively, T. maritima may gain a survival advantage from constitutive expression of most proteases. Proteases whose transcripts were detected at higher levels during heat stress included a homolog to the HtrA heat shock serine protease (TM0571), and two Clp ATPase subunit homologs, ClpC-1 (TM0198, 820 aa) and ClpC-2 (TM0873, 314 aa). A frame-shift is present in TM0873, leaving the functionality of the resulting protein in question.
Functional genomics insights into T. maritima behavior in cell communities
Growth in cell communities is the predominant growth mode in natural microbial environments (Costerton et al., 1995). Previous studies demonstrated significant wall growth when T. maritima cells were grown alone in continuous culture (Rinker et al., 1999), suggesting the formation of exopolysaccharide (EPS)-associated sessile cell communities. Mixed culture experiments also revealed evidence of mutually beneficial associations between hydrogen-producing T. maritima cells and cells of the methanogen Methanococcus jannaschii (Muralidharan et al., 1997). Such interactions likely indicate adaptive strategies used by Thermotoga species in their varied natural environments. In addition, associations with EPS-producing cell communities may help explain the variety of carbohydrate utilization pathways found in many Thermotoga species. New insights into the ecological interactions of T. maritima have been gained by mimicking conditions that might arise in natural habitats and analyzing the resulting gene expression patterns.
Transcriptional differences between sessile and planktonic T. maritima cells from a continuous culture reactor were measured using a full genome microarray (Pysz et al., 2004a). Sessile cells were found associated within rope-like biofilm structures which formed on reactor walls, polycarbonate filters and nylon mesh. Despite the heterogeneity of the planktonic and sessile cell populations, distinct patterns of differential expression were observed for a number of predicted operons. While maltose, an α-1,4 linked disaccharide, was used as the growth substrate, transcripts of cellobiose phosphorylase and endoglucanases involved in the processing of β-1,4 linked polysaccharides (TM1524–TM1525, Cel12A and Cel12B) were higher in biofilm cells, possibly as a result of the assembly or recycling of EPS.
Consistent with observations from transcriptional studies of mesophilic biofilms (Whiteley et al., 2001; Svensater et al., 2001; Schembri et al., 2003), transcripts from heat shock (e.g., dnaK, smHSP) genes were higher in biofilm cells, while transcripts of a cold shock gene (Phadtare et al., 2003) were lower in biofilm cells. Transcripts relating to iron–sulfur cluster synthesis and repair were detected at higher levels in sessile T. maritima cells, including iron–sulfur cluster chaperones (sufABD-sufS/iscU) (Rangachari et al., 2002; Bertini et al., 2003; Mansy et al., 2004), putative iron (II) transporters (feoAB, ftr1), putative sulfur compound transporter subunits (tauABC), and iron–sulfur cluster proteins (Pysz et al., 2004a). More recent work has clarified the relationship between heat shock chaperones and iron–sulfur cluster chaperones. Wu et al. (2005) used the T. maritima versions of IscU, DnaK and DnaJ to demonstrate that DnaK stabilizes Fe–S clusters bound by IscU and inhibits the transfer of these clusters to other proteins. Changes in iron–sulfur cluster proteins may relate to altered iron levels within EPS-associated sessile cells. Recent work in other species has also demonstrated the critical role of iron in regulation of biofilm formation and maintenance. Low iron levels trigger biofilm formation in S. aureus (Johnson et al., 2005a). Formation of P. aeruginosa biofilms is inhibited in the presence of iron salts, and degradation of pre-existing biofilms grown in flow-chambers occurs upon switching to iron-rich media (Musk et al., 2005). Although the mechanism by which iron levels regulate biofilm formation in T. maritima remains unclear, homologs to Fur, a ferric uptake regulator, are found in the genome, and transcripts of one homolog (TM0121) were detected at higher levels in biofilm cells than in planktonic cells.
Small RNAs (sRNAs) have been shown to mediate transcriptional control of iron utilization genes in several other species, such as RyhB in E. coli and Vibrio cholerae (Mey et al., 2005; Masse et al., 2005), and the unrelated Prr1 and Prr2 in P. aeruginosa (Wilderman et al., 2004). In the case of V. cholerae, RyhB has been implicated in iron-dependent biofilm formation processes (Mey et al., 2005), although Prr1 and Prr2 do not appear to play this role in P. aeruginosa (Banin et al., 2005). Although no obvious sequence homologs to either RyhB or Prr1/Prr2 are present in T. maritima, it remains to be seen whether functional homologs of these sRNAs are present in the genome.
While single-organism studies can reveal much about the growth physiology of organisms, mixed cultures containing more than one organism can offer additional insights into mechanisms relevant to growth in cell communities. Extremely high cell densities have been achieved during growth of T. maritima cells in the presence of the methanogenic archaeon M. jannaschii, which uses the growth inhibitory H2 produced by T. maritima cells to produce methane (Johnson et al., 2005b). Full genome transcriptional comparisons between T. maritima cells in pure culture and high density coculture with M. jannaschii indicated changes in sugar utilization and transport genes, including glycosyl transferases and genes encoding known glucomannan (TM1752) and α-glucan (TM1834) hydrolases, which correlated with the appearance of extensive EPS in the culture (Fig. 4). Calcofluor staining of pure and mixed cultures indicated the presence of β-1,4-linked glycans, while analysis of biofilm material from the high-density coculture revealed a polysaccharide composed mainly of glucose (∼92%), ribose and mannose. Transcriptional patterns indicated that regulation of EPS formation and maintenance might involve cyclic di-GMP, a second messenger which regulates cellulose synthesis in Gluconacetobacter xylinum (Chang et al., 2001) and S. enterica serovar Typhimurium (Garcia et al., 2004), and biofilm formation in V. cholerae (Tischler & Camilli, 2004), Staphylococcus aureus (Karaolis et al., 2005) and Yersinia pestis (Kirillina et al., 2004), among others. Several GGDEF-domain containing proteins displayed different expression patterns between the pure culture and coculture, including a putative diguanylate cyclase (TM1163) and cyclic-di-GMP phosphodiesterase (TM1184). Subsequent characterization of TM1163 confirmed its diguanylate cyclase activity (Ryjenkov et al., 2005).
Transcriptional response data led to the identification of a small, unknown ORF (TM0504) colocalized with an ABC peptide transporter lacking a substrate-binding protein, suggesting a possible role in peptide export. Differential expression of an orphan ABC transporter subunit (TM0043) with a putative proteolytic motif (COG2274) suggested a possible mechanism for processing of the encoded TM0504 peptide prior to secretion. The addition of a synthesized form of the TM0504 peptide to pure low-density T. maritima cultures triggered formation of EPS within 30 min, whereas EPS did not form in undosed control cultures. Although peptide-mediated quorum sensing has been shown to operate in numerous Gram-positive bacteria (for a recent review, see Lyon & Novick, 2004), this report is the first indication of its importance in hyperthermophilic habitats. Clearly, there is much still to learn about peptide-based signaling in T. maritima, and future work will undoubtedly offer new insight into the ecological behavior of this and related species.
Formation of EPS-bound cell aggregates in high-density cocultures of T. maritima and M. jannaschii has been shown to be growth-phase dependent: aggregation is most apparent in mid-log phase, and aggregates break up during stationary phase (Johnson et al., 2006). Despite the presence of EPS (as indicated by Calcofluor staining) in both pure T. maritima cultures and cocultures, cell aggregates formed in the coculture but not the pure culture, likely to facilitate heterotroph-methanogen H2 transfer (Muralidharan et al., 1997). Transcriptional analysis revealed that many more genes changed during the transition into stationary phase in the coculture than the pure culture (Johnson et al., 2006). A number of genes encoding carbohydrate utilization enzymes and transporters were up-regulated during stationary phase in the coculture, including members of known β-glucan degradation pathways (e.g., TM1848, cepA; TM1524, cel12A; TM1525, cel12B), although the carbon source in the media was maltose. Presumably, the up-regulation of these genes (many of which were also up-regulated in the pure culture biofilm-planktonic cell comparisons described above) in early stationary phase is connected to aggregate disintegration, as growth-phase dependent biofilm degradation has been observed in other species (Stoodley et al., 2002; Kaplan et al., 2003). The transcriptional sensitivity of β-glucan degradation genes to the presence of cellobiose and cellobiose-like EPS degradation products is likely an adaptation to growth in EPS-associated cell communities.
Biotechnology applications of Thermotoga enzymes
The extreme thermostability of Thermotoga enzymes is attractive for an array of biotechnology applications. A number of potential uses have been described for carbohydrate active enzymes of Thermotoga species, including several characterized glycoside hydrolases. Galactomannan-utilizing enzymes of T. maritima (e.g. mannanase, β-mannosidase, α-galactosidase) show promise for breaking down guar gum used as a fracturing fluid in oil/gas wells (Comfort et al., 2004). The T. maritima maltosyltransferase (TM0767, MmtA) (Meissner & Liebl, 1998) has been used to synthesize malto-oligosaccharide-daidzein glycosides, which greatly increase the water solubility of soy-derived daidzein isoflavone (Li et al., 2004). A highly thermostable β-fructosidase/invertase from T. maritima holds promise for industrial applications which include the production of invert sugar and the hydrolysis of inulin, a fructose polymer (Liebl et al., 1998a). The single-domain xylanase XynB isolated from Thermotoga sp. strain FjSS3-B.1 has high thermostability and has been shown to be active on kraft pulp (Saul et al., 1995). Use of this enzyme or similar enzymes during paper manufacturing processes could potentially reduce the use of chemical methods for bleaching. A recombinant version of T. maritima XynB (TM0070) has shown high alkaline stability during pulp pretreatment processes and reduces the need for chlorine during biobleaching of wheat straw pulp (Jiang et al., 2006).
Characterized sugar isomerases from Thermotoga species have been of particular interest for biotechnology. High-temperature glucose-to-fructose isomerization has been demonstrated using the T. neapolitana and T. maritima (TM1667) xylose (glucose) isomerases (Vieille et al., 1995), suggesting that these enzymes may improve fructose concentrations by allowing the production of high-fructose corn syrup at high temperatures (for a recent review, see Bandlish et al., 2002). High activity mutants of T. neapolitana xylose isomerase (TNXI) have been produced by site-directed mutagenesis to optimize glucose isomerase activity at lower temperature and pH (Sriprapundh et al., 2000, 2003). l-Arabinose isomerases (AIs) have been characterized for both T. neapolitana and T. maritima (Kim et al., 2002; Lee et al., 2004a). These proteins are of interest for their ability to produce d-tagatose, a sugar which has potential as a low-calorie sweetener and is currently produced via a patented chemical synthesis process. Both Thermotoga AI enzymes interconvert d-galactose and d-tagatose, achieving >50% conversion (Kim et al., 2002; Lee et al., 2004a). However, their higher catalytic activity on arabinose than on galactose and the lack of transcriptional response of the T. neapolitana gene to galactose suggest that galactose isomerase activity is secondary to arabinose isomerase activity (Kim et al., 2002). Although Thermotoga AIs and mesophilic arabinose isomerases have lower affinity for the nonphysiologic substrate d-galactose, the Thermotoga enzymes do display greater catalytic efficiency and increased conversion rates due to higher temperature optima (Lee et al., 2004a).
Structural studies of T. maritima enzymes
A substantial fraction of the >150 solved crystal structures available from PDB for T. maritima proteins are the result of a structural genomics effort aimed at the T. maritima proteome (Lesley et al., 2002; DiDonato et al., 2004). This high-throughput pipeline developed by the Joint Center for Structural Genomics at the Scripps Institute has resulted in the production of 469 crystal hits for T. maritima proteins (DiDonato et al., 2004). Priority for structural determination within this group was given to 269 targets with low similarity to known structures. To date, a number of novel folds have been identified using this approach (Levin et al., 2005; Mathews et al., 2005; Rife et al., 2005). A website is available to check the progress of structure determination efforts for all T. maritima proteins: http://www.jcsg.org/scripts/prod/public_targets/pub_target_list.cgi. Publications resulting from the JCSG crystallization pipeline and other investigations since 1999 are listed in Table 4. Global examination of protein properties from T. maritima structures has revealed that 73% have higher contact order than their structurally characterized mesophilic counterparts (Robinson-Rechavi & Godzik, 2005).
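For readers unfamiliar with the metric, relative contact order is commonly defined as the average sequence separation between residues in spatial contact, normalized by chain length. The following Python sketch illustrates the calculation under simplifying assumptions that are not taken from the cited study: one coordinate per residue (e.g. the Cα atom), an 8 Å distance cutoff, and a hypothetical function name. The contact definition actually used by Robinson-Rechavi & Godzik (2005) may differ.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def relative_contact_order(coords, cutoff=8.0, min_separation=1):
    """Relative contact order of a single chain.

    coords: one (x, y, z) tuple per residue, in sequence order
            (assumed here to be C-alpha positions).
    cutoff: distance in angstroms below which two residues count as
            being in contact (8.0 is an illustrative assumption).
    min_separation: smallest sequence separation considered a contact.
    """
    n_res = len(coords)
    total_separation = 0
    n_contacts = 0
    for i in range(n_res):
        for j in range(i + min_separation, n_res):
            if dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0
    # Mean sequence separation of contacts, normalized by chain length.
    return total_separation / (n_res * n_contacts)


# Toy example: five residues on a straight line, spaced 3.8 angstroms apart.
toy_chain = [(3.8 * k, 0.0, 0.0) for k in range(5)]
rco = relative_contact_order(toy_chain)
```

A higher value indicates that contacts are formed, on average, between residues lying further apart in sequence.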
Table 4. Publications for Thermotoga maritima proteins characterized structurally by JCSG and others (1999 to January 2006)
Conserved hypothetical protein
PII-like signaling protein, potentially involved in signaling in response to cellular nitrogen status
Independent structural studies have also illustrated the value of such approaches for revealing insights into the biology of T. maritima. Aspartate dehydrogenase (NadX) was identified as a nonhomologous substitution for l-aspartate oxidase (NadB) in the T. maritima NAD biosynthesis pathway through structural and functional studies of TM1643, a previously uncharacterized reading frame situated between the T. maritima homologs of nadA and nadC (Yang et al., 2003). Functional studies of aspartate dehydrogenase revealed that, like aspartate oxidase, it catalyzes the oxidation of l-aspartate to iminoaspartate. However, it reduces NAD(P) rather than FAD, the cofactor used by most NadB proteins. Previously uncharacterized homologs to TM1643 are apparent in the genomes of numerous archaea and bacteria. Insight into the function of another large family of proteins was obtained through structural studies of TM0841, a putative fatty-acid binding protein (Schulze-Gahmen et al., 2003). Members of this family, related to B. subtilis DegV, are classified into COG1307 and may play a role in the transport or metabolism of fatty acids.
Since its isolation in 1986, T. maritima has become an important model hyperthermophile. Its genome sequence revealed massive LGT with archaea (Nelson et al., 1999) and comparative studies have uncovered stunning diversity within the Thermotoga genus (Mongodin et al., 2005). Despite the lack of a well-developed genetic system for T. maritima, alternative strategies have been used to predict pathways and gene functions of this organism, resulting in a wealth of new information. In particular, the application of high-throughput genomics methods (e.g. microarrays, high-throughput structural genomics) to T. maritima has provided complementary information to single-enzyme studies. Expression studies have provided clues about the specificities and regulation of carbohydrate transport and utilization pathways. In fact, predictions of ABC transporter substrate binding protein specificities based on transcriptional data (Chhabra et al., 2003; Conners et al., 2005; Nanavati et al., 2005) accurately reflected the verified biochemistry of many of these proteins (Nanavati et al., 2006). The results of these studies and others addressing the behavior of T. maritima within cell communities (Pysz et al., 2004a; Johnson et al., 2005b, 2006) illustrate the potential of functional genomics approaches to offer important clues about T. maritima biology. In addition to providing extensive information about individual protein structures, high-throughput structural genomics efforts have allowed global studies of protein structure, revealing new insights into the extreme thermostability of T. maritima enzymes. The success of these initial efforts will undoubtedly guide further work on T. maritima and other interesting organisms for which genetic systems have not yet been developed.