Transcription initiation in Archaea: facts, factors and future aspects


Jörg Soppa. Tel. and Fax (+49) 69 7982 9564.


The basal apparatus for transcription initiation in Archaea is more closely related to the eukaryal than to the bacterial counterpart. The understanding of archaeal transcription initiation has been deepened by recent advances, which include genome sequencing, biochemical approaches and the structure determination of a protein–DNA complex. Archaeal promoter elements, transcription factors, RNA polymerase and their interactions are discussed and compared with the eukaryal situation. It is emerging that transcription initiation is not uniform in Archaea. A minimal set of promoter elements and transcription factors is conserved, but their relative importance for transcription initiation can vary. Furthermore, additional basal transcription factors and promoter elements seem to be crucial in subgroups of Archaea. Finally, some aspects of global as well as gene-specific transcriptional regulation are discussed.


Archaea form a third domain of organisms apart from Eukarya and Bacteria (former names for these three groups were Archaebacteria, Eukaryotes and Eubacteria). The evolutionary relationship of the three domains was determined by analysing the phylogenies of duplicated genes, which indicate that, despite their ‘prokaryotic’ character, Archaea are more closely related to Eukarya than to Bacteria. The closer relationship of Archaea and Eukarya was strengthened by the investigation of phylogenetic trees of all genes for which orthologues exist in all three domains (Brown and Doolittle, 1997). Initially, it was thought that Archaea are confined to peculiar ecological niches with extremes of temperature, pH or salt concentration. Recently, the use of domain-specific molecular probes has allowed the detection of Archaea in a wide variety of ‘normal’ habitats, e.g. oceans, freshwater lakes and soils, and as endosymbionts. In these places they form up to 30% of the microbial populations (Delong et al., 1994). Furthermore, analysis of a single pond in the Yellowstone National Park, WY, USA, allowed the detection of more than 30 new archaeal species, one of which is the most deeply branching species known (Barns et al., 1996). These studies illustrate that Archaea are widespread, that they form a non-negligible part of the biomass, and that only a small fraction of archaeal species is known today. Thus, it seems that the importance of Archaea in the microbial world is still widely underestimated. Archaea are divided into the three kingdoms Euryarchaeota and Crenarchaeota, both of which are represented by many species, and Korarchaeota, currently represented by a single, as yet uncultured species.

Transcription of DNA into RNA and its regulation are very fundamental biological processes. Therefore, analysis of transcription is important for understanding archaeal biology. Furthermore, it can deepen our understanding of the evolution of the transcription apparatus, and it may even be of heuristic value for studying eukaryal transcription. About 15 years ago evidence began to accumulate indicating that the basal transcription machinery of Archaea is more closely related to that of Eukarya than to the bacterial transcription apparatus. Very recently, progress in different fields has been reported, including the sequence determination of four archaeal genomes (Bult et al., 1996; Klenk et al., 1997; Smith et al., 1997; Kawarabayasi et al., 1998), the structure determination of a ternary complex of two archaeal transcription factors with DNA (Kosa et al., 1997) and results from in vitro transcription systems (e.g. Hausner et al., 1996; Qureshi and Jackson, 1998). This review focuses on the current knowledge about archaeal transcription initiation and, in addition, mentions studies on archaeal transcriptional regulation. Different aspects of archaeal transcription have been reviewed previously (Langer et al., 1995; Thomm, 1996; Reeve et al., 1997; Bell and Jackson, 1998; Soppa et al., 1998). Recent reviews on eukaryal transcription initiation are given by Hampsey, Kadonaga, and Ohkuma (Hampsey, 1998; Kadonaga, 1998; Ohkuma, 1998).

Archaeal promoter elements

About 10 years ago, it became evident that archaeal promoter elements are different from the bacterial paradigm, but instead share some similarities with eukaryal RNA polymerase II promoters. An elegant linker scanning mutagenesis revealed that two regions are important for transcription initiation in vitro: one around 30 nucleotides upstream of the transcriptional start site and the other around the start site itself (Reiter et al., 1990). They are homologous to the eukaryal TATA box and initiator element (and these names should therefore replace the original designations box A and box B or, respectively, distal promoter element and proximal promoter element). A variety of studies have underscored that the TATA box is the major basal promoter element throughout the Archaea, and that it is important for transcription initiation at stable RNA (rRNA and tRNA) genes as well as at protein-encoding genes. Single nucleotide changes can result in a drastic reduction in transcript levels both in vitro and in vivo. Three studies were performed that aimed at a systematic investigation of the sequence requirements of the archaeal TATA box. In the first study, all single nucleotide changes at seven positions of a TATA box of an rRNA promoter were constructed, and the 21 mutants were compared with the wild-type sequence in an in vitro transcription system from Sulfolobus shibatae, a member of the Crenarchaeota (Hain et al., 1992). The second study quantified the consequences of all single nucleotide changes at 11 positions around the TATA box of a tRNA gene on the transcript levels in vivo using Haloferax volcanii, a member of the Euryarchaeota (Palmer and Daniels, 1995). In the third study 14 nucleotides around the TATA box of a protein-encoding gene of H. volcanii were randomized, thus generating a random library of about 10⁸ entries, and functional TATA boxes in vivo were selected (Danner and Soppa, 1996).
The different approaches revealed that the TATA box does not have a strict sequence requirement, but that at several positions 2 or even 3 nucleotides are compatible with high promoter activity, whereas other nucleotides at these positions severely reduce transcription initiation. The TATA box is centred around −26/−27, but there is some flexibility in the spacing between the TATA box and the transcriptional start site, i.e. a divergence from the ideal distance by ±1 or 2 nucleotides is compatible with faithful start site selection.
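The mutant counts in these studies follow from simple enumeration, which can be illustrated with a short sketch. The Python below is illustrative only: the seven-nucleotide sequence is a hypothetical placeholder rather than the TATA box analysed by Hain et al. (1992), and the function name is an assumption.

```python
BASES = "ACGT"

def single_substitutions(seq):
    """Return every sequence that differs from seq at exactly one position."""
    mutants = []
    for i, original in enumerate(seq):
        for base in BASES:
            if base != original:
                mutants.append(seq[:i] + base + seq[i + 1:])
    return mutants

# Hypothetical 7-nt box used only to show the counting (not a published sequence).
example_box = "TTTATAT"
print(len(single_substitutions(example_box)))  # 7 positions x 3 alternatives = 21

# Number of distinct sequences when 14 positions are fully randomized.
print(4 ** 14)  # 268435456
```

Seven positions with three alternative nucleotides each give the 21 single mutants mentioned above, and full randomization of 14 positions gives 4^14 possible sequences, so a selection approach samples a far larger sequence space than systematic single substitutions can.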

In a very recent approach, all instances of mapped 5′ ends of archaeal transcripts were collected and used for a statistical analysis to identify non-random sequence elements in archaeal promoters (Soppa, 1999). Three elements were detected that are shared by all archaeal groups: (i) an initiator element (INR) around the transcription initiation site; (ii) the TATA box centred around −26/−27; and (iii) an element comprising two adenines upstream of the TATA box (−34 AA −33), which is designated ‘transcription factor B recognition element’ (BRE) for reasons described below. In addition, it should be noted that no indication for a ‘downstream promoter element’ (DPE) in archaeal promoters could be detected. In Eukarya, this element is situated around +30 in some eukaryal promoters (Hampsey, 1998) and is thought to be detected by a TATA box-binding protein (TBP) associated factor (TAF). Figure 1A gives an overview of archaeal in comparison to bacterial and eukaryal promoter elements.
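A search for non-random positions in a promoter collection can be approximated as follows. This is a minimal sketch and not the procedure used by Soppa (1999): it assumes that the promoter sequences have already been aligned at the mapped transcription start site, that each base has a background frequency of 0.25, and it uses invented toy sequences.

```python
import math
from collections import Counter

def information_content(column):
    """Information content (bits) of one alignment column.

    Assumes a uniform background of 0.25 per base; 0 bits means the column
    looks random, 2 bits means a single base is always present.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 2.0 - entropy

def column_scores(aligned_seqs):
    """Score every column of a set of equal-length, pre-aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [information_content([s[i] for s in aligned_seqs]) for i in range(length)]

# Toy sequences (hypothetical, for illustration only).
toy_promoters = ["GATTTATA", "CCTTTATA", "TGTTTAAA", "ACTTTATA"]
for position, score in enumerate(column_scores(toy_promoters)):
    print(position, round(score, 2))
```

With only a handful of sequences such scores are strongly biased upwards, so a real analysis would need many mapped promoters and a correction for sample size and for the base composition of the genome in question.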

Figure 1. Comparison of promoter elements and transcription factors. A. Comparison of promoter elements from Archaea, Eukarya and Bacteria. The promoter elements are shown schematically aligned to the scale shown on top. The numbering refers to the transcription start site (+1). BRE, TFB recognition element; TATA, TATA box; INR, initiator element; DPE, downstream promoter element; ORF, open reading frame. B. Comparison of archaeal and eukaryal TBPs. The domain structure is shown schematically. The similarity values are the fractions of identical residues. Average values are shown. The C-terminal acidic domain is present only in a subgroup of archaeal TBPs (see text). At the right side of the figure positive values correspond to an excess of acidic residues (E + D), negative values correspond to an excess of basic residues (K + R). aa, amino acid residues; E, glutamic acid; D, aspartic acid; K, lysine; R, arginine. C. Comparison of archaeal TFBs and eukaryal TFIIBs. The domain structure is shown schematically. The similarity values are the fractions of identical residues. Average values are shown. aa, amino acid residues.

Surprisingly, the statistical analysis revealed that the consensus sequences of TATA boxes, BREs and INR elements differ for subgroups of Archaea, i.e. halophilic Archaea, methanogenic Archaea (both Euryarchaeota), and Crenarchaeota (mostly Sulfolobales). This indicates that the molecular details of, for example, TBP–DNA interactions are variable, although the TBPs are conserved throughout the Archaea and the overall binding to DNA will be the same (see below). Furthermore, the variability in BREs, INRs and additional sequence elements leads to the prediction that the importance and even the number of basal transcription factors will not be the same throughout the Archaea (Soppa, 1999).

The very recent discovery of sequence-specific DNA recognition by the eukaryal transcription factor IIB (TFIIB) and the archaeal homologue TFB was a surprise. It had not been anticipated from the structures of ternary complexes, which revealed TFIIB/TFB–DNA contacts only to phosphates (see below). Until now, three studies support the view that a BRE upstream of the TATA box exists and that it is important for transcription initiation: (i) Lagrange et al. (1998) used a binding site selection assay to identify preferred binding of full-length human TFIIB to specific fragments out of a random library. It was found that seven positions directly upstream of the TATA box are important for TFIIB binding, and the consensus sequence −38 SSRCGCC −32 was derived (S = C or G, R = A or G). Mutations at positions −34 and −37 were used to verify that this sequence element is important both for TFIIB binding and for in vitro transcription. The results were similar for TFIIB and TFIIBc lacking the N-terminal domain. TFIIB binds this BRE in the ternary complex as well as in the binary complex lacking TBP, and thus human TFIIB is a bona fide DNA-binding protein. Cross-linking of TFIIB to three positions upstream of the TATA box was possible in the binary and the ternary complex. In contrast, cross-linking to two positions downstream of the TATA box occurred only in the ternary complex, indicating that TBP is required for DNA bending and that the BRE upstream of the TATA box is more important for TFIIB binding than TFIIB–DNA contacts downstream of the TATA box. In addition, it was shown that a helix–turn–helix motif of human TFIIB is important for binding. (ii) Qureshi and Jackson (1998) analysed a Sulfolobus phage promoter that had turned out to be very strong in an in vitro transcription system. They showed that the region upstream of the TATA box is important for promoter strength and can raise the promoter strength of the rRNA promoter.
Footprinting analyses revealed that TBP and TFBc (lacking the N-terminal domain) alone have low DNA-binding affinities, but that in the ternary complex TFBc contacts four positions in the region upstream and two positions in the region downstream of the TATA box. Analysis of single point mutants showed that −33 A, 3 nucleotides upstream of the TATA box, and, to a lesser extent, −36 G, 6 nucleotides upstream of the TATA box, are important both for ternary complex formation and for in vitro transcription efficiency. In contrast, mutations at both positions directly upstream of the TATA box did not influence either activity. Binding site selection assays showed that TBP alone did not select specific sequences out of a library randomized on both sides of the TATA box, but that, in contrast, in the ternary complex TFBc did select a non-random subset of the library. From 28 clones analysed, 22 had the anticipated direction of transcription in the in vitro transcription system, whereas in six cases the opposite direction of transcription was found. This is a strong indication that TFBc can influence the orientation of the ternary complex depending on the sequences bordering the TATA box (see the discussion on ternary complex orientation below). Of the 22 clones with the ‘native’ direction of transcription, 18 were found to have an adenosine three nucleotides upstream of the TATA box (−33 A). Other positions where a preferred selection can be assumed were a guanosine six positions upstream of the TATA box (15 cases) and a guanosine six positions downstream of the TATA box (15 cases). (iii) As already mentioned, analysis of nucleotide frequencies in natural archaeal promoters allowed the identification of a promoter element upstream of the TATA box (−34 AA −33; Soppa, 1999). 
This study nicely complements the two in vitro approaches as it indicates that a BRE is important for transcription initiation in the context of the whole transcription initiation machinery in vivo, and that it operates in diverse archaeal groups. In Eukarya, only a minority of promoters contain sequences with high similarity to the BRE consensus, and thus the BRE could not be detected by sequence comparison. Thus, it seems that in Archaea base-specific DNA binding by TFB is a more regular event during transcription initiation than in Eukarya.
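The degenerate consensus derived by Lagrange et al. (1998), SSRCGCC with S = C or G and R = A or G, can be turned into a simple pattern match. The sketch below is illustrative only: the helper names are assumptions, only the two ambiguity codes defined in the text are included, and an exact consensus match is a much cruder criterion than the binding and transcription assays described above.

```python
import re

# Ambiguity codes as defined in the text: S = C or G, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "[CG]", "R": "[AG]"}

def consensus_to_regex(consensus):
    """Translate a degenerate consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[symbol] for symbol in consensus))

def matches_consensus(site, consensus):
    """True if the whole site is compatible with the consensus."""
    return consensus_to_regex(consensus).fullmatch(site.upper()) is not None

BRE_CONSENSUS = "SSRCGCC"  # eukaryal BRE consensus, positions -38 to -32

print(matches_consensus("GGACGCC", BRE_CONSENSUS))  # True
print(matches_consensus("GGTCGCC", BRE_CONSENSUS))  # False: T is not allowed at the R position
```

Because three of the seven positions each allow two nucleotides, eight distinct sequences satisfy this consensus exactly.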

The TATA box-binding protein

The TBP is the most general transcription factor in Eukarya. It is involved in transcription initiation with all three RNA polymerases, interacting with different factors specific for the respective polymerases. At RNA polymerase II promoters, TBP binding to the TATA box is an initial step in transcription initiation. In vitro, TBP alone is sufficient for transcription initiation with TFIIB and polymerase; in vivo, it is complexed with other proteins. A long-known complex is TFIID, a complex of TBP and ‘TBP-associated factors’ (TAFs), and recently TBP binding to another complex, the SAGA complex, has been described (Grant et al., 1997). The structure of TBP has been solved from several eukaryal and one archaeal species, alone or in binary complexes with DNA or in ternary complexes with DNA and other transcription factors (TFIIB or TFIIA). It is a ‘saddle-shaped’ protein with a twofold symmetry, which binds to eight basepairs of DNA. Phenylalanines are inserted between the first two and between the last two basepairs, and the central six basepairs become highly distorted upon binding. This DNA conformation, which is more similar to A-DNA than to B-DNA, was unprecedented and is now called the TATA form of DNA (Guzikevich-Guerstein and Shakked, 1996).

About 4 years ago it was recognized that a homologue of the eukaryal TBP exists in the Archaea. Today, 13 archaeal TBP sequences of diverse archaeal species are publicly available, showing that TBP is a general factor in archaeal transcription initiation. The similarity between different archaeal TBPs is between 40% and 55% (the similarity values given throughout this article are the fractions of identical amino acids), which is somewhat higher than the similarity between archaeal and eukaryal TBPs (20% to 40%). The TBPs of higher Eukarya are highly conserved, but between higher and lower Eukarya the degree of conservation is about the same as between Archaea and Eukarya (e.g. 27% between Homo sapiens and Plasmodium falciparum). The high similarity alone shows that archaeal and eukaryal TBPs are homologous. Furthermore, the majority of amino acids that were shown to be involved in DNA contacts in the binary complex of TATA box and the Arabidopsis thaliana TBP are conserved in all 12 archaeal TBPs (one exception of a haloarchaeal TBP is discussed below), and the amino acids in contact with DNA were nearly identical for the eukaryal complex and the Pyrococcus complex (Nikolov et al., 1995; Kosa et al., 1997). This is notably true for the four phenylalanines that intercalate between the first two and the last two basepairs of the TATA box as well as for 11 out of the 13 base-specific contacts (in the remaining two cases single species carry conservative changes). This implies that the overall binding to the minor groove of eight basepairs of DNA and the change in DNA structure of the central six basepairs, which have been verified for the Pyrococcus TBP, will be the same for all archaeal TBPs. In fact, it has been shown that the degree of conservation is high enough to allow functional replacement of the archaeal factor by human and yeast TBP in an archaeal in vitro transcription system (Wettach et al., 1995).
However, it will be interesting to study the ‘fine-tuning’ of binding and thereby to understand the adaptations of binding to the different TATA box sequences of various archaeal groups (see above), as well as the adaptations to the extremes in temperature and osmolarity under which some archaeal TBPs operate.
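Since the similarity values in this article are defined as fractions of identical amino acids, the calculation can be sketched for a pair of already aligned sequences. This is a minimal illustration under stated assumptions: the review does not specify how gapped positions were treated, so ignoring them is an arbitrary choice here, and the two short sequences are invented.

```python
def fraction_identical(seq_a, seq_b, gap="-"):
    """Fraction of identical residues between two pre-aligned sequences.

    Columns in which either sequence has a gap are ignored (an assumption;
    other conventions divide by the full alignment length instead).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)

# Invented aligned fragments, for illustration only.
print(fraction_identical("MKVE-LRA", "MRVEALKA"))
```

The same function could be applied to the first and second repeat of a single TBP to estimate its internal symmetry, provided the two repeats have been aligned first.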

Comparison of archaeal and eukaryal TBP sequences shows that they form two monophyletic groups, but, in addition, it reveals that they are distinguished by three main features (see Fig. 1B). (i) Archaeal TBPs have an excess of acidic amino acids, whereas eukaryal TBPs have an excess of basic amino acids. (ii) Eukaryal TBPs consist of three different domains. The N-terminal domain, which is conserved neither in length nor in sequence within the Eukarya, is totally missing in archaeal TBPs. (iii) Archaeal TBPs and the C-terminal part of the eukaryal TBPs consist of two repeats of about 90 amino acids, reflecting that the ancestral tbp gene arose by duplication of a precursor gene. The similarity of the first to the second repeat is much higher in archaeal TBPs (36% to 53% of identical amino acids, on average 42%) than in eukaryal TBPs (22% to 26%). This indicates that archaeal TBPs are more symmetrical than eukaryal TBPs, which could be verified by the structure of the Pyrococcus TBP. Whereas the ‘symmetrical amino acids’ in eukaryal TBPs are confined to the DNA-binding surface, in archaeal TBPs the opposite protein surface, which is involved in protein–protein interactions, is also highly symmetrical. This can be rationalized if it is taken into account that Archaea have only one RNA polymerase instead of the three carried by Eukarya, and that Archaea have a smaller number of transcription factors with which TBP has to interact (see below). It has been shown for eukaryal TBP that it binds in both orientations to TATA box-containing DNA fragments, and that therefore the orientation of TBP binding within the preinitiation complex has to be brought about by other factors, e.g. TFIIB (Cox et al., 1997). For the archaeal TBPs of higher symmetry this is even more likely to be true.
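The charge comparison in feature (i) uses the measure given in the legend to Fig. 1B, i.e. the number of acidic residues (E + D) minus the number of basic residues (K + R). A minimal sketch of that count is shown below; the function name and the two short sequences are hypothetical and serve only to illustrate the sign convention.

```python
def acidic_excess(protein_seq):
    """(E + D) minus (K + R): positive for a net acidic, negative for a net basic protein."""
    seq = protein_seq.upper()
    acidic = seq.count("E") + seq.count("D")
    basic = seq.count("K") + seq.count("R")
    return acidic - basic

# Invented fragments, for illustration only.
print(acidic_excess("MEEDLKAEEEEE"))  # 7 E + 1 D - 1 K = 7
print(acidic_excess("MKRKLEAKRR"))    # 1 E - (3 K + 3 R) = -5
```

Histidine is not counted, in line with the figure legend, so the value is a simple residue balance rather than a calculated net charge at a given pH.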

The similarities of the first repeats of archaeal TBPs are on average only about 6% higher to the first than to the second repeats of eukaryal TBPs. This can be taken as an indication that the two halves had not evolved far at the time the common ancestor of the recent Archaea and Eukarya split into the two lineages, and that the split occurred soon after the gene duplication. At that time TBP could be envisaged to have bound to a DNA element of perfect dyad symmetry.

The TBPs of three groups of Archaea, i.e. Pyrococcus, Thermococcus and Sulfolobus, contain a short C-terminal extension that is not present in the remaining five sequences from halophiles, methanogens and Archaeoglobus. The extension is very acidic and comprises a run of 5–6 glutamic acid residues (sometimes interrupted by a single glycine or leucine residue). It seems to be safe to assume that this acidic C-terminus is important for function, either for binding another transcription factor or for a covalent modification influencing activity. Whatever it turns out to be, it is an example of a specialized function, present only in a subgroup of Archaea, of a ubiquitous and highly conserved basal transcription factor (see below).

In each of the four sequenced archaeal genomes, one tbp gene can be found, showing that in general a single TBP mediates transcription initiation at all classes of archaeal genes. However, haloarchaea constitute an exception: they have been reported to contain several tbp as well as tfb genes (see below, Reeve et al., 1997). The sequence of a large plasmid of H. salinarum alone contains four tbp genes (Ng et al., 1998), and as plasmid-free strains are viable, at least one additional chromosomal copy must exist. One of the plasmid copies has apparently been inactivated by homologous recombination between the two repeats, as it encodes a protein of only half the size. The N-terminal portion has a higher similarity to the first archaeal repeats, the C-terminal to the second. A second plasmid-encoded TBP is very unusual in several respects: (i) the similarity to the remaining two haloarchaeal TBPs is below 40%, whereas these two are nearly 80% identical; (ii) the first and the second repeat are only 27% identical, whereas the lowest value of all other archaeal TBPs is 36%; (iii) a number of amino acids that are conserved in all other archaeal TBPs, and some of them also in all eukaryal TBPs, are different. This is notably the case for two of the four phenylalanines involved in intercalating between TATA box basepairs and three out of six conserved prolines. Thus, by several criteria this is not a bona fide archaeal TBP. One explanation is that this tbp is a pseudogene that is collecting random mutations. However, as the mutations do not seem to be random (e.g. both phenylalanines are replaced by tyrosines), it seems possible that this unusual TBP-like protein has evolved to be a global regulator of a subset of haloarchaeal genes and binds to a DNA element that deviates from a normal TATA box. This would be somewhat analogous to the situation in higher Eukarya, where ‘TBP related factors’ (TRFs) have been described.
TBP and TRF of Drosophila are 56% identical, but TRF has been shown to be involved in development and to bind only to a very small number of genes in vivo compared with TBP (Hansen et al., 1997).

Apart from the inactivated and the unusual TBP, H. salinarum has at least three ‘normal’ TBPs. This could be explained by the theory of Dennis and Shimmin (1997), who have postulated that haloarchaea have adapted to the potential rapid osmolarity changes in the environment by duplication of essential genes and adaptation of the redundantly coded gene products to different salt concentrations. Thus, it will be interesting to study whether the TATA box affinities of the haloarchaeal TBP variants are differentially influenced by the salt concentration.

Transcription factor TFB

After TBP, the next factor to bind in the eukaryal preinitiation complex at polymerase II promoters is TFIIB, before additional transcription factors and the polymerase are recruited. It consists of three domains: an N-terminal domain (TFIIBn) of about 100–120 amino acids and a C-terminal region (TFIIBc) comprising two repeats of about 90 amino acids.

Eight archaeal TFB sequences (called TFB rather than TFIIB, because only one polymerase is present) and one partial sequence have been published until now, a number which is only slightly lower than the number of known eukaryal TFIIB sequences. The degree of conservation of archaeal TFBs is considerably higher than that of archaeal TBPs (50–60%, and thus on average more than 10% higher than TBP similarities). In contrast, the similarities between archaeal and eukaryal TFBs are between 20% and 30%, on average slightly lower than the values of archaeal and eukaryal TBPs.

Archaeal TFBs share with eukaryal TFIIBs the architecture of three domains, including two 90-amino-acid repeats at the C-terminus (Fig. 1C). Like the archaeal TBPs, the archaeal TFBs also have a higher degree of internal symmetry (25–37%) than the eukaryal TFIIBs (11–25%), probably again reflecting that the eukaryal proteins have to interact with a larger number of different basal transcription factors and gene-specific regulators. The first repeats of the archaeal TFBs are on average about 8% more similar to the first than to the second eukaryal repeats, indicating that also the TFBs had not evolved far at the time the archaeal and eukaryal lineages separated.

The N-terminal domain is more highly conserved within the Archaea (50–60%) than within the Eukarya (30–40%) or between Archaea and Eukarya (20–30%). The structure of the N-terminal domain has been solved for the Pyrococcus TFB (Zhu et al., 1996), representing a first case in which results on archaeal transcription factors preceded results on eukaryal proteins. The domain forms a zinc ribbon, and the metal-binding motif, CXXCX(15–17)CXXC, is conserved throughout Archaea and Eukarya; the structure can therefore be assumed to be similar. This motif indicates that the N-terminal domain could be involved in DNA binding. As already mentioned, it could indeed be shown that human TFIIB is a bona fide DNA-binding protein recognizing a promoter element directly upstream of the TATA box (Lagrange et al., 1998). However, similar results were also obtained with TFIIBc, and Sulfolobus TFBc is capable of sequence-specific DNA binding as well (Qureshi and Jackson, 1998). Therefore, the N-terminal domain probably only modulates binding, and a more systematic comparison of TFIIB/TFB and TFIIBc/TFBc and probably the application of additional methods are necessary to clarify the role of the N-terminal domain. It could well be that the presence of RNA polymerase is required for this purpose, as only TFIIB, in contrast to TFIIBc, is active in in vitro transcription (Malik et al., 1993).
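The metal-binding motif of the zinc ribbon, two CXXC pairs separated by 15–17 residues, lends itself to a simple pattern search in protein sequences. The following sketch is illustrative only: the function name is an assumption, the test sequence is invented, and a pattern hit merely flags a candidate site without showing that a zinc ribbon is actually formed.

```python
import re

# Two Cys-X-X-Cys pairs separated by 15 to 17 arbitrary residues.
ZINC_RIBBON = re.compile(r"C..C.{15,17}C..C")

def find_zinc_ribbon_motifs(protein_seq):
    """Return (start index, matched segment) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in ZINC_RIBBON.finditer(protein_seq.upper())]

# Invented sequence with a 16-residue spacer between the two CXXC pairs.
toy_protein = "MA" + "CAAC" + "G" * 16 + "CAAC" + "K"
print(find_zinc_ribbon_motifs(toy_protein))
```

Other spacer lengths, such as those of the zinc ribbon motifs discussed for further factors later in this review, could be searched by changing the quantifier in the pattern.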

The structures of ternary complexes of TFIIBc/TFBc, TBP and DNA from a mixed eukaryal system and from Pyrococcus have been determined and revealed no base-specific contacts of TFIIBc/TFBc (Nikolov et al., 1995; Kosa et al., 1997) but only phosphate-mediated DNA contacts on both sides of the TATA box. However, in both cases the N-terminal domain of the protein was omitted, and the DNA fragments were rather short, lacking the BRE, which possibly explains the failure to detect base-specific contacts. The structures did show that TFBc/TFIIBc binds asymmetrically to TBP, in both cases to the C-terminal repeat. Most of the amino acids involved in Pyrococcus TFB–TBP interaction are conserved throughout the Archaea, showing that TFB binding to the C-terminal TBP repeat is highly conserved. The asymmetry of TFIIB/TFB binding to TBP and base-specific TFIIB/TFB–DNA binding can explain the oriented binding of the symmetric TBP within the preinitiation complex. In the Pyrococcus ternary complex the orientation of TFB–TBP was reversed in comparison with various eukaryal complexes (TFIIB–TBP–DNA, TFIIA–TBP–DNA). This was surprising because the distance between the TATA box and the transcriptional start site is conserved in both domains. This opposite way of binding was explained by the absence of several basal transcription factors in Archaea (see below), which would require different TFB–polymerase interactions in Archaea. An alternative explanation would be that an artificial orientation was crystallized because of the lack of the N-terminal domain and the TFB-binding promoter element, and that the in vivo orientation of archaeal and eukaryal ternary complexes is the same.
In my opinion the question of orientation of eukaryal versus archaeal ternary complexes is still open, because the available data are contradictory: (i) As described above, the structural data indicate opposite orientations, but especially the lack of BREs in the crystals compromises this evidence, and with the Sulfolobus system it was shown that sequences flanking the TATA box can reverse the direction of transcription (Qureshi and Jackson, 1998). (ii) Archaeal TFB extends the TBP footprint in the direction upstream of the TATA box (Hausner et al., 1996), whereas eukaryal TFIIBc extends the TBP footprint downstream of the TATA box (Malik et al., 1993), which could also be regarded as evidence for the opposite orientation of the complexes. But full-length TFIIB did not give a footprint, and footprint analysis cannot demonstrate sequence-specific binding. Many data show that TFIIB/TFB contacts DNA on both sides of the TATA box, and occlusion of DNA from DNase cleavage could be different in spite of identical orientations of ternary complexes. (iii) Binding site selection experiments show that both for eukaryal TFIIB and for archaeal TFB, the region upstream of the TATA box is much more important for sequence-specific binding than the downstream region, which could indicate identical ternary complex orientation (Lagrange et al., 1998; Qureshi and Jackson, 1998). Clearly, further studies are needed, and the Sulfolobus data underscore that it is important to clarify ternary complex orientation using a system that allows the determination of the direction of transcription in parallel. However this question is resolved, the promoter analysis (see above) indicates that the orientation is conserved throughout the Archaea.

In three of the four archaeal genomes a single tfb gene is present, indicating that one TFB mediates transcription from all classes of genes. Some exceptions to this generalization are emerging. Again, one exception is the halophilic Archaea, which are reported to contain several tfb genes (Reeve et al., 1997). Another exception is P. horikoshii, which contains two tfb genes. However, one of them, TFB-2, deviates considerably from the standard TFB of Archaea or Eukarya. The protein consists only of the two repeats; the N-terminal domain, including the zinc ribbon motif, is lacking totally. Again it can be speculated that this TFB could be a specialized version involved in transcription initiation of only a small subset of genes. Furthermore, the single TFB from M. jannaschii is exceptional in containing an intein of 334 amino acids in the N-terminal domain.

Further basal transcription factors

It has been shown that in vitro transcription with archaeal TBP, TFB and polymerase alone is possible. However, this is also true for eukaryal transcription and cannot be taken as an argument against the participation of further factors in vivo. Qureshi et al. (1997) showed that neither TBP- nor TFB-depleted unfractionated cell extract of S. shibatae is active in in vitro transcription, but that the activity can be fully reconstituted with heterologously produced pure TBP or TFB respectively. They could also reconstitute a polymerase-depleted extract with a purified polymerase preparation comprising 10 subunits. These elegant experiments indicate that no stable complexes of TBP, TFB or polymerase with other basal transcription factors exist in Sulfolobus, specifically no Eukarya-like TFIID. In line with these biochemical experiments, in the four sequenced archaeal genomes no homologues to eukaryal TAFs could be detected. Similarly, genes with coding capacity for archaeal homologues of TFIIF, TFIIA and TFIIH are not obvious.

All four archaeal genomes contain a single gene coding for a protein with similarity to the α-subunit of TFIIE. Eukaryal TFIIE is a heterotetramer with an α₂β₂ composition. It recruits TFIIH to the preinitiation complex and is important for the polymerase-phosphorylating activity of TFIIH (see below). Neither function seems to be required in Archaea; therefore either the archaeal TFIIEα homologues fulfil an additional TFIIE function that still has to be described more clearly in Eukarya, or they function differently, both of which seem possible. On the one hand, eukaryal TFIIE is a less well-characterized transcription factor, and there is evidence that not all functions are known yet. Very recently, it could be shown that TFIIEα alone can influence TBP binding to DNA, and that the zinc ribbon motif (see below) is crucial for that activity (Yokomori et al., 1998). On the other hand, the archaeal ‘TFEα’ is only 200 amino acids long, corresponding to the N-terminal half of eukaryal TFIIEα. The degree of similarity is not high within the archaeal sequences (around 30%) and is even lower within the eukaryal sequences or between archaeal and eukaryal sequences. The three known eukaryal TFIIEα sequences contain a zinc ribbon motif, CXXCX(21)CXXC. One of the CXXC motifs is strictly conserved in the four archaeal sequences, whereas the other as well as the distance is variable, resulting in the consensus sequence CX(2–4)(C/H)X(14–15)CXXC. Thus, the archaeal ‘TFEα’ could well be a third archaeal basal transcription factor with DNA-binding capacity, which, however, is more species-specific than TBP and TFB.

Another open reading frame (ORF) conserved in all four archaeal genome sequences is annotated as coding for the basal transcription elongation factor TFIIS; this was in fact the first archaeal gene with similarity to a eukaryal transcription factor to be discovered, 5 years ago. However, several lines of evidence indicate that this annotation is probably a mistake, and that the archaeal gene codes for an RNA polymerase subunit. (i) The six known archaeal genes code for proteins of 103–136 amino acids in length. This is similar to a 14 kDa subunit of polymerase II (three examples) and a 12 kDa subunit of polymerase III (two examples), whereas it is much shorter than the four eukaryal TFIIS sequences, which are around 300 amino acids long. (ii) The archaeal gene products are on average more similar to the polymerase subunits than to TFIIS. (iii) The archaeal proteins and the polymerase subunits share two zinc ribbon motifs. The first one (CXXCX12–18CXXC) is totally missing in the eukaryal TFIIS sequences.

Therefore, until now only three basal transcription factors seem to be conserved throughout the Archaea, and biochemical or genetic evidence for the function of the third factor is still missing.

The genomes of A. fulgidus and P. horikoshii, but not the other two genomes, contain a gene with similarity to a TBP-binding protein that has only recently been detected in Eukarya (Kanemaki et al., 1997). It is called TIP49 (TBP interacting protein of 49 kDa), and the human protein was purified because of its affinity for TBP. Two archaeal and five eukaryal sequences are currently available, as well as a fragment of a third archaeal sequence from S. solfataricus. The proteins are all about 450 amino acids long, and the average similarity of the archaeal and the eukaryal TIP49s is 43%. In fact, the similarities of the two archaeal proteins to the human protein (46% and 47%) are higher than that between the human protein and the yeast protein (40%). The function of this emerging protein family is not clear. All sequences contain an ATP-/GTP-binding P-loop motif, and a first indication of a possible function is given by the limited similarity to the bacterial RuvB protein, which acts as a helicase. The TIP49s seem to be the first example of archaeal basal transcription factors that are not present in all species but are confined to subgroups of Archaea. Promoter analysis (see above) predicts that additional examples will follow, which, however, are not obvious from the genome sequences.
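Pairwise percentages such as those quoted above are derived from sequence alignments. The text does not state which measure or alignment method was used, so the following is only a minimal sketch that uses plain percent identity over gap-free columns as a simple stand-in; the function name and the toy aligned sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap.

    Both inputs must already be aligned, i.e. have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, not real TIP49 sequences.
    print(percent_identity("ACDE-G", "ACDQFG"))
```

A similarity score that also counts conservative substitutions would give higher values than identity alone, so numbers from this sketch are not directly comparable with the figures in the text.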

Archaeal RNA polymerases

RNA polymerases were the first components of the archaeal basal transcription apparatus to be investigated, and in fact the discovery that archaeal RNA polymerases are more similar to eukaryal than to bacterial enzymes triggered subsequent work on the other components. Over many years, major contributions to this field have been made by the group of Zillig, who started to characterize archaeal polymerases at the end of the 1970s, soon after it was discovered that Archaea form a distinct domain. No new findings have been reported recently; the reader is therefore referred to earlier reviews for a detailed picture (Zillig et al., 1992; Langer et al., 1995). In short, Archaea contain a single DNA-dependent RNA polymerase, which is a multicomponent enzyme of 10–14 subunits. Most of the subunits have been shown to be homologous to subunits of eukaryal polymerases, which are also multicomponent enzymes. This is in contrast to bacterial enzymes, which comprise only three different subunits and have the composition α2ββ′. Bacterial polymerases bind to DNA in a complex with one of several possible σ factors, and the σ factor mediates the recognition of promoter elements. In contrast, in Archaea and Eukarya, TBP and TFB bind to the DNA first and the polymerase is recruited later; it does not form a stable complex with either of these factors in solution. In Eukarya, the largest subunit contains a ‘C-terminal domain’ (CTD), which comprises a serine- and tyrosine-rich repeat (more than 20 repeats in the human enzyme) and can be phosphorylated. The underphosphorylated enzyme is recruited preferentially into the preinitiation complex, and phosphorylation by TFIIH and other kinases is a prerequisite for the transition from the initiation to the elongation phase of transcription. This CTD is missing from the archaeal enzyme; therefore, a phosphorylation switch between initiation and elongation does not seem to operate in Archaea. The absence of the CTD is another argument for the absence of an archaeal TFIIH homologue, besides the inability to detect it in the genome sequences.

Global regulatory mechanisms

Regulatory factors influencing transcription efficiency have to interact, directly or indirectly, with the components of the basal transcription apparatus and modulate their activity. Although the work on the basal archaeal transcription apparatus has progressed well, characterization of regulatory processes in Archaea is still in its infancy. Nevertheless, a wide variety of approaches is emerging, and a summary of the work on archaeal transcription regulation (global or gene specific) is beyond the scope of this review. However, a few points should be made to highlight the focus of current research.

A few different elements possibly involved in global regulatory processes have been detected, although none of them has been studied in detail. As already mentioned, duplication of a gene for a basal transcription factor and subsequent specialization of one of the proteins for a subset of genes is one possibility that could be used by H. salinarum (TBP) and P. horikoshii (TFB).

The influence of DNA topology on transcription efficiency could be a second mode of global regulation. More than 20 Z-DNA-containing genomic fragments have been isolated from H. salinarum making use of an anti-Z-DNA antibody (Kim and DasSarma, 1996). If these were located in promoter regions, conditional Z-DNA formation could influence transcription. For one gene, the bacterioopsin gene, it has been shown that transcription is influenced by the superhelicity of the DNA, and that a non-B-DNA conformation, possibly Z-DNA, in the promoter is involved in regulation (Yang et al., 1996).

A third mechanism could be differential packaging of the chromosome. For H. salinarum, it has been shown that the chromosome is protein free in the early exponential growth phase, and that it is packaged into regular nucleosome-like structures in stationary phase (Takayanagi et al., 1992). It is reasonable to postulate that this growth phase-dependent DNA packaging will have a global influence on transcription. The hyperthermophilic archaeon Methanothermus fervidus produces histone-like proteins throughout the growth phase, and archaeal histone-like proteins have been most thoroughly studied in this organism (Reeve et al., 1997). However, two variants exist, HmfA and HmfB, for which differences in DNA binding and compaction have been shown. During exponential growth phase, predominantly HmfA is produced, whereas HmfB production equals HmfA production during stationary phase, which is again a possible mechanism for global transcriptional regulation (Grayling et al., 1996).

Gene-specific transcriptional regulation

A variety of systems to study gene-specific transcription regulation in different Archaea have been established, which will deepen our understanding in the near future. It is emerging that there will be no common ‘archaeal’ mechanism of gene regulation. Three examples illustrate this prediction. (i) The protein GvpE has been characterized as an activator of gas vesicle gene transcription in H. salinarum. It contains a leucine zipper motif, which was shown to be important for function and is probably necessary for protein–protein interaction (Krüger et al., 1998). Thus, it is a ‘Eukarya-like’ activator of transcription. (ii) The first archaeal gene-specific regulator to be described was the repressor T6 of the haloarchaeal phage ΦH. It contains a helix–turn–helix DNA-binding domain, and a protein dimer interacts with a DNA motif of dyad symmetry (Ken and Hackett, 1991); it thus follows a typical ‘bacterial’ paradigm. (iii) The ArcR protein is involved in regulation of the genes for arginine fermentation in H. salinarum. ArcR has similarities to repressors containing helix–turn–helix motifs, but ArcR is shorter by 100 amino acids and lacks this DNA-binding domain (Ruepp and Soppa, 1996). Thus, this ‘bacteria-like’ protein has a different mode of action from its homologues.

The archaeal genomes also indicate variability in regulatory mechanisms. The A. fulgidus genome contains multiple members of ‘bacteria-like’ two-component regulatory systems (at least 15 histidine kinases and nine response regulators), whereas the M. jannaschii genome contains none at all. Even if the necessity for regulation and concomitantly the abundance of regulators are different, the genome sequences indicate that several paradigms for regulatory mechanisms are used in Archaea (e.g. helix–turn–helix and coiled-coil), and that group-specific differences exist.

Conclusions and outlook

Recent results have shown that throughout the Archaea, transcription initiation requires three promoter elements (BRE, TATA box, INR), TBP, TFB, possibly TFEα, and a multisubunit polymerase. Notably, sequence-specific DNA recognition by TFB seems to be more important for archaeal than for eukaryal transcription initiation. So far, there have been no indications that further basal transcription factors are required in all Archaea. However, different archaeal groups display variability in the TATA box sequence, in the length of the conserved BRE and in the number of promoter elements. Furthermore, two of four archaeal genomes encode a putative TBP-interacting protein, TIP49. Together, these data indicate that modulations of the conserved core of the basal archaeal transcription apparatus are realized in different Archaea, and that not all basal transcription factors have been detected yet. Figure 2 gives an overview of an archaeal preinitiation complex.

Figure 2. Overview of an archaeal preinitiation complex. Basal transcription factors and the polymerase are shown schematically. TIP49, which is only present in a subset of Archaea, and TFEα were positioned arbitrarily.

Future research in several areas is required for a deeper understanding of archaeal transcription initiation. (i) The details of preinitiation complex formation for different archaeal systems need to be clarified (e.g. orientation of the complex, involvement of TFEα and TIP49, dissociation constants for various proteins, adaptation to extremes of osmolarity or temperature). (ii) The later phases of transcription (promoter clearance, elongation) have not been studied at all (e.g. transcription factor recycling, frequency of abortive and successful transcription). (iii) Nothing is known about the interaction of gene-specific regulators with the basal transcription apparatus, but a few systems are emerging that will allow questions related to this problem to be addressed (e.g. activating or repressing mode of action, molecular characterization of the interaction of regulators with defined proteins of the basal transcription apparatus). (iv) Some lines of evidence indicate that global regulatory mechanisms exist. Again, the mechanisms are unknown (e.g. role of DNA structure, involvement of histone-like proteins). Thus, the field of archaeal transcription initiation has not only developed rapidly in recent years, but also promises to yield further exciting results in the near future.

Supplementary material

Owing to space limitations of a ‘MicroReview’, only a minor fraction of references could be cited, and I apologize to all contributors to the field whom I could not acknowledge properly. A more elaborate reference list is available upon request, or can be obtained on the web. Similarly, multiple sequence alignments and similarity matrices for the various transcription factors are available.

The four archaeal genomes are accessible at the following internet sites:

A. fulgidus:

M. thermoautotrophicum:

M. jannaschii:

P. horikoshii:


I am supported by a ‘Heisenberg-Stipendium’ from the Deutsche Forschungsgemeinschaft, grant DFG So 264/4. I thank Karsten Melcher and Nadja Patenge for critical reading of the manuscript, my group for patience during the preparation time, and the editor for extreme patience waiting for the manuscript, which had been promised long before.