Analysis of cepA and other Bacteroides fragilis genes reveals a unique promoter structure


  • Douglas P Bayley,

    1. Department of Microbiology and Immunology, 600 Moye Blvd., East Carolina University School of Medicine, Greenville, NC 27858-4354, USA
  • Edson R Rocha,

    1. Department of Microbiology and Immunology, 600 Moye Blvd., East Carolina University School of Medicine, Greenville, NC 27858-4354, USA
  • C. Jeffrey Smith

    Corresponding author
    1. Department of Microbiology and Immunology, 600 Moye Blvd., East Carolina University School of Medicine, Greenville, NC 27858-4354, USA

*Corresponding author. Tel.: +1 (252) 816-3127; Fax: +1 (252) 816-3535.


Little is known about the sequences that mediate the initiation of transcription in Bacteroides fragilis. Transcriptional start sites were therefore determined for 13 new genes, and a total of 23 promoter regions upstream of the start sites were aligned and examined for similarities. A region at about −7 contained a consensus sequence of TAnnTTTG, and upstream, in the region centered at about −33, another TTTG motif was found in the majority of promoters examined. The canonical Escherichia coli −10 and −35 consensus sequences were not readily apparent. Mutations within the −7 motif indicated that the TTTG residues were essential, since changes in this sequence reduced promoter activity to that of a no-promoter control in a chloramphenicol acetyl transferase transcriptional fusion model system. Additional fusion studies indicated that the −33 region was also necessary for full activity.

1 Introduction

The Cytophaga-Flavobacter-Bacteroides phylum is a major bacterial lineage that diverged early during the course of evolution, apparently preceding the emergence of Gram-positive bacteria [20,21]. Members of this phylum display a wide range of phenotypic and physiological characteristics and are found in diverse habitats, from soil to clinical specimens. The Bacteroides group, which includes the Porphyromonas and Prevotella spp., is composed of obligate host-associated anaerobes that are the predominant indigenous species in the human intestinal tract, where they account for a major share (∼30%) of the species. Relatively little is known about the mechanisms of genetic control in these organisms, including the basic features and signals required for the initiation of transcription. For example, the RNA polymerase σ70 promoter of most bacteria is characterized by consensus sequences centered at −10 (TATAAT) and at −35 (TTGACA) relative to the transcription initiation site (TIS) [4,5]. However, in a previous study using a cat reporter gene system, this consensus did not function in Bacteroides fragilis [19]. That study showed that a variety of Escherichia coli consensus promoters (including rrnB P1P2, Ptac, and Ptrc) were unable to mediate transcription of the reporter in B. fragilis, yet when a native B. fragilis promoter fragment was supplied, the cat gene was transcribed and chloramphenicol acetyl transferase (CAT) activity could be measured.

In the present study, we have chosen the cepA gene as a model system for Bacteroides promoter structure. This gene encodes the conventional, chromosomal β-lactamase of B. fragilis, which is found in more than 90% of isolates [9,16]. Two naturally occurring cepA classes are generally observed, conferring high- and low-level resistance to traditional penicillin antibiotics. The high levels of resistance result from overexpression of cepA due to transcriptional activation by a putative IS element (IS1224) inserted at −23 relative to the TIS. Based on the location of this insertion and the fact that the transcription +1 did not change, we have speculated that a hybrid promoter was formed in which the native −35 region was replaced by IS1224 sequences, leading to increased cepA transcription. In the present report, we have combined mutational analysis of the cepA promoter with nucleotide sequence alignments of known and new B. fragilis promoters to deduce the primary structure of the transcription initiation signals in these unique organisms.

2 Materials and methods

2.1 Strains, growth conditions, and DNA transfer

B. fragilis strain 638R [11] was the standard host strain, and DNA for PCR was obtained from strains RBF-103 (ampicillin minimal inhibitory concentration (MIC) 750 μg ml⁻¹) and ATCC-25285 (ampicillin MIC 32 μg ml⁻¹) [16]. Bacteroides strains were grown in hemin-supplemented brain heart infusion (BHIS) medium [18] and E. coli strains were grown in LB medium [8]. Plasmids were transformed into E. coli DH5α or DH10B cells by standard methods [8], and plasmid transfer from E. coli strains to B. fragilis was accomplished by tri-parental filter matings [19].

2.2 DNA manipulations and PCR

Standard procedures were used for routine manipulation of DNA [8]. PCR was performed with Pwo polymerase; the oligonucleotide primers are shown in Table 1. Excluding the 5′ restriction site, the first base of the forward primers RBF-F1, RBF-F2, and RBF-F3 annealed at positions −337, −223, and −136, respectively, relative to the RBF-103 cepA TIS. The standard reverse primer RBF-R1 annealed to +69 (cepA codon 14) and RBF-R2 annealed to +27 (just prior to the cepA ATG). All promoter fragments were cloned into the cat fusion vector pFD395 (pFD325TT; [19]) for analysis of promoter activity.
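The primer coordinates above are given relative to the TIS, where +1 is the first transcribed base and there is no position 0. As a minimal sketch of the resulting arithmetic, the helper below (a hypothetical name, not part of the original methods) counts the bases spanned by two such coordinates. It assumes the quoted positions are the outermost template bases of each fragment and ignores the engineered 5′ restriction sites.

```python
def span_length(start: int, end: int) -> int:
    """Count bases from start to end inclusive in TIS-relative coordinates.

    Coordinates follow the promoter convention: ..., -2, -1, +1, +2, ...
    (there is no position 0).
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in TIS-relative numbering")
    if start > end:
        raise ValueError("start must not be downstream of end")
    length = end - start + 1
    # A span that crosses the TIS would otherwise count the missing position 0.
    if start < 0 < end:
        length -= 1
    return length


# Smallest fragment: RBF-F3 (-136) to RBF-R2 (+27)
print(span_length(-136, 27))   # 163
# Largest fragment: RBF-F1 (-337) to RBF-R1 (+69)
print(span_length(-337, 69))   # 406
```

Under these assumptions the smallest cloned promoter fragment covers 163 template bases.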

Table 1.  Primers used for analysis of cepA promoter function
Primer name    Sequence
Primer sequences are all in the 5′ to 3′ direction with engineered restriction sites underlined. For −7 motif mutagenic primers (RBF-AAAC, -TTTC, -TTAG, -TATG, -ATTG, -AT), the mutagenized bases are in bold.

Hybrid −33 promoter regions containing sequences from both the high and low expression classes were constructed in two steps. The high expression class sequences were amplified from RBF-103 template with forward primer RBF-F3 and the reverse primers HILO35, 47, and 54. These PCR products were then cloned into the EcoRI site present in the low expression promoter region, which had been amplified from ATCC-25285 DNA with primers RBF-R2 and ATCC-F2. The hybrid fusion promoters were then ligated into pFD395. The final high/low hybrid constructs were similar to pFD711, being composed entirely of high expression RBF-103 sequences from −136 to −54, then hybrid sequences between −54 and −23, and finally ATCC-25285 sequences from −23 to the cepA ATG. The nucleotide sequences of all PCR products were confirmed by DNA sequence analysis.

2.3 RNA isolation, primer extension, and CAT assays

Total RNA was isolated from B. fragilis by the hot phenol method as modified by Rogers et al. [15]. Primer extension analysis was performed on 50–100 μg total RNA using gene-specific primers as previously described [15]. Strains to be assayed for CAT were grown in 50 ml of BHIS medium until mid-exponential phase and then assayed according to previously published methods [19]. Specific activity was calculated from the molar extinction coefficient for free 5-thio-2-nitrobenzoate at A412. Total protein was measured by the method of Bradford [2].
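The specific activity calculation can be illustrated with the Beer-Lambert law. The sketch below is not the authors' procedure: the function name, the reaction volume, the path length, and the default extinction coefficient of 13,600 M⁻¹ cm⁻¹ (a value commonly quoted for 5-thio-2-nitrobenzoate at 412 nm) are all assumptions, since the source does not state the values used.

```python
def cat_specific_activity(
    delta_a412_per_min: float,
    extract_volume_ml: float,
    protein_mg_per_ml: float,
    reaction_volume_ml: float = 1.0,   # assumed cuvette volume
    path_cm: float = 1.0,              # assumed optical path length
    epsilon: float = 13600.0,          # assumed molar extinction coefficient (M^-1 cm^-1)
) -> float:
    """Return CAT specific activity in nmol product per min per mg protein."""
    # Beer-Lambert: rate of concentration change (mol/L/min) = (dA/min) / (epsilon * l)
    molar_rate = delta_a412_per_min / (epsilon * path_cm)
    # Convert to nmol formed per minute in the whole reaction volume.
    nmol_per_min = molar_rate * (reaction_volume_ml / 1000.0) * 1e9
    # Total protein added to the reaction (e.g. from a Bradford assay).
    protein_mg = extract_volume_ml * protein_mg_per_ml
    return nmol_per_min / protein_mg


# Illustrative numbers only, not data from the study.
print(cat_specific_activity(0.136, extract_volume_ml=0.01, protein_mg_per_ml=2.0))
```

With these illustrative inputs the result is approximately 500 nmol min⁻¹ mg⁻¹; real values depend on the actual assay volumes and the coefficient chosen.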

3 Results

3.1 Compilation and alignment of B. fragilis promoters

To determine the transcription initiation signals in B. fragilis, nucleotide sequences adjacent to known TIS were aligned. To supplement the few B. fragilis genes for which the TIS has been reported, primer extension analysis was performed on 13 new genes. Only the 23 mapped start sites were used to generate the alignment shown in Fig. 1. Generally the +1 was mapped to a purine (17/23), with A (11/23) being preferred; only 6/23 were a pyrimidine, with C being more common. DNA sequences around the start sites and out to about −50 were initially aligned using the GCG program Pileup and then modified by hand to maintain alignment with the +1. Inspection of the final alignment revealed two conserved regions that were formally equivalent to the −10 and −35 regions. The first was a TAnnTTTG motif generally centered between six and nine nucleotides upstream of the TIS, with an average distance of 7.6 (Fig. 1). This octamer was conserved in 80% of the promoters examined. Also, in 22/23 cases the octamer was followed by a C or T. Another conserved TTTG sequence was found centered around −33 from the TIS. This sequence shares similarity with the typical −35 consensus, TTGACA, but is shorter. Spacing between the last base of the −33 sequence and the start of the TAnnTTTG octamer was generally between 19 and 21 bases, with an average of 20.
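The two motifs and their spacing can be expressed as a simple pattern search. The following sketch is illustrative only and is not the alignment procedure used in the study (which relied on Pileup and manual adjustment). The function name and the toy sequence are hypothetical; the input is assumed to be the upstream sequence ending at position −1.

```python
import re

# Zero-width lookaheads so that overlapping motif occurrences are all reported.
OCTAMER = re.compile(r"(?=TA[ACGT]{2}TTTG)")
TETRAMER = re.compile(r"(?=TTTG)")


def scan_upstream(upstream: str, min_gap: int = 19, max_gap: int = 21):
    """Find TTTG ... TAnnTTTG pairs separated by min_gap..max_gap bases.

    `upstream` must end at position -1 (the base just before the TIS).
    Returns tuples of (tetramer centre, octamer centre, gap), with centres
    given in TIS-relative coordinates.
    """
    seq = upstream.upper()
    n = len(seq)
    hits = []
    for oct_match in OCTAMER.finditer(seq):
        oct_start = oct_match.start()
        for tet_match in TETRAMER.finditer(seq):
            tet_end = tet_match.start() + 4      # index just past the last G
            gap = oct_start - tet_end            # bases between the two motifs
            if min_gap <= gap <= max_gap:
                tet_center = tet_match.start() - n + 1.5
                oct_center = oct_start - n + 3.5
                hits.append((tet_center, oct_center, gap))
    return hits


# Synthetic example sequence (not a real B. fragilis promoter).
toy = "GC" + "TTTG" + "ACG" * 6 + "AC" + "TAGCTTTG" + "CAT"
print(scan_upstream(toy))   # [(-33.5, -7.5, 20)]
```

A strict pattern match of this kind would miss promoters that deviate from the consensus, which is why the default spacing window and exact-match requirement should be treated as adjustable assumptions.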

Figure 1.

Compilation of B. fragilis promoters. Promoter regions were aligned based on the TIS indicated by the arrow. This is an alignment of promoters supported by biochemical evidence and was used to generate a consensus (con). On the consensus line, upper case letters indicate ≥80% conservation and the lower case letters are ≥60% conservation. Ref=primary citation [6,10,12–15]; P=new primer extension data; a=personal communication M. Malamy.
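The consensus rule in the caption (upper case for ≥80% conservation, lower case for ≥60%) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, positions below the lower threshold are written as "n", and the input sequences are toy data rather than the promoters of Fig. 1.

```python
from collections import Counter


def consensus(aligned, upper=0.8, lower=0.6):
    """Derive a consensus line from equal-length aligned sequences.

    Upper case: most common base present in >= `upper` of the sequences.
    Lower case: most common base present in >= `lower` of the sequences.
    Otherwise the position is reported as 'n'.
    """
    if not aligned:
        raise ValueError("no sequences supplied")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        # Count only unambiguous bases; gaps or other symbols are ignored.
        counts = Counter(b.upper() for b in column if b.upper() in "ACGT")
        if not counts:
            out.append("n")
            continue
        base, count = counts.most_common(1)[0]
        fraction = count / len(aligned)
        if fraction >= upper:
            out.append(base)
        elif fraction >= lower:
            out.append(base.lower())
        else:
            out.append("n")
    return "".join(out)


# Toy alignment of five octamers (illustrative only).
toy = ["TAGCTTTG", "TAAATTTG", "TATCTTTG", "TACGTTTG", "AACTTTTC"]
print(consensus(toy))   # TAnnTTTG
```

Here the first and last columns are conserved in 4 of 5 sequences (80%), so they are written in upper case, while the two variable positions fall below 60% and appear as "n".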

3.2 Cloning and characterization of cepA promoter fragments

The two cepA gene classes, high- and low-level expression, provide a useful model system to assess structure/function relationships of the conserved B. fragilis promoter regions. To develop this system, a nested set of cepA promoter fragments was PCR-amplified from a high expression strain (RBF-103) and cloned into the CAT fusion vector pFD395 (Fig. 2). Constructs whose fragments extended 43 nucleotides into the cepA structural gene (encoding the N-terminus) could be maintained only in E. coli and could not be transferred into B. fragilis. This suggested that transcription and subsequent translation of this truncated protein was lethal. In contrast, clones that extended only to the cepA translational start codon were not lethal in B. fragilis. A qualitative assessment of promoter strength based on growth in media with chloramphenicol suggested that all constructs were of similar strength regardless of the amount of upstream sequence present (Fig. 2). The smallest promoter fragment (pFD711) was used in further analyses.

Figure 2.

Genetic map of the high (RBF-103) and low (ATCC-25285) expression class promoter regions and the cat transcriptional fusions constructed for their analysis. Hash marked boxes indicate the cepA structural gene coding region; gray boxes denote the IS1224 insertion into RBF-103; white boxes represent the native region upstream of cepA present in the type strain ATCC-25285; stippled boxes indicate the cat reporter gene in fusion constructs; the arrow above the maps indicates the +1 transcription start site. The arrows underneath the ATCC-25285 map show the location of the reverse primers RBF-R1 and -R2. Ability of B. fragilis strains containing the indicated fusion construct to grow on media with chloramphenicol is indicated where appropriate.

3.3 Analysis of cepA promoter mutations in the −7 octamer

To identify nucleotide positions important for activity, single and multiple positions within the conserved TAnnTTTG motif of cepA (pFD711) were mutated. In each case, the mutated nucleotides were converted to their complement to preserve the melting point characteristics. The most striking result was observed when all four positions within the TTTG portion of the motif were mutated (pFD714) and promoter activity was reduced to the basal level (Table 2). This result indicates the importance of this portion of the promoter consensus. In addition, reversal of the initial two nucleotides (TA) to AT (pFD733) resulted in a 60% decrease in CAT activity. Of the individual point mutations within the −7 region, only pFD729, in which the final guanine was changed to a cytosine, resulted in a sizable (60%) reduction in activity.
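The complement-substitution strategy can be sketched in a few lines. The helper names below are hypothetical; the variants it generates for TTTG correspond to the mutagenic primer suffixes listed under Table 1 (-ATTG, -TATG, -TTAG, -TTTC, -AAAC), and the TA to AT change corresponds to the -AT primer. The G+C count is used here only as a rough proxy for the melting characteristics mentioned in the text.

```python
# Map each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement_at(seq: str, positions) -> str:
    """Return seq with the bases at the given 0-based positions complemented."""
    chars = list(seq)
    for i in positions:
        chars[i] = chars[i].translate(COMPLEMENT)
    return "".join(chars)


def gc_count(seq: str) -> int:
    """Number of G or C bases; unchanged by A<->T and G<->C swaps."""
    return sum(base in "GC" for base in seq)


motif = "TTTG"
singles = [complement_at(motif, [i]) for i in range(len(motif))]
all_four = complement_at(motif, range(len(motif)))

print(singles)                        # ['ATTG', 'TATG', 'TTAG', 'TTTC']
print(all_four)                       # AAAC
print(complement_at("TA", [0, 1]))    # AT
```

Because A pairs with T and G with C, every variant keeps the same number of G·C and A·T base pairs as the wild-type motif, which is consistent with the stated aim of preserving melting behaviour.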

Table 2.  CAT enzyme activity for cepA wild-type (pFD711) and derivative mutant promoters in B. fragilis 638R
Plasmid    Promoter sequence    CAT activity (%)a
Results were obtained from mid-exponential phase cells grown in BHIS media and are the average of three assays. Mutated promoters differed from pFD711 only at the mutated bases shown in bold, and were constructed by PCR with the primers shown in Table 1.
a The RBF-103 wild-type promoter, pFD711, was taken as 100% activity.

3.4 Analysis of hybrid cepA promoters altered in the −33 region

The results in Fig. 1 indicated that a second conserved B. fragilis promoter motif, TTTG, was centered at about the −33 position. To examine the importance of this motif, cat fusions containing hybrid cepA promoters that were altered in this region were compared. This was accomplished by replacing the sequences associated with the high-level expression class (RBF-103, i.e. IS1224-activated) with the low-level expression sequences found in strain ATCC-25285 (Fig. 3). Only sequences upstream of the −7 were modified, and the results were compared to the CAT activity expressed by the RBF-103 promoter. There was a dramatic effect when the entire region between −24 and −54 was replaced by the low-level sequence; activity was reduced essentially to the level seen with the wild-type ATCC-25285 promoter. The two intermediate fusions, −35 and −47, were useful since they simulated removing just the TTTG (−35) or moving it three nucleotides further upstream (−47). In both cases there was a decrease in promoter activity, but not as severe as that seen with construct −54. These results are consistent with a model that places the essential promoter functions or contacts within a 50-nucleotide region from about −4 to −54 relative to the TIS.

Figure 3.

Comparison of cepA hybrid promoters consisting of sequences from both high- and low-level expression classes in the −33 region. CAT activity of the cat fusion plasmid pFD711 (103-Hi) was compared to hybrid promoters and to the wild-type low-level expression plasmid 25285-Lo. Sequences upstream of the IS1224 insertion point were replaced with sequences from ATCC-25285 as described in Section 2. Only the relevant DNA sequences are shown and the IS1224 sequences present in RBF-103 are indicated by the black boxes with lower case letters. The effect of these changes on activity of the cat reporter gene was determined in triplicate.

4 Discussion

In prokaryotes, the initiation of transcription begins with sequence-specific binding of RNA polymerase, which contacts the DNA at two regions, −10 and −35 relative to the TIS [4,5]. This contact is mediated by one of a family of sigma factors, and in most species the predominant sigma factors during normal growth recognize the highly conserved −10 (TATAAT) and −35 (TTGACA) hexamers. In about 80% of the B. fragilis cases examined, we found two conserved regions analogous to the −10 and −35, but their sequences differ notably from those of other species: an octamer, TAnnTTTG, centered at −7 and a tetramer, TTTG, centered at about −33 (Fig. 1). The sequence alignment from which this consensus was derived covers a wide range of genes with different functions, and not all of the genes examined fit perfectly within this consensus. For example, sod matches the consensus at only 50% of positions in the −7 region and has unusual spacing between the −7 and −33. However, this is a highly regulated gene that may be controlled by several different mechanisms. Several others, such as recA [3] and fruA [1], also deviated somewhat from the consensus, suggesting that it may not strictly apply to all promoters.

Further support for our consensus comes from the mutation analysis of the cepA promoters. Most significantly, modification of the entire −7 TTTG completely disrupted expression of the reporter gene, and while no individual mutation abolished activity, some led to less dramatic reductions (Table 2). The −7 region is dissimilar to any of the known −10 sequences, although it does keep the general AT-rich character found in promoters from other organisms. Another unusual feature of the −7 region was its length: at eight bases it is longer than the common hexamer, and it was located closer to the TIS than seen with other organisms [4,5,22]. In addition, the TTTG is reminiscent of the −16 TTTTTTTG motif found in more than 50% of the Campylobacter jejuni promoters examined [22]. More recently, in Porphyromonas gingivalis, which is closely related to Bacteroides sp., a TTGC sub-sequence referred to as the P3″ element was found within 10 bp of the TIS in five genes [7]. We have examined many of these newly mapped P. gingivalis promoters [7] and found that while they did not align precisely with our consensus, many did have the TAnnTTTG motif within 20 nucleotides of the TIS.

The B. fragilis promoters also had a conserved TTTG element centered around −33, but sequences out to −54 appeared to be required for optimal activity. This is not unusual, as the recent discovery of UP elements has extended the promoter region out to about −60 [17]. Deletion of the −33 TTTG motif did not abolish activity, as mutation of the −7 element did, but there was a sharp reduction in activity of the reporter gene (Fig. 3). The spacing between the −7 and −33 may be a significant factor in promoter activity, as is known for E. coli, and it could in part explain the differences between the low- and high-level cepA expression classes. Alternatively, we could have removed a cepA regulatory repressor site present in the low-level promoters.

The promoter motifs of B. fragilis are unusual among bacteria. Although the promoter sequences of several organisms have been found to vary somewhat from the E. coli consensus, there is overall a fairly close alignment with the conserved bases. Even C. jejuni, which lacks a typical −35 sequence, has the classic, highly conserved −10 sequence [22]. It is possible that the departure from typical transcription start signals reflects the evolutionary position of the Bacteroides group, which diverged early from the main eubacterial branch [20,21].


The authors acknowledge support of National Institutes of Health PHS Grant AI28884. We thank C. Herren and D. Smalley for preliminary data on primer extension products.