Identification of additional genes belonging to the LexA regulon in Escherichia coli


  • Antonio R. Fernández de Henestrosa,

    1. Section on DNA Replication, Repair and Mutagenesis, Building 6, Room 1A13, National Institute of Child Health and Human Development, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20892-2725, USA.
  • Tomoo Ogi,

    1. Institute for Virus Research, Kyoto University, Sakyo, Kyoto 606-8507, Japan.
  • Sayura Aoyagi,

    1. Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, USA.
  • David Chafin,

    1. Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, USA.
  • Jeffrey J. Hayes,

    1. Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, USA.
  • Haruo Ohmori,

    1. Institute for Virus Research, Kyoto University, Sakyo, Kyoto 606-8507, Japan.
  • Roger Woodgate

    Corresponding author
    1. Section on DNA Replication, Repair and Mutagenesis, Building 6, Room 1A13, National Institute of Child Health and Human Development, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20892-2725, USA.


Exposure of Escherichia coli to a variety of DNA-damaging agents results in the induction of the global ‘SOS response’. Expression of many of the genes in the SOS regulon is controlled by the LexA protein. LexA acts as a transcriptional repressor of these unlinked genes by binding to specific sequences (LexA boxes) located within the promoter region of each LexA-regulated gene. Alignment of 20 LexA binding sites found in the E. coli chromosome reveals a consensus of 5′-TACTG(TA)5CAGTA-3′. DNA sequences that exhibit a close match to the consensus are said to have a low heterology index and bind LexA tightly, whereas those that are more diverged have a high heterology index and are not expected to bind LexA. By using this heterology index, together with other search criteria, such as the location of the putative LexA box relative to a gene or to promoter elements, we have performed computational searches of the entire E. coli genome to identify novel LexA-regulated genes. These searches identified a total of 69 potential LexA-regulated genes/operons with a heterology index of < 15 and included all previously characterized LexA-regulated genes. Probes were made to the remaining genes, and these were screened by Northern analysis for damage-inducible gene expression in a wild-type lexA+ cell, constitutive expression in a lexA(Def) cell and basal expression in a non-inducible lexA(Ind) cell. These experiments have allowed us to identify seven new LexA-regulated genes, thus bringing the present number of genes in the E. coli LexA regulon to 31. The potential function of each newly identified LexA-regulated gene is discussed.


Upon DNA damage, many prokaryotes will elicit the so-called ‘SOS response’. This phrase, first coined by Miroslav Radman in the mid-1970s (Radman, 1974), embodies the pleiotropic response used by the cell when it is in distress. Many of the proteins induced as part of this response are involved in DNA replication, repair and the control of cell division (for recent reviews, see Friedberg et al. 1995; Koch and Woodgate 1998). Although the SOS response, in its broadest sense, represents all damage-inducible genes, it quickly became synonymous with a dual-component system in which RecA protein is the activator and LexA a negatively acting transcriptional regulator (Little et al., 1980; 1981; Brent and Ptashne, 1981). Less well characterized and underappreciated, however, is the fact that many other proteins are induced by DNA damage that are not directly regulated by RecA or LexA (see Koch and Woodgate, 1998 and references therein). Thus, the global SOS response is composed of two subpathways, one that is RecA–LexA regulated (and is considered to be the ‘classical’ SOS response) and one that is RecA–LexA independent (Koch and Woodgate, 1998). The precise number of proteins induced as part of the global SOS response is presently unknown, but it is likely to be considerable given the unexpected number of damage-inducible genes recently reported in Saccharomyces cerevisiae (Jelinsky and Samson, 1999).

Over the years, several different strategies have been used in attempts to identify genes that are directly regulated by RecA–LexA. Kenyon and Walker (1980) used a Mudlac bacteriophage to make random gene fusions in the Escherichia coli chromosome. By screening for colonies with increased expression of β-galactosidase after UV irradiation, they were able to identify a number of damage-inducible (din) loci. Subsequent analysis revealed that some of these loci included polB (dinA) (Bonner et al., 1990; Iwasaki et al., 1990); uvrA (dinE) (Kenyon and Walker, 1981) and umuDC (Bagg et al., 1981). Analysis of the promoter/operator regions of these din genes, as well as other genes known to be regulated by LexA, revealed a consensus LexA binding site of 5′-TACTG(TA)5CAGTA-3′ (Walker, 1984).

In 1994, using a mathematical formula described previously by Berg and Von Hippel (1988) to determine the degree of divergence of any 20 nucleotide sequence from the consensus LexA box, Lewis et al. (1994) defined the term ‘heterology index’ (HI). Those sequences with a low HI value are closer to the consensus LexA box and are predicted to bind LexA with greater affinity than those sites with a higher HI score. Lewis et al. (1994) tested their hypothesis directly by performing LexA-dependent gel mobility shift assays on a number of potential LexA binding sites and found that LexA bound to sequences with an HI value of 12.6 or lower, but not to sequences with an HI value >15. With this information to hand, Lewis et al. (1994) screened the E. coli genome (≈ 30% complete in 1994) and identified six potentially LexA-regulated genes, which they called sosA–F.
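To make the idea of a heterology index concrete, the following is a minimal sketch rather than the published procedure. It assumes that the index takes the form usually attributed to Berg and Von Hippel (1988): at each of the 20 positions, the logarithm of the ratio between the count of the most common base and the count of the base actually present, with a pseudocount of 0.5, summed over all positions. The reference set is a small subset of the known LexA boxes listed in Table 2, chosen only for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

# Illustrative reference set: a subset of the known (asterisked) LexA boxes
# from Table 2. The published index was derived from a larger alignment.
KNOWN_SITES = [
    "TACTGTATATAAAAACAGTA",  # umuDC
    "AACTGTATATAAATACAGTT",  # pcsA
    "TACTGTATGCTCATACAGTA",  # recA
    "TACTGTACATCCATACAGTA",  # sulA
    "TACTGTATATAAAACCAGTT",  # recN (1)
    "AACTGTTTTTTTATCCAGTA",  # uvrB
    "ACCTGTATAAATAACCAGTA",  # dinI
    "TGCTGTATATACTCACAGCA",  # lexA/dinF (1)
]


def position_counts(sites):
    """Count how often each base occurs at each position of the aligned sites."""
    length = len(sites[0])
    return [Counter(site[i] for site in sites) for i in range(length)]


def heterology_index(candidate, counts):
    """Sum of ln((n_consensus + 0.5) / (n_observed + 0.5)) over all positions.

    Assumed form of the index; a perfect match to the most common base at
    every position scores 0, and the score grows with divergence.
    """
    hi = 0.0
    for base, column in zip(candidate, counts):
        n_consensus = max(column.values())
        n_observed = column.get(base, 0)
        hi += math.log((n_consensus + 0.5) / (n_observed + 0.5))
    return hi


counts = position_counts(KNOWN_SITES)
print(round(heterology_index("TACTGTATGCTCATACAGTA", counts), 2))
```

Because the reference set here is smaller than the alignment used in the study, the values printed by this sketch should not be expected to reproduce the HI values reported in Table 2; only the ranking behaviour (lower score for a closer match) is illustrated.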

Other groups have also applied similar computational analyses to look for the recognition sites of DNA-binding proteins, including LexA (Robison et al., 1998). Indeed, using a modified version of the mathematical approach of Berg and Von Hippel (1988), Robison et al. (1998) suggested that the number of potential LexA binding sites in the E. coli chromosome varies between 16 and 65 depending on the different parameters used.

In an attempt to identify additional genes in E. coli that are transcriptionally regulated by LexA, we have used computational searches to find potential LexA binding sites in the chromosome and have analysed the inducibility of any downstream gene by Northern analysis in a variety of lexA backgrounds. Using such an approach, we have been able to identify seven new genes that appear to be part of the LexA regulon, and to confirm the damage inducibility of three others that were previously hypothesized as belonging to the LexA regulon (Lewis et al., 1994).


Computational searches and criteria for the identification of LexA binding sites

To identify additional genes in the LexA regulon, we have used three different computer analyses and search parameters to identify potential LexA binding sites within the entire E. coli chromosome. All the search parameters take advantage of the fact that over 20 LexA-regulated genes/LexA binding sites have been analysed experimentally, thereby allowing us to use a consensus LexA binding site (Table 1). Despite the fact that this list now contains more LexA binding sites than used in previous analyses, the consensus remains essentially unchanged from that proposed previously (Walker, 1984; Lewis et al., 1994). Each of the programs uses slightly different variations of the consensus as search patterns, with the hope that all potential LexA binding sites will be identified. Obviously, for LexA to act as a transcriptional repressor, the binding site needs to be located within, or very close to, the promoter region of any potential gene. Even though the E. coli genome is completely sequenced, a large question remains as to the function of many potential open reading frames (ORFs) (Rudd, 1998). As a consequence, we have not excluded potential LexA binding sites from our study if they are located in a hypothetical ORF with no known function. A similar study by Lewis et al. (1994) demonstrated that potential LexA binding sites with an HI value > 15 did not bind LexA protein, and we have used this criterion as our breakpoint for LexA-regulated genes.
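A pattern-based scan of this kind can be sketched as follows. The three search patterns actually used are given in Experimental procedures and are not reproduced here; the pattern below (the conserved CTG and CAG triplets separated by ten unspecified bases, with two unspecified flanking bases on each side) is a hypothetical stand-in, as are the function name and the toy sequence.

```python
import re

# Hypothetical loose pattern: NN-CTG-N10-CAG-NN (20 bases in total).
# The lookahead lets overlapping candidates be reported as well.
CORE_PATTERN = re.compile(r"(?=([ACGT]{2}CTG[ACGT]{10}CAG[ACGT]{2}))")


def find_candidate_boxes(sequence):
    """Return (start, 20-mer) pairs for every match, including overlapping ones."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in CORE_PATTERN.finditer(sequence)]


# Toy sequence: the recA LexA box from Table 2 embedded in arbitrary flanks.
toy = "GGGG" + "TACTGTATGCTCATACAGTA" + "CCCC"
for start, box in find_candidate_boxes(toy):
    print(start, box)
```

Because CTG-N10-CAG reads the same on the complementary strand, a single pass with this particular pattern reports sites regardless of orientation; an asymmetric pattern would need to be run against the reverse complement as well. In a fuller pipeline, each candidate would then be scored (for example by heterology index) and filtered by its position relative to the nearest gene.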

Table 1. Chromosomally encoded E. coli LexA-regulated genes and the sequence of their respective LexA boxes.
Gene LexA box
  1. The underlined sequences in the dinG and recN(3) LexA boxes highlight the fact that they deviate from the conserved ‘CTG’ motif.


Using this approach, we have identified 62 potential LexA binding sites that are predicted to be located within the promoter region of known genes or hypothetical ORFs (Table 2). Most appear to regulate single genes or operons. However, some of these (see numbers 1 and 2; 15 and 16; 23 and 24; 36 and 37; 47 and 48; 52 and 53; and 68 and 69 in Table 2) are located in an intergenic region between two genes transcribed in opposite directions, and thus appear twice (as represented in each orientation), giving rise to a total of 69 potential LexA-regulated genes or operons (Table 2). It should be emphasized that these search programs and parameters identified all the previously identified LexA-regulated genes or those predicted to be regulated by LexA (Lewis et al., 1994), and it seems reasonable to expect that all the potential LexA-regulated genes in E. coli have now been identified.
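The reason a single site in a divergent intergenic region is listed twice can be illustrated with a short reverse-complement check. This is a sketch only; the helper name is hypothetical, and the sequences are those reported in Table 2 for the ysdAB box (number 1) and for the uvrA/ssb pair (numbers 15 and 16).

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Box 1 (ysdAB) read on the opposite strand gives the box listed as number 2.
print(reverse_complement("TACTGTTTATTTATACAGTA"))  # TACTGTATAAATAAACAGTA

# Box 15 (uvrA) read on the opposite strand gives the box listed for ssb.
print(reverse_complement("TACTGTATATTCATTCAGGT"))  # ACCTGAATGAATATACAGTA
```

Both members of such a pair necessarily share the same HI value, because the consensus itself is palindromic.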

Table 2. Potential LexA boxes in the E. coli chromosome (a).

| No. | Potential LexA box | Search pattern (b) | Gene (c) | Alternative name (d) | Distance (e) | HI |
|---|---|---|---|---|---|---|
| 1 | TACTGTTTATTTATACAGTA | 2, 3 | ysdAB | | −142 | 2.35 |
| 2 (1R) | TACTGTATAAATAAACAGTA | 2, 3 | ivbL | b3672 | −330 | 2.35 |
| 3 | TACTGTATATAAAAACAGTA | 1, 2, 3 | umuDC | | −37 | 2.77* (f) |
| 4 | TACTGTATATAAAAACAGTA | 1, 2, 3 | sbmC | | −32 | 2.77* |
| 5 | AACTGTATATAAATACAGTT | 1, 2, 3 | pcsA (g) | dinD | −61 | 3.34* |
| 6 | TACTGTATGCTCATACAGTA | 1, 2, 3 | recA | | −77 | 4.31* |
| 7 | TACTGTACATCCATACAGTA | 1, 2, 3 | sulA | | −42 | 4.65* |
| 8 | TACTGTATGATTATCCAGTT | 2, 3 | dinQ (h) | | −139 | 4.83 |
| 9 | TACTGTATATAAAACCAGTT | 1, 2, 3 | recN (1) | | −66 | 5.16* |
| 10 | AACTGTTTTTTTATCCAGTA | 1, 2, 3 | uvrB | | −92 | 6.11* |
| 11 | ACCTGTTTAAATATCCAGTA | 2, 3 | ycgJ | b1177 | −262 | 6.24 |
| 10 | TACTGGATATTTAAACAGGT | 2, 3 | minC | | −277 | 6.24 |
| 12 | ACCTGTATAAATAACCAGTA | 1, 2, 3 | dinI | | −37 | 6.24* |
| 13 | TGCTGTATATACTCACAGCA | 1, 2, 3 | lexA/dinF (1) | | −45 | 6.34* |
| 14 | CACTGTATAAATAAACAGCT | 2, 3 | hokE | ybdY | −97 | 6.72 |
| 15 | TACTGTATATTCATTCAGGT | 1, 2, 3 | uvrA | dinE | 101 | 6.98* |
| 16 (15R) | ACCTGAATGAATATACAGTA | 1, 2, 3 | ssb | | −170 | 6.98* |
| 17 | TACTGTATAAAATCACAGTT | 1, 2, 3 | yebG | | −35 | 7.03* |
| 18 | TGCTGTATATTTATTCAGCT | 1, 2, 3 | yafL | b0227 | 193 | 7.06 |
| 19 | AGCTGAATAAATATACAGCA | 1, 2, 3 | dinJ | sosA | −32 | 7.06 |
| 20 | AACTGTATATACACCCAGGG | 1, 2, 3 | lexA/dinF (2) | | −24 | 8.32* |
| 21 | TCCTGTTAATCCATACAGCA | 1, 2, 3 | ftsK | dinH | −96 | 8.61* |
| 22 | ATCTGTATATATACCCAGCT | 1, 2, 3 | uvrD | | −74 | 8.80* |
| 23 | TACTGTATAAACAGCCAATA (i) | 1, 3 | ybiA | b0798 | −105 | 8.98 |
| 24 (23R) | TATTGGCTGTTTATACAGTA | 1, 3 | dinG | | −34 | 8.98* |
| 25 | TACTGTACGTATCGACAGTT | 1 | ydjM (1) | b1728 | −34 | 9.07 |
| 26 | CGCTGGATATCTATCCAGCA | 1, 2, 3 | ruvAB | | −67 | 9.19* |
| 27 | AACTGGACGTTTGTACAGCA | 1, 2, 3 | yigN | b3832, sosB | −63 | 9.27 |
| 28 | TACTGTACACAATAACAGTA | 1, 2, 3 | recN (2) | | −46 | 9.38* |
| 29 | TACTGATGATATATACAGGT | 1, 2, 3 | yjiW | dinL, sosC | −95 | 9.40 |
| 30 | CACTGGATAGATAACCAGCA | 1, 2, 3 | ydjQ | b1741, sosD | 33 | 9.54 |
| 31 | AGCTGTATTTGTCTCCAGTA | 2, 3 | dinS (h) | IS150 | −74 | 9.60 |
| 32 | GCCTGTCTGAACAAACAGTA | 2, 3 | tyrS | dinN, sosE | +275 | 10.12 |
| 33 | TTCTGGTTTAATAAACAGCA | 2, 3 | ORF within ycgM | | −57 | 10.38 |
| 34 | AACTGGATAAAATTACAGGG | 1, 2, 3 | molR | dinO, sosF | −27 | 10.68 |
| 35 | TGCTGTTTTTATAAACAATG | 1, 3 | mug | tng, ygjF | −69 | 10.77 |
| 36 | TCTTGTATATCCAACCAGTT | 3 | ORF within yeeI | | −118 | 10.99 |
| 37 (36R) | AACTGGTTGGATATACAAGA | 3 | ORF within yeeI | | −55 | 10.99 |
| 38 | TAATGGTTTTTCATACAGGA | 1 | recN (3) | | −28 | 11.47* |
| 39 | ACCTGAATATTCAAACAGCG | 2, 3 | ORF within ydbK | | −122 | 11.77 |
| 40 | AATTGTTAATATATCCAGAA | 1, 3 | yciG | b1259 | −157 | 11.97 |
| 41 | GACTGTATAAAACCACAGCC | 1, 2, 3 | polB | dinA | −71 | 12.09* |
| 42 | TTCTGGATAAGCATCCAGAA | 2, 3 | ygiS | b3020 | −173 | 12.16 |
| 43 | TTCTGGATGCTTATCCAGAA | 2, 3 | ORF within ygiT | | −45 | 12.16 |
| 44 | GTCTGAATGAATACCCAGTA | 1, 2, 3 | xylE | | −23 | 12.59 |
| 45 | TGCTGGGTAAATATAAAGCC | 1 | ydbH | b1381 | +1 | 12.71 |
| 46 | CACTGTATACTTTACCAGTG | 1, 2, 3 | dinB | | −32 | 12.84* |
| 47 | TGCTGTTTTAGCATTCAGTG | 1, 2, 3 | creA | | −145 | 12.93 |
| 48 (47R) | CACTGAATGCTAAAACAGCA | 1, 2, 3 | rob | | −81 | 12.93 |
| 49 | AATTGTTTAAATATACCGCT | 1, 3 | brnQ | | −87 | 12.95 |
| 50 | TCCTGGTTTTATATTCATTA | 1 | yiaO | b3579 | −92 | 12.99 |
| 51 | GGCTGGACAATTTTACAGCT | 1, 2, 3 | hofQ | | −100 | 13.05 |
| 52 | CGCTTGAGGAATATACAGTA | 1 | metE | | −205 | 13.12 |
| 53 (52R) | TACTGTATATTCCTCAAGCG | 1 | metR | | −51 | 13.12 |
| 54 | TCCTGGCTATTTTGCCAGTA | 2, 3 | ORF within ydcL | | −101 | 13.15 |
| 55 | TGCTACATTAATAAACAGTA | 1 | yhiX | b3516 | −47 | 13.23 |
| 56 | TTCTGTGAGTTAACACAGTC | 1, 2, 3 | pshM | | −89 | 13.28 |
| 57 | AACCGTAGAAATCTACAGCT | 1 | ycgL | b1179 | −57 | 13.33 |
| 58 | AACTGGGAAACTATAAAGTA | 1 | rfaJ | | −39 | 13.36 |
| 59 | TACTGTCTGTATATATAAGT | 1 | yjgN | b4257 | −55 | 13.46 |
| 60 | GCCTGTGTTAGTTTCCAGTA | 1, 2, 3 | ybiT | b0820 | −3 | 13.55 |
| 61 | CACTGTATAAAAATCCTATA | 1 | ydjM (2) | b1728 | −52 | 13.60 |
| 62 | CACTGGGAGTAAATAAAGTA | 1 | ilvD | | −19 | 13.71 |
| 63 | TTCTGTTTATTCATACCGGC | 1 | yecS | b1918 | −73 | 13.85 |
| 64 | GATTGAATGAATATACAGGG | 1, 3 | ecpD | | −79 | 13.86 |
| 65 | TATTGGACGAGCATACAGAA | 1, 3 | ydeJ | b1537 | −62 | 13.89 |
| 66 | AACTGATTAAAAACCCAGCG | 2, 3 | ybfE | b0685 | −136 | 14.07 |
| 67 | TACTGTCTGCATCATCAGGA | 2, 3 | ycbU | b0942 | −100 | 14.11 |
| 68 | TACTGTCTACCAAAACAGAG | 2, 3 | yfiK | b2578 | −30 | 14.98 |
| 69 (68R) | CTCTGTTTTGGTAGACAGTA | 2, 3 | yfiE | b2577 | −92 | 14.98 |

  • (a) The sequence of the LexA box is reported as that found on the coding DNA strand. For boxes that are located between two divergent genes, the reciprocal sequence is also given on the following line. We have noted these cases in the leftmost column by indicating the number of the original LexA box (shown here in parentheses, e.g. 1R).
  • (b) The numbers in the search pattern column indicate which of the three different search patterns used in this study (see Experimental procedures for more details) identified the particular LexA box.
  • (c) In most cases, the gene name used here is that reported by Berlyn (1998) and Rudd (1998).
  • (d) This is a pseudonym of the gene that has been used in previous studies.
  • (e) The ‘distance’ refers to the location of the LexA box relative to the ATG codon of the respective gene.
  • (f) HI values with an asterisk indicate genes that have previously been shown to be regulated by LexA.
  • (g) Although identified in Berlyn (1998) as dinD, this gene was identified as pcsA (Ohmori et al., 1995b), and we have used that term here.
  • (h) These genes were not given names in the Eco10 map (Rudd, 1998), and we have assigned them dinQ and dinS respectively.
  • (i) Sequences underlined represent deviations from the consensus core LexA binding site.

E. coli genes that are LexA regulated

Despite the appropriate location of a LexA binding site within a gene’s promoter region, we felt that it was necessary to determine experimentally that a gene is transcriptionally regulated by LexA. To do so, we have made oligonucleotide probes to each of the genes identified in Table 2 that have not already been shown to be LexA regulated (those lacking an asterisk in Table 2) and have analysed the pattern of gene expression in three different lexA backgrounds in the absence and presence of exogenous DNA damage. Known LexA-regulated genes, such as recA and umuDC, which we used as positive controls in these experiments, exhibit the following response in the three backgrounds: in a wild-type lexA+ cell, there is basal expression of the gene, which is enhanced by exposure to DNA damage; in a lexA3(Ind) strain, which has a G85D change at the cleavage site of LexA (Markham et al., 1981; Little, 1984), thus rendering the repressor non-cleavable (Lin and Little, 1989), no damage-inducible induction of LexA-regulated genes is observed (Mount et al., 1972); in a lexA51(Def) mutant strain that encodes a defective repressor (Mount, 1977; see Experimental procedures), all LexA-regulated genes are expressed at derepressed levels, irrespective of whether the cell has been exposed to exogenous DNA damage.
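The three-strain decision rule described above can be written down as a small classifier. This is only a sketch of the logic: the study judged Northern blots by eye, so the numeric transcript levels, the twofold induction threshold and the detection floor used here are all assumptions introduced for illustration, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class Lanes:
    """Relative transcript levels without (minus) and with (plus) mitomycin C."""
    minus: float
    plus: float


def induced(lanes, fold=2.0, floor=0.05):
    """True if the damaged lane exceeds the undamaged lane by the given fold.

    `floor` is a hypothetical detection limit so that an undetectable basal
    level does not make every signal count as induction.
    """
    return lanes.plus >= fold * max(lanes.minus, floor)


def looks_lexa_regulated(wild_type, ind, deficient, fold=2.0, floor=0.05):
    """Apply the expected pattern: inducible in lexA+, not inducible in
    lexA3(Ind), and derepressed with or without damage in lexA51(Def)."""
    basal = max(wild_type.minus, floor)
    derepressed = (deficient.minus >= fold * basal
                   and deficient.plus >= fold * basal)
    return (induced(wild_type, fold, floor)
            and not induced(ind, fold, floor)
            and derepressed)


# A tightly repressed gene (pattern resembling that described for ysdAB).
print(looks_lexa_regulated(Lanes(0.0, 1.0), Lanes(0.0, 0.0), Lanes(1.0, 1.0)))
# A constitutively expressed gene (same level in every lane).
print(looks_lexa_regulated(Lanes(1.0, 1.0), Lanes(1.0, 1.0), Lanes(1.0, 1.0)))
```

A gene with a weak LexA box and high basal expression may sit close to whatever threshold is chosen, so any such rule is best treated as a first-pass filter rather than a substitute for inspection of the blots.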

Of the 49 genes analysed, 10 had the same characteristic expression expected of known LexA-regulated genes (Fig. 1). Three of these genes had previously been implicated as being LexA regulated based upon similar computational searches to that used here (Lewis et al., 1994). Our present study therefore demonstrates that yjiW (also known as sosC or dinL), ydjQ (also known as sosD or dinM) and molR (also known as sosF or dinO) are, in fact, LexA-regulated genes. Other genes from the E. coli genome that are LexA regulated are ysdAB, dinQ, hokE, ydjM, dinS and ybfE (Fig. 1).

Figure 1.

Northern analysis of E. coli genes that appear to be regulated by LexA. RNA was extracted from three isogenic strains that differed only in their lexA gene: RW118 (lexA+), RW434 (lexA3[Ind]) and RW542 (lexA51[Def]). RNA was obtained from undamaged cells (–) or from cells that had been exposed to mitomycin C (5 µg ml−1) (+) for 30 min before extraction (see Experimental procedures for more details). Both the previously identified LexA-regulated recA and umuDC genes were used as positive controls in this experiment. The genes are depicted according to their ascending heterology index (HI). The size of the mRNA transcript (mRNA) and the possible function of each gene are also indicated.

The HI value of these 10 genes varies considerably. The ysdAB operon, for example, has an HI value of 2.35 (the lowest of all known LexA-regulated genes) and would therefore be expected to be tightly regulated. As can be seen from Fig. 1, that appears to be the case, with no ysdAB transcript observed in an undamaged lexA+ cell or in an undamaged or damaged lexA3(Ind) cell. In comparison, ybfE has an HI value of 14.07 (the highest of any LexA-regulated gene) and would therefore be expected to have relatively high basal levels of expression. Indeed, as can be seen in Fig. 1, we observed significant levels of ybfE transcript in an undamaged lexA+ cell, as well as in both damaged and undamaged lexA3(Ind) cells.

Finally, another noteworthy observation arising from these studies is that ydjM has two closely spaced LexA binding sites with HI values of 9.07 and 13.60 respectively (Table 2). ydjM is therefore only the third E. coli LexA-regulated gene to have multiple LexA binding sites, the others being lexA itself (two sites) (Brent and Ptashne, 1981; Little et al., 1981) and recN (three sites) (Rostas et al., 1987).

Table 3. Oligonucleotides used in this study (a).

| Primer | Sequence | Use |
|---|---|---|
| AR40 | 5′-AAGGAGACTTCTGTCCCTTGCGGGGTGTCC-3′ | Amplification of polB promoter region |
| AR41 | 5′-GCCGAAGATGCGGTGCATAACGCCATCGTG-3′ | Amplification of polB promoter region; DIG labelled |
| AR141 | 5′-CGCTGTTGCTCATTTGAGC-3′ | Amplification of dinJ promoter region with AR18 |
| AR142 | 5′-TAGCGCCTCGCAGGAAATGCC-3′ | Amplification of yigN promoter region with AR86 |
| AR143 | 5′-ATTTAAGAGAAAGAGTTACACCGTCACCAC-3′ | Amplification of ydjM promoter region with AR72 |

  • (a) This is an abbreviated list of primers. See Experimental procedures for more details.

The 10 LexA-regulated genes depicted in Fig. 1 therefore bring the present number of genes in the E. coli LexA regulon to 31. The location of each of these genes with respect to the ‘100-minute’ chromosome is depicted in Fig. 2.

Figure 2.

Location of LexA-regulated genes in the E. coli chromosome. The exact position of the 31 LexA-regulated genes is shown with respect to the ‘100-minute’ map. Those genes that are transcribed clockwise are denoted (+), while those that are transcribed in an anticlockwise manner are denoted (–). LexA-regulated genes characterized in this study are boxed and are in bold typeface.

Genes that have an appropriately positioned LexA box with a good heterology index, but do not appear to be LexA regulated

Our initial computational search was for potential LexA binding sites that had an HI < 15. However, when one compares the list of LexA-regulated genes (Table 2), one finds that, of the 31 genes in the regulon, only four have an HI value > 10. These include molR (HI value 10.68), polB (HI value 12.09), dinB (HI value 12.84) and ybfE (HI value 14.07). This in turn suggests that a good indicator of LexA binding and/or regulation would be an HI value < 10. While this does indeed appear to be the case, there were some exceptions. Some of these, such as ivbL (HI value 2.35), ycgJ (HI value 6.24) and minC (HI value 6.24), can be explained by the fact that the proposed LexA binding site is quite far from the translational start site of the gene and therefore may not overlap any promoter elements (Table 2). However, at first glance, this is probably not the case for the LexA binding sites upstream of dinJ (sosA) (HI value 7.06) (Lewis et al., 1994; Ohmori et al., 1995a) and yigN (sosB) (HI value 9.27) (Lewis et al., 1994; Rudd, 1998), which appear to be appropriately located (Table 2). As can be seen in Fig. 3, however, neither gene appears to be LexA regulated, showing constitutive expression in all three genetic backgrounds, irrespective of whether the cell had been exposed to mitomycin C or not.

Figure 3.

Northern analysis of E. coli genes that have a low HI value but do not appear to be regulated by LexA. As noted in Fig. 1, RNA was extracted from three isogenic strains that differed only in their lexA gene: RW118 (lexA+), RW434 (lexA3[Ind]) and RW542 (lexA51[Def]). RNA was obtained from undamaged cells (–) or from cells that had been exposed to mitomycin C (5 µg ml−1) (+) for 30 min before extraction (see Experimental procedures for more details). The LexA-regulated recA gene was used as a positive control in this experiment. The genes are depicted according to their ascending heterology index (HI).

Binding of LexA protein to potential LexA binding sites

Although the HI value of dinJ and yigN is quite low and is below the general ‘cut-off’ of < 10 for most LexA-regulated genes, one explanation for the lack of LexA regulation is that the proposed site simply does not bind LexA. As noted above, we also discovered that the LexA-regulated ydjM gene has two potential binding sites in its promoter region. One of these has an HI value of 13.60 (which is above our HI ‘cut-off’ of 10, as well as that of Lewis et al. 1994, whose HI ‘cut-off’ was 12.6), and we were interested in determining whether this second binding site is, in fact, functional. To test LexA binding to these sites, we have used highly purified E. coli LexA protein to perform electrophoretic gel mobility shift assays on ≈150–300 bp polymerase chain reaction (PCR)-amplified fragments of the promoter region of each gene (Fig. 4). The control for these reactions was the polB (dinA) LexA binding site, which has an HI value of 12.09 and is therefore in the uppermost range of LexA-regulated genes. Interestingly, despite its relatively low HI value, LexA protein showed no affinity for the dinJ (sosA) LexA binding site (Fig. 4). Such findings therefore explain why no LexA regulation is observed in vivo (Fig. 3). In contrast, LexA binding to the yigN (sosB) LexA binding site was demonstrated (Fig. 4). Given this finding and the fact that yigN does not appear to be LexA regulated in vivo (Fig. 3), one would have to hypothesize that the binding site does not overlap the predicted promoter elements (Lewis et al., 1994). Indeed, closer inspection of the region upstream of yigN reveals several alternative −35 and −10 candidate promoter elements to those hypothesized previously (data not shown).

Figure 4.

Electrophoretic gel mobility shift assays. The affinity of LexA for the LexA boxes located immediately upstream of dinJ, ydjM, yigN and polB (used as the positive control) was analysed. No LexA (–) or various concentrations (2.5 nM, 5 nM, 10 nM, 25 nM, 50 nM and 100 nM) of LexA were added to the reaction and incubated with 0.5–1 ng of the digoxigenin-labelled probe for 30 min (see Experimental procedures for more details). Protein–DNA complexes were separated in a 6% non-denaturing acrylamide gel run at 100 V for 2.5 h.

Unlike yigN and polB, both of which gave a single LexA–DNA complex, the ydjM promoter gave three discrete products in the presence of relatively high levels of LexA protein, suggesting multiple binding sites and/or variable amounts of LexA protein in each DNA–protein complex. We have analysed this further by hydroxyl radical footprinting of the ydjM promoter region (Fig. 5) and find that the area of protection includes 50+ bases that are centred between the two candidate LexA binding sites. As the footprint of LexA has been demonstrated previously to be ≈ 20–28 bp (Brent and Ptashne, 1981; Little et al., 1981), we conclude that both the higher affinity (HI value 9.07) and the lower affinity (HI value 13.60) sites are occupied by LexA and that binding of the lower affinity site probably occurs via co-operative interactions with the higher affinity site (Brent, 1982; Schnarr and Granger-Schnarr, 1993). Closer inspection of the footprints reveals five regions of ‘protection’, indicating where the DNA phosphodiester backbone is in direct contact with the bound protein (Tullius and Dombroski, 1986). Previous analysis of a LexA–DNA complex (Lloubes et al., 1991) and other bacterial repressors bound to DNA (Tullius and Dombroski, 1986) indicates that each bound repressor dimer typically results in three regions of protection, one in which the protection is rather strong, flanked by two in which the protection from hydroxyl radical cleavage is somewhat weaker (cf. Winterling et al., 1998). However, because of the short 2 bp spacing between the two LexA boxes in the ydjM promoter, the two patterns of protection resulting from repressor dimer binding to each site overlap, essentially resulting in a ‘sharing’ of one weaker area of protection between the two sites.

Figure 5.

Footprinting analysis of LexA bound to the ydjM operator.

A. DNA sequence context of the ydjM gene. The ydjM gene is located within the 1806–3650 interval of the E. coli chromosome (black bar). The direction of transcription of each gene in this region is depicted by the arrowhead. The two potential LexA boxes are ‘overlined’, and a hypothetical ribosome binding site (RBS) is underlined.

B. DNA fragments derived from the ydjM promoter in which either the top or the bottom strands were radioactively end-labelled were incubated with increasing concentrations of purified LexA and the complexes footprinted with hydroxyl radicals as described in Experimental procedures. The cleavage patterns obtained with end-labelled DNA incubated in the presence (lanes 1 and 2) or absence (lane 3) of LexA are shown for experiments in which either the top strand or the bottom strand was labelled, as indicated. Binding reactions contained either 0.1 (lane 2) or 0.2 ng (lane 1) of LexA protein. G-specific cleavage reactions (G-rxn, lanes 4 and 5, top and bottom strand, respectively) were performed with each labelled DNA fragment and were included as markers. DNA not subjected to hydroxyl radical cleavage was also included as a control (Con, lane 4, bottom strand). The location of the predicted LexA binding sites (boxes 1 and 2) on each strand is indicated, while the arrows indicate the centre of each site.

C. Densitometer scans showing the hydroxyl radical cleavage pattern of naked ydjM DNA (thin line) and ydjM DNA bound with LexA (thick line) [from (A) lanes 2 and 3 respectively]. The location and sequence of the predicted LexA binding sites (box 1 and box 2) are indicated above the scans, while the region of protection afforded by each LexA dimer bound to each box is indicated by the hatched bar below the scan.


Strategy for identifying genes in the LexA regulon

Studies of the damage-inducible SOS response have demonstrated that a major component of this response is regulated by the RecA and LexA proteins. The goal of this study was to identify additional E. coli genes that are regulated by LexA at the transcriptional level. This study has been aided by the fact that (i) the entire E. coli genome has been sequenced (Blattner et al., 1997) and (ii) the consensus binding site for LexA provides a motif that can be used for computational searches. However, a major problem lies in determining whether the binding site is appropriately positioned for LexA to inhibit transcription. Many of the genes identified in the E. coli genome have no known function and are only hypothetical ORFs (Blattner et al., 1997; Rudd, 1998). As a consequence, we did not eliminate potential LexA binding sites from our analysis even if they were located in the coding region of hypothetical genes. We found that no single computational search program was 100% accurate and, as a consequence, we have used the data obtained from three different search parameters and programs.

Our approach identified seven novel LexA-regulated genes as well as three that were previously predicted to be LexA regulated (Lewis et al., 1994). Many of these genes have unknown functions but, based upon database searches and comparisons with genes from other organisms, we hypothesize as to their activities.

ysdAB. A case in point about the uncertainty of the precise number of genes in the E. coli genome is that of the ysdAB operon. The operon was identified based upon searches for ORFs in the 928 bp DNA fragment between the ivbL and emrD genes (Fig. 6C). From the sequence of MG1655 (Blattner et al., 1997), this region might encode two small genes encoding 37 and 29 amino acids respectively. However, based upon a comparative search of other databases, it has been suggested that, within this region, MG1655 has a frameshift mutation or that the original sequence is erroneous, and that, instead of two genes, the region contains a single gene. We have, however, independently resequenced this region from MG1655 and an unrelated E. coli K-12 strain and confirm the original sequence (Blattner et al., 1997). ysdAB is located at 83 min on the E. coli map and is upstream of ivbL, but is transcribed in the opposite orientation. Our Northern analysis suggests that only ysdAB, and not ivbL, is regulated by LexA. Searches using the blast program (Altschul et al., 1990) revealed no significant homology between YsdA and YsdB and known proteins.

Figure 6.

Sequence context of dinS, dinQ, ysdAB and molR. The complete DNA sequence of dinS (A), dinQ (B), ysdAB (C) and molR (D) genes and their deduced amino acid composition is shown. Genes are positioned in the region they occupy in the E. coli chromosome (black bar), and their direction of transcription is indicated by arrowheads. Hypothetical ribosome binding sites (RBS) are underlined, whereas LexA boxes are either underlined or overlined. The sequence of the digoxigenin-labelled oligonucleotide used in the Northern experiments (Fig. 1) is indicated in bold typeface. Putative −35 and −10 promoter elements are ‘overlined’ in (A) and (C).

dinQ. This gene is located at 78.58 min on the E. coli map and is in the 823 bp intergenic region between gor and arsR (Fig. 6B). As the gene has not previously been assigned a name, we have called it dinQ (the next available listing in the din gene designation). Northern analysis of this region gave a transcript of ≈ 400 bp, but the predicted ORF suggests that dinQ would only encode a basic protein of 49 amino acids (5.4 kDa; pI 12.37). No significant homologies to known proteins were found with the dinQ sequence.

dinS. This gene is another example of the ambiguities in predicting ORFs. The 121-amino-acid protein predicted to be encoded by this din gene shows high homology with transposases. However, its 121-amino-acid ORF, which we designate dinS, is contained in a bigger ORF of 283 amino acids (gi1789981) (called ORF 2), which has been hypothesized to be the putative transposase of the IS150 element (Fig. 6A). This insertion sequence is only represented once in the E. coli genome (Blattner et al., 1997). At the present time, we have not been able to determine which of the two predicted ORFs (ORF2 or dinS) is, in fact, the transposase. However, it should be noted that transposases of similar size to DinS have been identified (see accession numbers Z67739 and M29945).

yjiW. The yjiW gene is expressed from a ≈ 500 bp mRNA that results in a protein of 128 amino acids (molecular mass 14.4 kDa; pI 9.7). The LexA binding site was originally identified as sosC by Lewis et al. (1994), and we have demonstrated here that the gene is, in fact, LexA regulated. blast searches revealed no significant protein homologies. The only hint as to any function is its map location (98.66 min) in the mcrB–hsdS intergenic region. This ≈ 14 kb region encodes proteins involved in the host restriction–modification system (Dila et al., 1990). As pointed out by Lewis et al. (1994), a well-known facet of the SOS response is the temporary alleviation of the host EcoK restriction system (reviewed by Little and Mount, 1982), and it is possible that yjiW plays an important role in this response.

ydjM. Perhaps one of the more interesting genes to be identified in this search was ydjM (also known as b1728; accession number AAC74798). Our studies revealed that its promoter region contained two functional LexA binding sites (Figs. 4 and 5A), and it is only the third E. coli LexA-regulated gene in this class. Footprinting analysis confirmed high-affinity binding of LexA to both predicted sites (Fig. 5B and C) in a manner that suggests that direct contacts occur between repressor dimers and help to confer positive co-operativity for loading the sites (Schnarr and Granger-Schnarr, 1993). ydjM is located at 38.98 min on the E. coli map in a region between yniC and ydjN and is transcribed in the same direction as both genes. Although the ydjN gene is immediately downstream of ydjM and could, in theory, be expressed as part of an operon, our Northern analysis reveals a transcript of 1 kb (Fig. 1), which is not large enough to encode both ydjM and ydjN. The YdjM protein is composed of 196 amino acids and has a predicted molecular mass of 22.6 kDa and a pI of 9.1. The function of ydjM remains to be elucidated, but blast searches reveal significant alignments with similar hypothetical proteins in Bacillus subtilis (yvsG), Clostridium perfringens and Archaeoglobus fulgidus (accession numbers CAA11717, BAA81691 and AAB90056 respectively).

hokE. The hokE gene (also known as ydbY) encodes a small protein of 83 amino acids (9.2 kDa; pI 8.5). Analysis revealed that it shows strong homology with the hok family of proteins (host killing) (Gerdes et al., 1997; Pedersen and Gerdes, 1999). These proteins are mainly found on various low-copy-number plasmids and form part of a dual-component system [toxin and antidote (Gerdes et al., 1997)] that assures stable maintenance of the plasmid by killing host cells that have lost the plasmid during cell division (Pedersen and Gerdes, 1999). HokE shares highest homology with SrnB, which is also involved in the degradation of stable RNA (Gerdes et al., 1997). It seems unlikely, however, that the hokE gene is in fact functional, as an IS186 sequence is located 21 bp downstream of hokE, and the insertion interrupts the stable toxin-encoding hokE mRNA (Pedersen and Gerdes, 1999).

ybfE. Of all the genes regulated by LexA, ybfE appears to have the highest HI value (14.07). The ybfE gene is located at 15.32 min, immediately downstream of ybfF and is predicted to encode a 120-amino-acid protein with a molecular mass of 11.1 kDa and a pI of 10.6. blast searches did not identify any homologous proteins within the database, so the function of ybfE is presently unknown.

ydjQ. Like yjiW, the LexA binding site of ydjQ was also identified by Lewis et al. (1994), who called it sosD. The ydjQ gene is predicted to encode a protein of 295 amino acids (molecular mass 33.6 kDa; pI 8.8), which shares significant homology with UvrC. Interestingly, uvrC is the only gene of the UvrABCD excision repair complex that is not apparently regulated by LexA, and it is possible that ydjQ is an ancestral homologue of uvrC that retains its LexA regulation. The UvrC protein is 610 amino acids long, while, as noted above, YdjQ is roughly half that size (295 amino acids). However, database searches also revealed related proteins of similar size in Mycoplasma pneumoniae (accession number AAB01922), suggesting that the ydjQ gene may be functional. Clearly, further genetic and biochemical studies on ydjQ are necessary before we can conclude that YdjQ is an active homologue of UvrC.

molR. Yet another uncertainty is that of the molR gene (Fig. 6D). This gene is predicted to encode a 274-amino-acid protein (30.6 kDa) with a pI of 6.08, which is believed to act as a regulator of molybdate synthesis (Lee et al., 1990). It was also previously hypothesized to be LexA regulated and was called sosF (Lewis et al., 1994). Recent reanalysis of this genomic region (Rudd, 1998) suggests that the molR gene is, in fact, contained in a much larger ORF called yehH (Fig. 6D). We note, however, that the mRNA transcript obtained in this study (≈ 1 kb) (Fig. 1) more closely matches the size predicted for MolR and is too small to encode the larger YehH protein.

How many damage-inducible SOS genes are there in E. coli?

Our investigations, combined with those of many early studies, suggest that there are likely to be 31 genes in the E. coli genome that are negatively regulated by LexA at the transcriptional level. If we assume that the total number of genes in the entire genome is ≈ 4300, this represents ≈ 0.7% of the genome. Many of the genes identified in this study encode proteins of no known function, so their biochemical function awaits further characterization. The number of damage-inducible genes that are not directly regulated by LexA is also likely to be considerable. In an attempt to study this phenomenon, Lesca et al. (1991) observed the induction of at least 22 proteins in UV-irradiated lexA(Def) cells, and it is likely, with the recent development of DNA array technology allowing studies on damage-inducible expression of genes, that the number of genes in the global SOS regulon will grow significantly.

Experimental procedures

Bacterial strains and plasmids

Three E. coli K-12 strains were used in this study to determine whether a particular gene is LexA regulated. The wild-type lexA+ strain was RW118 (full genotype: thr-1 araD139 Δ(gpt-proA)62 lacY1 tsx-33 supE44 galK2 hisG4 rpsL31 xyl-5 mtl-1 argE3 thi-1 sulA211). Strains RW434 and RW542 are isogenic to RW118, except that they carry the lexA3(Ind) and lexA51(Def) mutations respectively.

[The lexA51(Def) allele was previously isolated as an intragenic suppressor of the lexA3(Ind) allele (Mount, 1977). However, the second-site mutation has never been mapped. We have determined the sequence of the lexA51 allele and find that, in addition to the G85D mutation, lexA51 has a deletion of an A residue in a run of six As between nucleotides 475 and 480 of the lexA gene. This causes a frameshift in LexA at amino acid 160 and completely changes the amino acid sequence of the C-terminal domain, which is required for dimerization of LexA on its operator sequences (Kim and Little, 1992).]

pJWL288 (Roland et al., 1992), a plasmid that expresses E. coli LexA protein from a T7 promoter, was kindly provided by John W. Little (University of Arizona) and used to overproduce LexA protein. LexA was subsequently purified to more than 95% purity by following standard protocols (Little et al., 1994).

Sequence of oligonucleotide primers

The primers used for Northern analysis and PCR amplification were designed using MacVector, version 6.0 (Oxford Molecular) and are listed in Table 3. These primers were also analysed with the blastn program (Altschul et al., 1990) to reduce the possibility of obtaining false-positive signals during Northern analysis. The Northern primers were ≈ 25–30 nucleotides long and were either 5′ or 3′ end-labelled with digoxigenin (DIG). Table 3 includes only primers relevant to the work published in this study, but the sequences of the other primers used for the unpublished data (i.e. the ≈ 40 genes that we analysed and that appeared not to be LexA regulated) are available upon request.

Computer analysis

In order to identify potentially novel LexA-regulated genes, we performed computational searches of the entire E. coli genome using three different programs and slightly different search parameters. The first used a program developed by one of us (T.O.). This program (available upon request) calculates the HI value (Lewis et al., 1994) for every 20-nucleotide sequence in the E. coli genome from position 1 to the end. The strandedness does not matter because the parameters themselves are symmetrical. Candidate sequences with a low HI value (< 14) were analysed further for their location close to (−200 to +40 nucleotides from) a putative initiation codon.
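The sliding-window HI calculation described above can be illustrated with a minimal sketch. It is not the program written by T.O. It assumes that the HI of Lewis et al. (1994) takes the Berg and von Hippel form, i.e. the sum over positions of ln[(n_cons + 0.5)/(n_obs + 0.5)], where n_cons is the count of the most frequent base at that position among the known sites and n_obs is the count of the base actually observed. The three aligned sites below are hypothetical placeholders, not the sites of Table 1, and the function names are ours.

```python
import math
from collections import Counter

# Hypothetical aligned 20-nt sites, used only to build a toy count table.
TOY_SITES = [
    "TACTGTATATATATACAGTA",
    "TACTGTATATAAAAACAGTA",
    "AACTGTATGAGCATACAGCA",
]

def position_counts(sites):
    """Count how often each base occurs at each position of the aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(s[i] for s in sites) for i in range(length)]

def heterology_index(seq, counts):
    """Assumed HI: sum over positions of ln((n_cons + 0.5) / (n_obs + 0.5))."""
    hi = 0.0
    for base, column in zip(seq, counts):
        n_cons = max(column.values())   # count of the most frequent base
        n_obs = column.get(base, 0)     # count of the base seen in seq
        hi += math.log((n_cons + 0.5) / (n_obs + 0.5))
    return hi

def scan(genome, counts, cutoff):
    """Return (1-based start, window, HI) for every window with HI < cutoff."""
    width = len(counts)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        hi = heterology_index(window, counts)
        if hi < cutoff:
            hits.append((start + 1, window, hi))
    return hits

counts = position_counts(TOY_SITES)
hits = scan("GGGG" + TOY_SITES[0] + "CCCC", counts, cutoff=5.0)
```

The sketch scans one strand only and leaves out the second step of the search, the filter on distance (−200 to +40 nucleotides) from a putative initiation codon, which requires gene annotations.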

The second approach used the program macvector 6.0 (Oxford Molecular) to search the E. coli genome for a 16 nucleotide motif, CTGNNT(N)7CAG, which was derived from the consensus LexA binding site sequence shown in Table 1. This simple analysis resulted in 933 potential sites. This number was subsequently reduced on the basis of several premises. First, potential LexA boxes should have fewer than five G or C residues in the general (N)10 region (as all of the previously characterized LexA boxes have three or fewer guanine or cytosine residues within this region). Secondly, the HI value should be < 15. Thirdly, the potential LexA box should not be located inside a gene with a known function. However, we did not exclude any motif that was located inside one of the more than ≈ 1600 hypothetical genes reported in the E. coli genome (Blattner et al., 1997; Rudd, 1998). Finally, the region surrounding each potential LexA box was analysed for the presence of potential ORFs immediately upstream or downstream of each analysed sequence.
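The motif search and the first filter can be sketched as follows. This is an illustration in Python rather than the macvector search itself. The function name and the `max_gc` parameter are ours, and the sketch covers only the pattern match and the G/C count in the 10 nucleotides between CTG and CAG, not the HI cut-off or the ORF-based criteria.

```python
import re

# CTGNNT(N)7CAG written as a regular expression. The lookahead lets
# overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(CTG[ACGT]{2}T[ACGT]{7}CAG))")

def candidate_boxes(genome, max_gc=4):
    """Return (1-based start, site, G+C count) for motif hits that have
    at most max_gc G or C residues between the CTG and CAG triplets."""
    hits = []
    for match in MOTIF.finditer(genome.upper()):
        site = match.group(1)
        spacer = site[3:13]  # the 10 nt between CTG and CAG
        gc = spacer.count("G") + spacer.count("C")
        if gc <= max_gc:
            hits.append((match.start() + 1, site, gc))
    return hits

# Toy input: one AT-rich site that passes the filter.
example = candidate_boxes("AACTGTATATATATACAGTT")
```

Setting `max_gc=4` corresponds to the "fewer than five G or C residues" premise. A stricter value of 3 would match what is seen in the previously characterized boxes.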

The third search of the E. coli genome was made by the colibri program provided by the Pasteur Institute at www. Using its search pattern function without filters, we looked for the following DNA motifs: CTGDNTDNNHNNHCAG, TTGDNTDNNHNNHCAG and CTGDNTDNNHNNHCAA. The two last motifs are derivatives of the first one, but introduce the C→T change observed in the dinG operator region (Lewis and Mount, 1992).
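The three degenerate motifs use standard IUPAC nucleotide codes (D = A, G or T; H = A, C or T; N = any base). The sketch below shows one way to expand them into regular expressions and search a sequence. It is an illustration only and does not reproduce the colibri search function. The helper names are ours.

```python
import re

# IUPAC codes needed for the three motifs.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "D": "[AGT]",   # not C
    "H": "[ACT]",   # not G
    "N": "[ACGT]",  # any base
}

MOTIFS = ["CTGDNTDNNHNNHCAG", "TTGDNTDNNHNNHCAG", "CTGDNTDNNHNNHCAA"]

def iupac_to_regex(motif):
    """Compile an IUPAC motif into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[code] for code in motif)
    return re.compile("(?=(" + body + "))")

def find_sites(genome, motifs=MOTIFS):
    """Return sorted (1-based start, motif, matched sequence) tuples."""
    genome = genome.upper()
    hits = []
    for motif in motifs:
        for match in iupac_to_regex(motif).finditer(genome):
            hits.append((match.start() + 1, motif, match.group(1)))
    return sorted(hits)

# Toy input containing one site that fits the first motif.
example = find_sites("GGCTGTATATATATACAGGG")
```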

Northern analysis

E. coli strains RW118 (lexA+), RW434 (lexA3[Ind]) and RW542 (lexA51[Def]) were grown overnight in Luria media at 37°C. Two exponentially growing cultures for each strain were obtained by diluting the overnight culture 1:100 in fresh media and growing at 37°C until they reached an OD600 of 0.5. One of the two cultures was then treated with mitomycin C (Sigma) (5 μg ml−1 final concentration) for 30 min. Total RNA from both the untreated and the mitomycin C-exposed cells was obtained using the RNeasy kit (Qiagen). Approximately 25–30 µg of RNA was loaded on a denaturing agarose gel containing 1.2% formaldehyde. Transfer of the total RNA to a nylon membrane was achieved using the TurboBlotter system (Schleicher & Schuell) for 3 h. The nylon supports were prehybridized for 2 h at 42°C with EasyDIG (Boehringer Mannheim) and hybridized overnight with a DIG-labelled oligo. Then, the membrane was washed twice for 5 min with 2 × SSC containing 0.1% SDS at room temperature and twice more for 30 min at 55–60°C with 0.5 × SSC containing 0.1% SDS. Hybridized bands were visualized according to the manufacturer’s suggestions (Boehringer Mannheim).

Gel mobility shift assays

Assays were generally performed as described previously (Fernández de Henestrosa et al., 1998), but with some slight modifications. DNA probes were prepared by PCR amplification using the primers described in Table 3. (Note that only one of the two primers is 5′-labelled with DIG.) PCR products were subsequently purified from a 2% low-melting-point agarose gel. LexA protein (0 nM, 2.5 nM, 5 nM, 10 nM, 25 nM, 50 nM and 100 nM) was incubated in a 5 µl final volume with each of the probes (0.5–1.0 ng) in binding buffer containing 10 mM N-2-hydroxyethyl-piperazine-N′-2-ethanesulphonic acid (HEPES)-NaOH (pH 8), 10 mM Tris-HCl (pH 8), 5% glycerol, 50 mM NaCl, 1 mM EDTA, 1 mM DTT, 1 µg of pBluescript (Stratagene) and 50 µg ml−1 BSA. After 30 min incubation at 30°C, the mixture was loaded onto a 6% non-denaturing polyacrylamide gel prerun for 30 min at 100 V. The buffer used for electrophoresis was 25 mM Tris-HCl (pH 8), 250 mM glycine and 1 mM EDTA. DNA–protein complexes were separated by running the gel at 100 V for 2.5 h, after which time they were transferred to a negatively charged membrane, and the mobility of the probe was visualized according to the manufacturer’s recommendations (Boehringer Mannheim).

Labelled DNA fragments containing ydjM promoter sequences for hydroxyl radical footprint analysis

Primers complementary to sequences bracketing the putative LexA sites in the ydjM promoter were designed. Approximately 1.5 µg of the upstream primer ydjMUS (5′-TCAACTGACGGATTAATCGTC-3′) and the downstream primer ydjMDS (5′-CCGCACAAGCAATAGAAAAG-3′) were 5′ radiolabelled in separate reactions with T4 polynucleotide kinase (T4 PNK) and 100 µCi of [γ-32P]-dATP (6000 Ci mmol−1) for 30 min at 37°C. Each labelled primer was used in independent PCR reactions containing 1.5 µg of the complementary unlabelled primer, 150 ng of E. coli genomic DNA, 10 mM dNTPs and 5 units of Taq DNA polymerase (Gibco/Life Technologies). PCR DNA was precipitated and separated on a 1.8% agarose gel containing 1× TBE. After separation, the 154 bp radioactively end-labelled product was revealed by ethidium bromide staining. The labelled fragment was electroeluted from the gel slice, and the DNA was precipitated and dissolved in 200 µl of TE buffer (10 mM Tris-Cl, pH 8.0, 1 mM EDTA) and stored at 4°C until needed.

Hydroxyl radical footprint analysis

Labelled ydjM DNA fragment (60 ng; labelled on either end) was mixed with increasing amounts of LexA protein in a total of 28 µl containing 1 × binding buffer (10 mM Tris-Cl, pH 8.0, 50 mM NaCl, 1 mM DTT, 50 µg ml−1 BSA) at room temperature for 30 min. Four microlitres of 1 mM Fe(II)/EDTA (100 µM final) and 4 µl of 20 mM sodium ascorbate (2 mM final) were pipetted onto the side of the tube. The hydroxyl radical cleavage reaction was initiated by adding 4 µl of a 0.12% hydrogen peroxide solution to the drop, and the drop was rapidly mixed with the binding reaction. The cleavage reaction was incubated at room temperature for 2 min and then stopped by adding 15 µl of stop solution (400 mM thiourea, 100 mM EDTA, 1.2 M sodium acetate, 0.8% SDS). The DNA was purified by ethanol precipitation and resuspended in formamide/loading solution, denatured at 90°C for 2 min, and cleavage products were separated on 6% sequencing gels. Gels were dried and analysed on a Molecular Dynamics PhosphorImager.
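The stated final concentrations can be checked with simple dilution arithmetic. The short sketch below assumes that the final reaction volume is the sum of the 28 µl binding reaction and the three 4 µl additions (40 µl). This total is not stated explicitly above but is consistent with the quoted final values. The hydrogen peroxide figure is derived under the same assumption, and the function name is ours.

```python
def final_concentration(stock, added_ul, total_ul):
    """Concentration after dilution, in the same unit as the stock."""
    return stock * added_ul / total_ul

# Assumed total volume: binding reaction plus Fe(II)/EDTA, ascorbate and H2O2.
total_ul = 28 + 4 + 4 + 4

fe_edta_um = final_concentration(1000.0, 4, total_ul)   # stock 1 mM, result in µM
ascorbate_mm = final_concentration(20.0, 4, total_ul)   # stock 20 mM, result in mM
h2o2_percent = final_concentration(0.12, 4, total_ul)   # stock 0.12%, result in %

print(f"Fe(II)/EDTA: {fe_edta_um:g} µM")
print(f"Sodium ascorbate: {ascorbate_mm:g} mM")
print(f"Hydrogen peroxide: {h2o2_percent:g} %")
```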


We thank John Little for the E. coli LexA-overproducing plasmid, pJWL288 and for helpful discussions. This work was supported in part by the NIH intramural research programme, by NIH/NIGMS grant GM52426 to J. J. Hayes, and by a grant from the Ministry of Education, Science, Sports and Culture of Japan (no. 08280104) to H. Ohmori.