The complete sequence of the locus of enterocyte effacement (LEE) from enteropathogenic Escherichia coli E2348/69


James B. Kaper. Tel. (410) 706 5328; Fax (410) 706 6205.

Enteropathogenic Escherichia coli (EPEC) are an important aetiological agent in infant diarrhoea and the prototype for a family of pathogens exhibiting the unique virulence mechanism known as attaching and effacing (AE) (Nataro and Kaper, 1998). All genes necessary for AE are encoded on a 35 kb chromosomal pathogenicity island called the locus of enterocyte effacement (LEE), which contains genes encoding a type III secretion system, secreted proteins (Esp) and the adhesin intimin (McDaniel et al., 1995; McDaniel and Kaper, 1997). Study of the LEE will illuminate our understanding of the pathogenesis of EPEC and other AE pathogens and contribute to the growing body of knowledge about type III secretion systems and pathogenicity islands. We have recently sequenced the entire LEE of EPEC strain E2348/69 and describe below our initial analysis. Further details can be found in GenBank (accession number AF022236) and on the Molecular Microbiology Web site (

The complete region was 35 624 bp with an average G + C content of 38.36%, which is far below that of the E. coli chromosome (50.8%; Blattner et al., 1997), a pattern in keeping with many other pathogenicity islands (Hacker et al., 1997). The LEE contains 41 predicted open reading frames (ORFs) (of > 50 amino acids) arranged in at least five polycistronic operons, as predicted by the close spacing of co-directional genes. The LEE may be divided into at least three functional domains (Fig. 1): the central eae (encoding intimin), the region encoding the secreted Esp proteins and a large region encoding the type III secretion apparatus.
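The G + C figure quoted above is a simple base-composition percentage. As a minimal sketch of how such a value can be computed (the function name `gc_content` and the toy sequence are illustrative assumptions, not the authors' pipeline or the LEE sequence itself):

```python
def gc_content(seq: str) -> float:
    """Return the G + C percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted, so ambiguity
    codes such as N do not distort the percentage.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy input (hypothetical, not taken from AF022236): 2 of 10 bases are G or C.
toy = "ATGCATTTAA"
print(gc_content(toy))
```

Applied to the full sequence from a GenBank record, the same calculation would give the island-wide value that is compared with the chromosomal average.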

Figure 1. The locus of enterocyte effacement (LEE) of E2348/69.

Several LEE genes have been reported previously, and our final LEE sequence entry contains corrections to some of these previously reported genes and predicted proteins. Additionally, we have decided to adopt a standardized nomenclature (Bogdanove et al., 1996a; Yahr et al., 1997), which changes the name of several previously described genes comprising the type III secretion system of EPEC (Jarvis et al., 1995). Those genes homologous to Yersinia type III secretion (ysc) genes are referred to as esc (E. coli secretion) genes with the same suffix as the Yersinia homologue (e.g. sepA becomes escV, homologous with yscV; Table 1). Within the family of type III secretory genes, the LEE shares the highest level of predicted amino acid similarity and genetic organization with ssa genes from the SPI-2 pathogenicity island of Salmonella typhimurium (Shea et al., 1996). Genes that are not ysc homologues but are involved in type III secretion are named sep (secretion of E. coli proteins). The chaperone for the secretion of EspD is named cesD for chaperone for E. coli secreted protein D (Wainwright and Kaper, 1998). The remaining named genes, esp (E. coli secreted protein), eae (E. coli attaching and effacing) and orfU will retain their designations, and remaining ORFs are designated orf or rorf depending on the direction of transcription relative to eae.
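The renaming rule for ysc homologues described above is mechanical: keep the suffix of the Yersinia gene and replace the `ysc` prefix with `esc`. A minimal sketch, assuming a hypothetical helper `esc_name`; the only mapping taken from the text is sepA to escV via yscV:

```python
def esc_name(ysc_homologue: str) -> str:
    """Derive the esc gene name from its Yersinia ysc homologue.

    The suffix of the ysc gene is carried over unchanged,
    e.g. yscV -> escV.
    """
    prefix = "ysc"
    if not ysc_homologue.startswith(prefix) or len(ysc_homologue) == len(prefix):
        raise ValueError(f"not a ysc gene name: {ysc_homologue!r}")
    return "esc" + ysc_homologue[len(prefix):]


# Example from the text: sepA is homologous with yscV, so it becomes escV.
old_to_homologue = {"sepA": "yscV"}
renamed = {old: esc_name(ysc) for old, ysc in old_to_homologue.items()}
print(renamed)
```

Genes without a ysc homologue fall outside this rule and keep or receive the sep, ces, esp, orf or rorf designations described in the text.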

Table 1. Open reading frames of the LEE with similar protein sequences.

a. Identity and similarity calculated by gapped BLAST without filter, using the algorithm of Altschul et al. (1997), available at http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-newblast.
–> Polycistronic operon: predicted by close proximity/overlapping genes. Direction of transcription indicated.
– > Possible polycistronic operon: genes are co-directional but sufficiently spaced to allow independent transcription of downstream genes.
* Homologous Orf/protein from the E. coli K-12 genome sequence (Blattner et al., 1997).
Comment/probable function: phenotype in bold determined experimentally; phenotype in normal text hypothesized on the basis of homology; italicized phenotypes are proposed on the basis of weak homology or position in the LEE. See text for other details.
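The operon criterion in the legend above (co-directional genes that lie close together or overlap) can be illustrated with a short grouping routine. This is only a sketch under stated assumptions: the `Gene` type, the `group_operons` function, the 50 bp `max_gap` threshold and the toy coordinates are all hypothetical, since the source gives no numerical cut-off.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based start coordinate
    end: int     # 1-based end coordinate, inclusive
    strand: str  # "+" or "-"


def group_operons(genes: List[Gene], max_gap: int = 50) -> List[List[Gene]]:
    """Group neighbouring genes into putative operons.

    A gene joins the previous group when it is on the same strand and
    the intergenic gap is at most max_gap bp (overlaps give a negative
    gap). The threshold is an illustrative assumption.
    """
    groups: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if groups:
            prev = groups[-1][-1]
            gap = gene.start - prev.end - 1
            if gene.strand == prev.strand and gap <= max_gap:
                groups[-1].append(gene)
                continue
        groups.append([gene])
    return groups


# Toy coordinates (hypothetical, not LEE annotations).
toy_genes = [
    Gene("a", 1, 300, "+"),
    Gene("b", 298, 900, "+"),     # overlaps a: same putative operon
    Gene("c", 1200, 1800, "+"),   # co-directional but widely spaced
    Gene("d", 1810, 2400, "-"),   # opposite strand: new group
]
operon_names = [[g.name for g in grp] for grp in group_operons(toy_genes)]
print(operon_names)
```

A spacing rule like this only proposes candidates; the legend itself distinguishes firm predictions from possible operons in which downstream genes could be transcribed independently.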

A brief description of selected LEE ORFs follows. More details can be found in Table 1, Fig. 1 and on the Molecular Microbiology home page (http://www.blackwell-science.com/products/journals/mole.htm).

    rOrf1 is similar to a protein of unknown function from E. coli K-12 and to a predicted lipoprotein that is encoded on the S. typhimurium virulence plasmid adjacent to rck (Heffernan et al., 1992), which has been shown to be important for virulence (Cirillo et al., 1996).

    rOrf2 is similar to the VirA protein of Shigella flexneri, a type III secreted protein that is involved in invasion and intercellular spreading (Uchiya et al., 1995). Secretion of rOrf2 has not been observed, and it is unclear what functions rOrf2 may have in EPEC, in which the role of invasion remains undefined.

    The protein Orf1 appears to be an H-NS-like regulatory protein, the first reported from a pathogenicity island. Such proteins regulate expression by binding to AT-rich regions (Dersch et al., 1993; Zuber et al., 1994) and so may be expected to interact extensively with sequences within the AT-rich LEE.

    rorf3 encodes a protein similar to a family of proteins of unknown function from F-, R- and virulence-associated plasmids of Shigella and Salmonella (Graus-Goldner et al., 1990; Allaoui et al., 1993; Miras et al., 1995).

    Orf19 is similar to IpgB of Shigella, a protein of unknown function encoded on the virulence plasmid (Baudry et al., 1988).

    Orf20 has been reported recently as Tir, the translocated intimin receptor (Kenny et al., 1997), which is translocated into the cytoplasmic membrane of the eukaryotic cell, where it is phosphorylated [becoming the protein described as ‘Hp90’ by Rosenshine et al. (1992)] and is then able to bind to intimin. Tir may also nucleate actin, possibly acting as a bridge between the bacteria and the host cytoskeleton. Tir has some similarity to HrpN, a type III secreted virulence-associated protein of Erwinia (Wei et al., 1992; Bogdanove et al., 1996b).

    orfU is highly conserved in all AE pathogens examined so far (L32312, L11691 and U59503), possibly because orfU contains within its ORF the transcriptional start site of eae (Gómez-Duarte and Kaper, 1995). The similarity of OrfU to SycH and its location suggest that it may function as a chaperone for Tir, as supported by preliminary data from our laboratory.

    The LEE also contains interesting non-coding genetic elements. The remnant transposase gene at the extreme right end of the LEE suggests a potential mechanism for introduction of the LEE into the chromosome (Donnenberg et al., 1997). The large enterobacterial repeat intergenic consensus (ERIC) element has no known function but may influence gene regulation (Hulton et al., 1991). Finally, the E. coli chromosomal region flanking the LEE contains several large deletions (McDaniel et al., 1995), which we have sequenced (accession numbers AF031371 and AF031372).

    In summary, in addition to encoding a functional type III secretion system, secreted proteins and an adhesin, the LEE also contains previously undescribed genes. These appear to encode novel proteins involved in the type III secretory pathway, new secreted proteins, chaperones and a regulator/repressor. In addition, there are genes of unknown or cryptic function, which suggests that the LEE may encode other functions in addition to the AE phenotype.

    We are currently analysing the LEE in greater detail, including expression of orfs to confirm the predicted size of the translated products, primer extension to identify actual transcriptional start sites and promoters, Northern blots to demonstrate operon structure and mutagenesis of each gene and orf to determine its role in EPEC pathogenesis. The results of these studies will further the ultimate goal of understanding the LEE's function in EPEC pathogenesis.


    The authors are grateful for the advice and assistance of George Mayhew and Nicole Perna of the University of Wisconsin, and of Nick Ambulos and Lisa Sadzewicz of The University of Maryland at Baltimore Biopolymer Laboratory for sequencing and analysis. This work was supported by NIH grants AI21657 (J.B.K.) and AI32074 (M.S.D.).