A variant enterococcal surface protein Espfm in Enterococcus faecium; distribution among food, commensal, medical, and environmental isolates


  • Tracy J Eaton,

    Corresponding author
    1. Institute of Food Research, Norwich Research Park, Colney, Norwich NR4 7UA, UK
      *Corresponding author. Tel.: +44 (1603) 255 180; Fax: +44 (1603) 507 723, E-mail address: tracy.eaton@bbsrc.ac.uk
  • Michael J Gasson

    1. Institute of Food Research, Norwich Research Park, Colney, Norwich NR4 7UA, UK



Enterococci are increasingly important causes of nosocomial disease. They are also associated with food and have a history of use as dairy starter and probiotic cultures. An enterococcal surface protein, Espfs, is involved in the virulence and biofilm-forming capacity of Enterococcus faecalis, and we recently demonstrated the presence of a homologue, Espfm, in E. faecium. Here we describe the complete structure of Espfm and demonstrate that its distribution in E. faecium correlates with disease-associated strains from a range of pathological sites.


Enterococci are recognised as important human pathogens that are responsible for serious nosocomial infections, causing an increasing incidence of endocarditis, bacteremia, urinary tract and neonatal infections[1]. Enterococcus faecalis remains the dominant causative agent in enterococcal bacteremia, but more recently there has been an increase in the number of Enterococcus faecium infections, most likely due to the emergence of vancomycin-resistant E. faecium[2]. In addition, enterococci occur as natural food contaminants and are used as probiotic cultures and starters in dairy products[3]. In view of the clinical situation, the use of E. faecium and E. faecalis strains in food fermentations and as probiotics demands careful safety evaluation [3,4].

A novel enterococcal surface protein (Esp) was identified in an E. faecalis strain that caused multiple infections within a hospital ward[5]. Esp is highly associated with the ability to form a biofilm at abiotic surfaces[6]. Biofilms have increased antimicrobial and antibiotic resistance and are important in infections involving catheters and in the dairy industry where enterococci adhere to stainless steel and may contaminate equipment [7–9]. The presence of Esp contributes to colonisation and persistence in the urinary tract in animal models[10] and is associated with colonisation and spread among liver transplant patients[11]. Esp has recently been shown to be part of a pathogenicity island in E. faecalis[12] and an esp-like gene was found in E. faecium medical strains [3,13].

The purpose of this study was to characterise the esp gene from the E. faecium medical strain, P61, and investigate its presence in E. faecium isolates from different sources (medical, food, commensal and environmental), different infection sites and different geographical regions (American and European). We developed chromosomal sequencing methodology and used polymerase chain reaction (PCR) combined with restriction digestion and sequencing, to screen for variations in esp structure in esp-positive strains.

2 Materials and methods

2.1 Bacterial strains, plasmids and media

E. faecium P61, a medical isolate from a blood infection, served as the prototype for sequencing of the esp structural gene. E. faecium strains were routinely cultured in brain heart infusion (Oxoid) statically at 37°C. Luria Broth[14] was used for growth of Escherichia coli at 37°C, supplemented with ampicillin at 100 μg ml−1, X-Gal at 40 μg ml−1 and IPTG at 100 μg ml−1 where appropriate. E. coli JM109 was obtained from Promega. Plasmids used in this study are shown in Table 1.

Table 1.  Plasmids used in this study

Plasmid    Description
pFI2288    1189-bp fragment generated using TE58/59 primers cloned into pGEM-T; A repeat region
pFI2296    384-bp fragment generated using TE56/57 primers cloned into pGEM-T; C7′B3 repeat region
pFI2347    deletion clone of pFI2296; 513-bp internal Eco RV fragments deleted; A1:A4 repeat region
pFI2348    685-bp fragment generated using TE39/76 primers cloned into pGEM-T; N-terminal region
pFI2370    955-bp fragment generated using TE34/36 primers cloned into pGEM-T; N-terminal region
pFI2374    2284-bp fragment generated using TE98/57 primers cloned into pGEM-T; C repeat region
pFI2375    2792-bp fragment generated using TE98/89 primers cloned into pGEM-T; C repeat region

2.2 Enterococcal isolates

E. faecium isolates were obtained from national collections, companies and educational establishments, including the National Collections of Industrial and Marine Bacteria; the BCCM™; the Danish Veterinary Laboratory; the Federal Research Institute for Nutrition, Karlsruhe; the Complutense University of Madrid; the University of Minnesota; the US Food and Drug Administration, Maryland; and the Public Health Laboratories, Colindale. Seventy-four strains were tested, including 21 food strains (six starters from milk products or cheese, four starters from Spanish fermented sausages, two isolates from retail broiler, three from retail pork and six from slaughterhouse broilers), 43 medical isolates (one from pus, one from perineum, five from wound swab, five from faeces, five from urine, five from blood, five from blood-endocarditis infections, five from environment and 10 isolates from infection outbreaks in American hospitals) and nine commensal strains (six from faeces of healthy adults and three from faeces of healthy breastfed newborns). All strains were of European origin except where stated.

2.3 DNA manipulation

DNA manipulations were carried out using standard methods[15]. Plasmid DNA was purified using QIAprep® Spin Miniprep Kit (Qiagen). E. faecium chromosomal DNA was prepared using QIAGEN® Genomic DNA Kit as described by the manufacturer, with the modification of adding 100 U ml−1 of mutanolysin to the sample during the lysis stage. Oligonucleotides were purchased from GenoSys (Table 2). PCR amplifications were performed in 25-μl reaction mixtures using 10 ng DNA or 1-μl cell suspension (one large colony in 100 μl MilliQ water) and 1 U AmpliTaq® DNA Polymerase (Amersham). Samples were subjected to one cycle of 96°C for 2 min, followed by 30 cycles of 96°C for 40 s, 40 s annealing at the appropriate temperature and 40 s extension at 72°C. PCR products were purified using QIAquick® PCR Purification Kit before cloning into pGEM®-T vector using the pGEM®-T Vector System II (Promega).
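The cycling programme described above can be written down as data, which makes the total programmed hold time easy to check. The following is a minimal sketch only: the class and variable names are illustrative assumptions, the annealing temperature is left unset because it depends on the primer pair, and ramping time between steps is ignored.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Step:
    """One hold step of a thermocycler programme (names are illustrative)."""
    label: str
    temp_c: Optional[float]  # None = primer-pair dependent, not fixed in the text
    seconds: int


# Values taken from the protocol text; ramp times are not modelled.
INITIAL = Step("initial denaturation", 96.0, 120)
CYCLE = (
    Step("denaturation", 96.0, 40),
    Step("annealing", None, 40),
    Step("extension", 72.0, 40),
)
N_CYCLES = 30


def total_hold_seconds(initial: Step, cycle: Sequence[Step], n_cycles: int) -> int:
    """Sum of programmed hold times, excluding temperature ramping."""
    return initial.seconds + n_cycles * sum(step.seconds for step in cycle)


total = total_hold_seconds(INITIAL, CYCLE, N_CYCLES)
print(f"{total} s (about {total / 60:.0f} min) of programmed hold time")
```

The real run time on an instrument will be longer, since heating and cooling between steps are not included.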

Table 2.  Oligonucleotide primers used for PCR amplification

Name    Sequence (5′–3′)    Strand    Position (nt)ᵃ

ᵃ Nucleotide sequence positions refer to the esp sequence deposited in the EMBL database under accession number AJ487981. Figures in brackets indicate original primer names from Shankar et al.[10] and nucleotides in bold indicate differences from the E. faecium esp sequence.
ᵇ Refers to single-stranded sequence nucleotide positions.

Plasmid and PCR product sequencing was carried out using an ABI PRISM® Big Dye™ Terminator Cycle Sequencing Ready Reaction Kit and an ABI 373A automated sequencer, as directed by the manufacturer. Chromosomal sequencing was carried out using the following modifications: 4 μg of DNA was heated to 65°C before being added to the sequencing reaction mixture and subjected to one cycle at 96°C, followed by 65 cycles of 94°C for 30 s, 50°C for 20 s and 60°C for 4 min (ramp 1.0). Products were purified using Centri-Sep Columns (Princeton Separations) as recommended by the manufacturer, except that columns were hydrated for at least 2 h at room temperature prior to use.

2.4 Cloning regions of esp and nucleotide sequencing strategy

The highly repetitive structure of the esp gene caused considerable difficulty during the cloning and sequencing work. A combination of approaches was used to sequence the whole gene. Initial detection of esp was made by Southern hybridisation and PCR amplification using primers TE34/36. The amplified 955 bp fragment was cloned into pGEM®T-Easy to generate pFI2370; Fig. 1[3]. This clone encodes a portion of the Esp N-terminus. The A-repeat region was amplified using TE58/59 (pFI2288). In order to sequence both strands of this region, pFI2288 was digested with Eco RV, which cuts the DNA at the junction of the first and second A repeat and the third and fourth A repeat, to generate pFI2347. This plasmid contains only the first and fourth A repeats. TE56/57 were used to amplify a 384-bp fragment which corresponds to the C7′B3 region (pFI2296; Fig. 1). Plasmid pFI2348 was generated using TE76/39 to cover the reverse complementary strand immediately upstream of the A-repeat region, including the TE58 primer binding site.

Figure 1.

Schematic diagram of the predicted E. faecium Esp protein structure. A: Deduced Espfm protein showing the predicted signal (S), N-terminal (N), repeat (R) and C-terminal (C) regions. Numbers above each region indicate the number of amino acids in each region. The repeat blocks A, B, C and C′ (partial C repeat) are indicated by cross-hatched, solid black and large and small dotted boxes respectively. B: Plasmid clones constructed for sequencing.

The C-repeat region was amplified using primers TE98/57 and cloned to generate pFI2374. Nested deletions were generated for the forward direction by digestion with Nco I and Apa I, to generate an Nco I exonuclease III-sensitive end. For the reverse direction, pFI2374 was cut with Spe I and Sac I to produce a Spe I exonuclease III-sensitive end. Exonuclease III digestion was performed using the Erase-a-Base® System (Promega). Primers TE98/89 were used to generate a plasmid containing the C region and the flanking downstream region, incorporating a putative transcriptional terminator (pFI2375). Chromosomal sequencing, using a primer walking strategy, was used to sequence both upstream and downstream regions not covered by sequencing of the plasmid clones.

2.5 Epidemiology of the esp gene and determination of repeat number variation

Three different primer pairs (Table 2) were used to detect the presence of esp. Primers TE34 and TE36 were used for amplifications within the N-terminal region. Primers TE58 and TE59 were used for amplifications across the A region. Primers TE98 and TE103 were used for amplifications across the C region. To determine the arrangement and number of C and C′ repeats, PCR products generated using TE98 and TE103 were digested with Pvu II, which cuts at a conserved site towards the end of the B-repeat region. Sequencing of the PCR products generated using TE98 and TE103 was used to confirm the arrangement (data not shown).
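The logic of reading repeat arrangement from a Pvu II digest can be illustrated in silico: each conserved site splits the amplicon, so the fragment lengths reflect the spacing between sites. The sketch below is an assumption-laden illustration rather than the procedure used in this study. The function name and the toy sequence are hypothetical, the input is treated as a linear PCR product, and only the Pvu II recognition sequence (CAGCTG, blunt cut between CAG and CTG) is taken as given.

```python
PVUII_SITE = "CAGCTG"  # Pvu II recognition sequence (palindromic)
PVUII_CUT_OFFSET = 3   # blunt cut between CAG and CTG


def digest_fragment_lengths(seq, site=PVUII_SITE, cut_offset=PVUII_CUT_OFFSET):
    """Return fragment lengths, 5' to 3', after cutting a linear sequence at every site."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy amplicon with two sites; this is NOT esp sequence.
toy_amplicon = "ATGC" * 5 + "CAGCTG" + "TTAA" * 10 + "CAGCTG" + "GGCC" * 3
print(digest_fragment_lengths(toy_amplicon))
```

Because the site is palindromic, searching one strand is sufficient. On a gel, fragments of equal length would co-migrate, which is one reason sequencing is still needed to confirm the arrangement.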

2.6 Nucleotide sequence accession number

The DNA sequence reported in this paper has been deposited in the EMBL nucleotide sequence database under accession number AJ487981.

3 Results and discussion

3.1 Structural analysis of the E. faecium esp gene and deduced protein

The complete esp gene of E. faecium P61 was sequenced, which required the development of an efficient enterococcal chromosomal sequencing protocol. An open reading frame (ORF) of 5718 bp encodes a 1975-amino acid residue protein with a predicted molecular mass of 212 kDa. Fig. 1 shows a schematic view of the E. faecium Esp protein structure. The predicted Esp protein (Espfm) is the first variant Esp to be sequenced in its entirety and is currently the largest ORF identified in E. faecium. The sequence reported here shares 100% identity with partial sequence data previously reported for an 889-bp N-terminal fragment cloned from a variant esp gene identified in E. faecium[13]. No homology is found between the reported sequence and the recently available E. faecium genome sequence (the DOE Joint Genome Institute; http://www.jgi.doe.gov/ and the Human Genome Sequencing Centre; http://www.hgsc.bcm.tmc.edu/).

Analysis of the sequence reveals several significant features. The Espfm protein appears to have all of the features of a typical Gram-positive cell surface anchored protein (Fig. 1). Espfm appears to have a long signal sequence[16], followed by an N-terminal domain. The central portion of Espfm consists of three distinct tandemly repeating units: four nearly identical A repeats followed by a B repeat and six nearly identical C repeats followed by a partial C repeat (C′) and a B repeat. A second partial C repeat and third B repeat are located immediately downstream. The first partial C repeat is identical to the first 10 amino acids of the second to sixth C repeats. The second partial C repeat is identical to the first C repeat except for one amino acid substitution. These divergent repeats could be generated at the extremities of the repeat region by recombination or slipped-strand mispairing during DNA replication, resulting in expansions and contractions of the repeat, as described for the M6 protein of Streptococcus pyogenes[17].

The C-terminal region includes a membrane-spanning hydrophobic domain, an FPKTG(E) cell wall anchor motif and a charged tail found in the majority of Gram-positive bacteria [18,19]. This variant of the LPXTG(X) consensus, in which the conserved leucine at position 1 is replaced by tyrosine, is also found in the E. faecalis Esp protein (Espfs). These are the first instances of the replacement of a residue at position 1. Espfs was shown to be anchored by its carboxy-terminus and displayed at the cell surface using antibodies to a recombinant N-terminal region[5].
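Searching a protein for the LPXTG consensus and for variants at position 1 is a simple pattern match. The sketch below is illustrative only: the function name and the toy peptide are hypothetical, and the residues accepted at position 1 are passed in as a parameter rather than fixed, so the strict consensus and a relaxed search can be compared.

```python
import re


def find_anchor_motifs(protein, first_residues="L"):
    """Return (1-based position, matched text) for each XPXTG-type motif.

    first_residues lists the amino acids accepted at position 1;
    "L" gives the strict LPXTG consensus.
    """
    if not first_residues.isalpha():
        raise ValueError("first_residues must contain only amino acid letters")
    pattern = re.compile(f"[{first_residues.upper()}]P.TG")
    return [(m.start() + 1, m.group()) for m in pattern.finditer(protein.upper())]


# Toy peptide for illustration; this is NOT the Espfm sequence.
toy_protein = "MKKAAALPETGEEFPKTGEKK"

print(find_anchor_motifs(toy_protein))        # strict consensus only
print(find_anchor_motifs(toy_protein, "LF"))  # also accept F at position 1
```

A real analysis would also check that the motif is followed by a hydrophobic stretch and a charged tail, as a bare pentapeptide match can occur by chance in a protein of this size.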

3.2 Comparison of the esp gene from E. faecium and E. faecalis

The E. faecium esp gene (espfm) is similar in sequence and global organisation to the E. faecalis esp gene (espfs) and it is likely that Espfm has a similar function in E. faecium. However, a number of differences exist within the N- and C-termini and in the arrangement of the repeat region. Overall, both the genes and proteins share 89% identity. The signal and N-terminal regions share 71% and 93% identity at the amino acid level, respectively. The B repeats share 74–90% identity and the C repeat regions share 96% identity. Espfm is 102 amino acids larger than the prototype Esp protein, Espfs. An additional 12 amino acids, consisting of DEISPSSPLETA, are present immediately following the signal sequence, and the next 10 amino acids share no identity with the Espfs sequence. It is conceivable that the extreme N-terminus of the Espfm protein may participate in specific interactions with the host. The Espfm protein in E. faecium P61 has two tandem C′B repeat regions, whereas only one is present in Espfs. The large number of differences between the espfm and espfs genes suggests that the E. faecium strain did not acquire this gene recently and that it has existed in that species for some time.
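Percent identity figures such as those above are computed from a pairwise alignment. The sketch below shows one common way to do so, and the choice of denominator is an assumption: identical columns are divided by the total number of alignment columns, including gapped ones. The convention used for the values reported here is not stated, and other conventions (for example, dividing by the shorter sequence length) give different numbers. The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences.

    Denominator: every alignment column except those gapped in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Toy alignment for illustration; these are NOT Esp sequences.
print(round(percent_identity("MKT-AIL", "MKTGAVL"), 1))
```

The function expects sequences that have already been aligned; it does not perform the alignment itself.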

3.3 Comparison of global organisation of Espfm, Rib, alpha C and Bap

The eubacterial genome databases were searched for sequence homologies to the Espfm protein using the gapped BLAST programme[20]. Espfm shows global structural similarity to S. agalactiae Rib[21] and the alpha C[22] proteins involved in lethality and immune evasion [23,24], the R28 protein of S. pyogenes which binds to human epithelial cells in vitro[25] and to the S. aureus biofilm-associated protein Bap[26] and Bap-like protein, BHP from S. epidermidis (EMBL accession no. AY028618). Fig. 2A shows representative genes Rib, alpha C and Bap, and their sequence identity to Espfm.

Figure 2.

A: Comparison of the Espfm, Bap, Rib and C alpha proteins showing the structural similarity of the four proteins. The percentage identity of regions of high homology is indicated between the dotted lines. The Esp structure is derived from E. faecium strain P61 (this study), the Bap structure is derived from the S. aureus bovine strain V329[26], the Rib structure is derived from group B streptococcal strain BM110[21] and the C alpha structure is derived from group B streptococcal strain A909[22]. The N-terminal domain of Esp (residues 49–755) and region B of Bap (residues 361–819) are compared. The C-terminal of Esp (residues 1161–1662) and the repeat region of Rib (residues 175–1137) and C alpha (residues 186–935) are compared. The Rib and C alpha signal regions (residues 1–55 and 1–41 respectively) and N-terminal regions (56–174 and 42–185 respectively) are compared. The N-terminal of Bap and Espfm proteins share 31% identity. The C region of Espfm and alpha C and Rib share 38% and 40% overall identity, respectively. B: Alignment of the 13-amino acid highly conserved region within the repeats of Espfm, the Rib and C alpha proteins of Streptococcus agalactiae and the Bap protein of S. aureus. Identical amino acids are shown in boldface type.

Multiple alignments of the repeat regions of the alpha C and Rib proteins, the C repeat regions of the Bap and the B and C repeat regions of Espfm are shown in Fig. 2B. Conserved motifs consisting of (V/I)(E/V)VTYPDG(S/T)(K/S)(D/E)(T/E)V and D(A/K)DK(Y/N) occur near the beginnings and ends of the repeat for all the proteins except Bap. These regions may be important in the recombination events whereby the number of repeats is varied. The central PDG motif is common to all the genes, including Bap, suggesting a conserved function with limited permissible drift. Interestingly, a conserved sequence motif, consisting of GTTV, occurs in both the A and C repeats of Espfm and the repeat region of Rib. This motif, which extends over half of the protein, may also have a conserved function within the structure. The presence of the conserved GTTV motif and higher sequence identity of Espfm and Rib as compared to the alpha C protein indicate that Espfm is more closely related to the Rib than to the alpha C protein.
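The bracketed motif notation used above, in which (V/I) means V or I at that position, maps directly onto a regular expression character class, so the conserved motifs can be searched for in any protein sequence. The following is a minimal sketch: the helper names and the toy peptide are hypothetical, and only the two motif strings are taken from the text.

```python
import re

LONG_MOTIF = "(V/I)(E/V)VTYPDG(S/T)(K/S)(D/E)(T/E)V"
SHORT_MOTIF = "D(A/K)DK(Y/N)"


def motif_to_regex(motif):
    """Convert '(V/I)'-style alternatives into regex character classes, e.g. '[VI]'."""
    return re.sub(
        r"\(([A-Z](?:/[A-Z])*)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )


def find_motif(protein, motif):
    """Return (1-based position, matched text) for each non-overlapping occurrence."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start() + 1, m.group()) for m in pattern.finditer(protein.upper())]


# Toy peptide for illustration; this is NOT a real repeat sequence.
toy_repeat = "AAVEVTYPDGSKDTVGGDKDKNAA"

print(motif_to_regex(LONG_MOTIF))
print(find_motif(toy_repeat, LONG_MOTIF))
print(find_motif(toy_repeat, SHORT_MOTIF))
```

An exact pattern match like this will miss diverged copies of a motif; an alignment-based or profile-based search would be more tolerant of substitutions.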

BLAST searches with the Espfm signal sequence showed considerable identity (>30% identity) with many Gram-positive adhesin protein signal sequences of genes such as the mucus binding protein of Lactobacillus reuteri (TrEMBL accession no. Q9RGN5), the fibrinogen binding proteins of Streptococcus equi[27] and Staphylococcus aureus[28] and the BHP (Bap-like protein) of S. epidermidis (TrEMBL accession no. Q9AER7). Alignment of several of these signal sequences (data not shown) revealed extensive identity in the central and latter portions of the sequences with a highly conserved (S/A)I(R/K)K(Y/L) motif. Together with the considerable identity of the Espfm protein to the biofilm-associated Bap protein, this provides further support for a functional role of Espfm as an adhesin-type protein. Significant identity with the Rib and alpha C proteins suggests that Espfm is likely to have at least two functional roles, which have yet to be fully elucidated.

3.4 Distribution of espfm among food, medical, commensal and environmental isolates of E. faecium

An increased frequency of espfm was seen among medical and environmental isolates as compared to commensal and food isolates (Table 3) thus providing indirect evidence that Espfm may contribute to virulence. The figures obtained here (68% of UK medical isolates) compare well with those of Woodford et al.[29] who reported that 63% of UK medical isolates tested were positive for esp. In the study of Willems et al.[13], 70% of vancomycin-resistant E. faecium isolates were esp positive but isolates from healthy individuals were negative. In Italy, where epidemic infections involving Enterococcus are not common, the incidence of Esp is significantly lower at 33%[30]. The study of Franz et al.[4] also indicated a very low percentage (2.1%) of E. faecium food isolates positive for esp. Interestingly, in E. faecalis, 10–40% of isolates from healthy individuals were positive for the esp gene [5,10,11], which may reflect the increased virulence of E. faecalis as compared to E. faecium generally, and the possible role of Espfs in the spread of epidemic strains.

Table 3.  Analysis of A and C repeat number variation in esp+ E. faecium strains by PCR, restriction digestion and sequencing as described in Section 2

No.   Strain      Origin                    No. A repeats   No. C repeats
2     P47         Wound swab                4               6
3     P48         Wound swab                4               6
4     P49         Wound swab                4               6
5     P50         Wound swab                4               6
22    cvm12086    Clinical VRE (American)   4               6
23    cvm12088    Clinical VRE (American)   2               5
24    H55         Clinical VRE (American)   5               5
25    H70         Clinical VRE (American)   5               5
26    H70a        Clinical VRE (American)   5               5
27    H70b        Clinical VRE (American)   5               5
28    H70c        Clinical VRE (American)   5               5

All isolates having six C repeats had two C′ and three B repeats as illustrated in Fig. 1; all isolates having five C repeats had only one C′ and two B repeats.

In order to assess whether espfm-positive isolates occur predominantly at any particular infection site, different groups of isolates were analysed (Table 4). These results show a high incidence of espfm in urine, blood, endocarditis and wound swab isolates. The study of Woodford et al.[29] also indicated a similarly high occurrence in urine isolates (85%). Other studies reported that one third of E. faecalis urine isolates were positive for espfs, with values averaging 51% and 67% for blood and endocarditis isolates, respectively [5,12,31]. It seems likely that Espfm contributes to infection at all four sites. A high incidence in wound swab isolates may be associated with the ability of Espfm to form a surface biofilm.

Table 4.  Frequency of the esp gene among E. faecium isolates from different groups

Group                                               esp+ (%)
I – All sources (n = 20)            Food            0
II – Isolate originᵃ (n = 38)       Wound swab      80
III – Location originᵃ (n = 38)     European        68

The presence of esp was assessed by PCR as described in Section 2. n indicates number of isolates in each group.
ᵃ Medical isolates only.
ᵇ Category includes pus, perineum and American medical isolates of unspecified origin.

Translocation of E. faecalis across intact intestinal mucosa may be a route to infection[32]. Isolates positive for esp may adhere to or form a biofilm at intestinal mucosa and so persist. In this study, 80% of medical faecal isolates were positive for espfm whereas commensal isolates from healthy individuals were completely clear of the determinant. However, faecal isolates from patients with other enterococcal infections, e.g. bacteremia, were not available for analysis. Further work in this area could provide evidence for translocation as a route of infection. Faecal and environmental isolates positive for esp could act as a reservoir that may contribute to spread of infection.

A comparison of European and American strains (Table 4) shows no major difference in the incidence of espfm. This contrasts with another study[5] that found none of 34 E. faecium medical isolates positive for esp.

3.5 Repeat number variation

The number of A and C repeats in screened Espfm proteins was determined (Table 3). In general, the Espfm proteins tended to have more A repeats and fewer C repeats than Espfs proteins. The A repeats ranged from two to six, with the majority of isolates possessing five repeats (Table 3). The C repeats ranged from five to six, with the majority possessing six repeats. All isolates having six C repeats had two C′ and three B repeats as depicted in Fig. 1; all isolates having five C repeats had one C′ and two B repeats. In E. faecalis, the number of A repeats varies from one to three, the number of C repeats varies from three to nine (with seven being the most common number) and one partial C repeat and two B repeats occur[5]. The significance of these differences is not clear. No correlation is seen between the number of A and C repeats and the source of the isolate.


Espfm is highly conserved in infection-derived isolates and it is also found in environmental isolates, suggesting an association with virulence and possible spread of infection. Food and commensal isolates are clear of the virulence determinant. Espfm is distinct from the E. faecalis esp gene and E. faecium esp-positive strains have probably existed for some time. Species-specific primers for identification of esp in E. faecium and E. faecalis may now be designed for rapid screening of medical isolates. Further analysis of the functional role of Espfm in translocation is underway.


We thank Frank Aarestrup, Polly Kaufman, Gary Dunny, Shabbir Simjee, Juan Rodriguez and Manuel Nunez for supplying strains.