PSA is a temperate phage isolated from Listeria monocytogenes strain Scott A. We report its complete nucleotide sequence, which consists of a linear 37 618 bp DNA featuring invariable, 3′-protruding single-stranded (cohesive) ends of 10 nucleotides. The physical characteristics were confirmed by partial denaturation mapping and electron microscopy of DNA molecules. Fifty-seven open reading frames were identified on the PSA genome, which are apparently organized into three major transcriptional units, in a life cycle-specific order. Functional assignments could be made to 33 gene products, including structural proteins, lysis components, DNA packaging proteins, lysogeny control functions and replication proteins. Bioinformatics demonstrated relatedness of PSA to phages infecting lactic acid bacteria and other low G + C Gram-positives, but revealed only few similarities to Listeria phage A118. Virion proteins were analysed by amino acid sequencing and mass spectrometry, which enabled identification of major capsid and tail proteins, a tape measure and a putative portal. These analyses also revealed an unusual form of translational frameshifting, which occurs during decoding of the mRNAs specifying the two major structural proteins. Frameshifting yields different length forms of Cps (gp5) and Tsh (gp10), featuring identical N-termini but different C-termini. Matrix-assisted laser-desorption ionization mass spectrometry (MALDI-MS) and electrospray ionization mass spectrometry (ESI-MS) of tryptic peptide fragments were used to identify the modified C-termini of the longer protein species, by demonstration of specific sequences resulting from + 1 programmed translational frameshifting. A slippery sequence with overlapping proline codons near the 3′ ends of both genes apparently redirects the ribosomes and initiates the recoding event. Two different cis-acting factors, a shifty stop and a pseudoknot, presumably stimulate frameshifting efficiency.
PSA represents the first case of + 1 frameshifting among dsDNA phages, and appears to be the first example of a virus utilizing a 3′ pseudoknot to stimulate such an event.
Listeria monocytogenes is a Gram-positive, opportunistic pathogen that can be found ubiquitously. It is capable of causing life-threatening infections in humans and in animals, and is primarily transmitted via contaminated food. Recent outbreaks of listeriosis (Slutsker and Schuchat, 1999; CDC, 2000; Siegman-Igra et al., 2002) again emphasised that more information on this organism is urgently needed, especially on some basic characteristics such as phenotypic variations and alterations. The potential influence of transmissible genetic elements on the phenotype of Listeria cells is not known. Lysogeny is widespread among Listeria strains; it was interesting to note that the five prophage-like elements within the L. innocua CLIP 11262 genome make up approximately 7% of the bacterial genome (Glaser et al., 2001). However, all but one of them seem to be non-functional (i.e. cryptic), because they did not produce infective particles upon induction (unpubl. obs.).
Phages for Listeria monocytogenes are, with few exceptions, specific with respect to the serovar (SV) of the host cells. The major serovar groups (SV 1/2 and 4) can be differentiated by their somatic antigens, which are based upon composition and sugar substitution of cell wall-associated teichoic acids (Fiedler and Ruhland, 1987). These carbohydrates also serve as primary receptors for the serovar-specific phages (Wendlinger et al., 1996). Nucleotide sequence information has recently become available for two phages of L. monocytogenes SV 1/2. Whereas the φEGDe element in the genome of L. monocytogenes strain EGDe (Glaser et al., 2001) is cryptic (unpubl. obs.), and therefore not suitable for further analysis, phage A118 has been studied in more detail (Loessner et al., 2000). It is a temperate virus, featuring a terminally redundant and circularly permuted 40 834 bp dsDNA genome. During lysogenization, the virus inserts into a comK homologue (same locus as φEGDe), and probably destroys gene function. The possible implications on the Listeria phenotype have not yet been investigated. Altogether, there clearly is a need for detailed studies on the molecular biology of the many phages able to infect the different serovars and species of the genus Listeria.
No information was previously available on viruses infecting L. monocytogenes serovar group 4. We have focused our present study on the temperate phage PSA (phage of Scott A), which has been isolated by UV-induction from L. monocytogenes strain Scott A (Loessner et al., 1994a). This particular strain was implicated in a massive outbreak of human listeriosis, linked with the consumption of pasteurized milk (Fleming et al., 1985). PSA exclusively infects L. monocytogenes serovar 4 strains, which are frequently implicated in outbreaks of the disease. The virion features a long, flexible, non-contractile tail of 180 nm and an isometric capsid with an apex-to-apex diameter of 61 nm (Loessner et al., 1994a). It belongs to the Siphoviridae family of dsDNA bacterial viruses in the order Caudovirales (B1 morphotype), and was taxonomically classified into species 2389. To become a prophage, PSA integrates into attB located at the 3′-end of a single-copy tRNAArg gene, where it reconstitutes function by phage nucleotides (Lauer et al., 2002).
We here report the entire nucleotide sequence of the PSA genome, and its physico-molecular characterization. Functional assignments yielded interesting information on enzymes, and on morphopoietic and structural proteins. We also discovered an unusual form of ribosomal frameshifting, which is utilized for the synthesis of different length forms of the two major structural proteins. These cases represent the first examples of + 1 frameshifting among dsDNA phages, and appear to be stimulated by a 3′ pseudoknot and a shifty stop respectively.
Nucleotide sequence and characteristics of the PSA genome
The sequence of the entire PSA dsDNA genome was determined using a shotgun approach. Purified phage DNA was partially digested with a frequently cutting restriction endonuclease, fragments cloned into an E. coli vector and plasmid inserts sequenced. The resulting sequence data were initially aligned into eight contiguous stretches, representing approximately 47% of the total genome sequence. Gaps between these contigs were closed by a primer walking strategy, directly on the phage DNA. This was continued until defined sequence termini (‘run-offs’) were observed at both ends of the single last contig, representing the entire genome (Fig. 1A–D). The genome sequence was finalised after generating a PCR product spanning the cos-site core sequence on chromosomal DNA of the lysogenic host. A number of PSA restriction maps predicted in silico were in perfect agreement with the observed patterns, indicating correct contig assembly (results not shown). The size of the complete genome is 37 618 bp with 3′-protruding, single-stranded cohesive ends of 10 nucleotides (Fig. 1A–D). Its average molar G + C content of 34.8 mol% is slightly lower than the one reported for L. monocytogenes phage A118 (36.1 mol%), and lower than the 38.0 mol% reported for L. monocytogenes EGDe (Glaser et al., 2001). The sequence data reported here appear in the EMBL/GenBank/DDBJ databases under accession number AJ312240.
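The G + C values compared above are simple base-composition percentages. As a minimal sketch of how such a value can be computed from a sequence file, the function below counts unambiguous bases only; the function name and the toy sequence are illustrative assumptions, not part of the original analysis.

```python
def gc_content_mol_percent(seq: str) -> float:
    """Return the G + C content of a nucleotide sequence in mol%.

    Ambiguity codes and gap characters are ignored (an assumption of
    this sketch; other tools may treat them differently).
    """
    counted = [base for base in seq.upper() if base in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


# Toy example with a hypothetical 12-nt sequence (not from the PSA genome)
print(f"{gc_content_mol_percent('ATGCATTTAAGC'):.1f} mol%")
```

Applied to the full 37 618 bp sequence deposited under AJ312240, a calculation of this kind would be expected to reproduce the reported 34.8 mol%.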
The presence of single stranded cohesive ends excludes terminal redundancy of the phage genome. This could be confirmed by partial denaturation mapping, where all inspected DNA molecules revealed the same pattern of AT-rich regions, clearly indicating that the genome is non-permuted, and all molecules are identical (Fig. 1E–G). Our observation that a significant portion (37%) of the DNA molecules form circles under appropriate conditions also indicated the presence of cohesive ends (results not shown). Moreover, enzymatic ligation of phage DNA before digestion with restriction endonucleases altered the restriction pattern exactly as expected, i.e. two terminal fragments appeared as one single fragment of the respective sizes (results not shown). The invariable genome structure of PSA agrees well with the finding that the phage is unable to perform general transduction of different genetic markers between L. monocytogenes strains (Hodgson, 2000). In fact, this type of transduction appears to correlate with terminal redundancy and circular permutation of the phage genome, at least in the cases studied so far (Loessner et al., 2000; J. Gruner, M. Zimmer, M. Loessner, unpubl. data).
Organization of the PSA genetic map
In silico analysis revealed the existence of 57 putative protein coding regions on the PSA genome, covering 91.5% of the total sequence. Criteria for potential open reading frames (ORFs) were the existence of an ATG, GTG (eight times present), or TTG (six times present) start codon, and a minimum coding capacity of 40 amino acids. All potential ORFs were preceded by a recognizable ribosome binding site with a sequence complementary to the 3′-end of L. monocytogenes 16S rRNA (Glaser et al., 2001).
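The ORF criteria stated above (ATG, GTG or TTG start codon; at least 40 amino acids) can be illustrated with a short six-frame scan. This is a minimal sketch under stated assumptions, not the software used in the study: the function names are hypothetical, only the longest candidate per stop-delimited stretch is reported, and the ribosome binding site check described in the text is not implemented.

```python
START_CODONS = {"ATG", "GTG", "TTG"}   # start codons named in the text
STOP_CODONS = {"TAA", "TAG", "TGA"}    # standard stop codons
MIN_CODONS = 40                        # minimum coding capacity (amino acids)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_strand(seq, min_codons=MIN_CODONS):
    """Yield (start, end) of candidate ORFs in the three frames of one strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Within each stop-delimited stretch only the first start codon is used.
    """
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon in START_CODONS:
                orf_start = i
            elif codon in STOP_CODONS:
                if orf_start is not None and (i - orf_start) // 3 >= min_codons:
                    yield orf_start, i + 3
                orf_start = None


def find_orfs(genome, min_codons=MIN_CODONS):
    """Return sorted (strand, start, end) tuples in forward-strand coordinates."""
    genome = genome.upper()
    n = len(genome)
    hits = [("+", s, e) for s, e in scan_strand(genome, min_codons)]
    # Map reverse-strand hits back onto forward-strand coordinates
    hits += [("-", n - e, n - s)
             for s, e in scan_strand(reverse_complement(genome), min_codons)]
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical test sequence: ATG plus 39 alanine codons and a stop (40 codons)
demo = "ATG" + "GCT" * 39 + "TAA"
print(find_orfs(demo))
```

A scan like this typically over-predicts coding regions, which is why additional criteria such as the ribosome binding site and database similarity are needed before an ORF is accepted.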
The probable transcriptional units of PSA (as indicated by bioinformatics) are organized into three major functional clusters, nicely reflected by the transcriptional direction of the ORFs (see Fig. 2). The first, rightward transcribed cluster, from the left-arm cos-site to the attP site (co-ordinates 1–19 295) represents the late genes, encoding the structural proteins and the lysis system. The second cluster (approximate co-ordinates 19 296–22 445) is mostly transcribed in the opposite (leftward) direction and includes the att-site, the integrase (Lauer et al., 2002) and other lysogeny-related functions. The third cluster (nt 22 446–37 618) includes rightward-facing ORFs only, encoding the early lytic functions, including DNA replication and modification.
Similarities of the PSA gene products and functional assignments
The deduced amino acid (aa) sequences of the 57 ORFs were compared with known sequences from the databases to uncover similarities to genes with known function. Functional assignments and significant homologies are listed in Table 1; some of the most interesting findings are described below.
Table 1. Features of bacteriophage PSA ORFs, gene products, functional assignments and amino acid sequence homologies.
Putative NTP-binding motif protein (φadh: gp223; Sfi19: gp233; A2: gp240)
Helicase [Hel] (A2: gp455; φ31: Hel; 01205: Hel)
(DT1: gp34; 01205: gp11; φ31: gp4)
Primase [Pri] (A2: gp770; φadh: gp771; φ105: gp11)
(DT1: gp38; Sfi19: gp106; Sfi11: gp106)
(φN315: gpSA1787; MM1: gp26)
Transcriptional activator (φ31:gp2; r1t:gp25)
(φN315: gpSA1779; φSLT: gp104b; φ3626: gp50)
ORFs 11 and 12.
Gp12 is a large protein (112.1 kDa) with extended similarity to tape measure proteins (Tmp) from other phages, such as Lactococcus viruses bIL285 (Chopin et al., 2001) (57% over 576 aa), and TP901-1 (Pedersen et al., 2000) (36% over 770 aa). Similarities were also found to the uncharacterized gene products of S. aureus phages φN315 (43% over 896 aa) and φ13 (Iandolo et al., 2002) (36% over 976 aa). Tmp determines the length of the phage tail during virus morphogenesis (additional data on the structure of this protein are presented and discussed below). The product of orf11 itself did not display any similarity to known protein sequences, but in between orf11 and orf12, a short potential coding region (orf11A) is located which overlaps with orf11 in the − 1 frame, and ends immediately upstream of the start codon of orf12. This may represent yet another instance of the canonical frameshifting in the siphoviral tail gene cluster, exemplified by λ G and T (Levin et al., 1993). In PSA, a potential slippery sequence (5′-AAA′AAA′A-3′) resembling the classical heptanucleotide slippery sequence (Farabaugh, 1996) is located immediately upstream of the original orf11 stop. It is likely that a programmed − 1 frameshift can occur in the mRNA of orf11, resulting in two different gene products. Besides λ, a similar recoding event was recently shown to occur in the tail genes E and T of P2 (Christie et al., 2002), although the gene products of the two phages share no similarity in their amino acid sequences.
ORFs 19 and 20.
These two ORFs are located at the distal end of the late genes and encode a dual lysis system, consisting of a holin (Hol) and an endolysin (Ply). This genetic organization is common to many Siphoviridae (Lucchini et al., 1999). These proteins mediate the release of phage progeny from infected host cells. It was surprising to see that HolPSA (gp19) is not related to known holins from other Listeria monocytogenes phages (Loessner et al., 1995b), but displays similarity to holins from lactococcal phages such as Tuc2009, LC3/TPW22, or TP901-1 (similarities approximately 65% over 85 aa) (Arendt et al., 1994; Birkeland, 1994; Petersen et al., 1999; Brondsted et al., 2001), and to holins from staphylococcal phages Twort, φPVL, and φ11 (Kaneko et al., 1998; Loessner et al., 1998; Navarre et al., 1999) with similarities from 54 to 60% over 76–84 aa. A transmembrane domain search based on the hidden Markov model (TmHMM) displayed high probabilities for two membrane-spanning domains within HolPSA (amino acid positions 13–35 and 45–64). This classifies HolPSA as a member of the class 2 holins (Young and Bläsi, 1995). The PSA endolysin (gp20) exhibited an interesting domain structure. Whereas its N-terminus (aa 1–180) is similar to N-acetylmuramoyl-l-alanine amidases from Bacillus cereus phage 12826 or the clostridial phage φ3626 (similarities of 48% over 178 aa and 45% over 207 aa, respectively) (Loessner et al., 1997; Zimmer et al., 2002b), its C-terminal portion (aa 183–317) is almost identical to CBD500, the cell wall binding domain from the endolysin of phage A500 from L. monocytogenes (Loessner et al., 2002), and other endolysins from L. innocua prophages (Zink et al., 1995; Glaser et al., 2001). In this study, we have cloned and expressed the gene in E. coli, and confirmed its function in an overlay assay using L. monocytogenes cells as substrate. The recombinant enzyme rapidly hydrolysed the cells and formed large, clear halos around the E. coli colonies (data not shown), similar to the endolysins from other Listeria phages (Loessner et al., 1995b; Loessner et al., 1996).
ORF 24.
This gene encodes the phage integrase. Its function in site-specific integration of the viral genome into the 3′-end of a single copy tRNAArg gene was recently shown, and enabled the construction of the site-specific integration vector pPL2 (Lauer et al., 2002). Gp24 displays similarities to the integrases from Lactobacillus phage φg1e (Kodaira et al., 1997) (46% over 383 aa), from L. lactis phage bIL309 (49% over 317 aa), from Streptococcus thermophilus phage Sfi21 (Desiere et al., 1998) (48% over 332 aa), and from several other phages. An HMMScan also indicated that gp24 harbours an integrase domain. Amino acid sequence alignments showed that the PSA integrase has the two active site arginines and the active site tyrosine found in the crystal structure of E. coli XerD, which is a tyrosine recombinase (Subramanya et al., 1997).
In general, we were surprised to find that PSA shares only few similarities with A118, on the level of amino acid sequence similarities of predicted gene products (Table 1). These homologies are not restricted to a specific cluster: ORFs 12 and 15 belong to the late genes, whereas ORFs 26, 27 and 35 are located in the lysogenic control region and the early genes respectively (Fig. 2).
Proteomics of the PSA virion
Structural proteins of the virus were separated by SDS polyacrylamide electrophoresis (Fig. 3). Their individual relative percentages were calculated based upon densitometrical scanning of the lane containing the PSA proteins. Microsequencing was used to determine the N-terminal amino acid sequences of the proteins from the five most prominent bands (see Fig. 3).
The sequence GFKSxVSGFF (N-terminal Met removed) identified the 46 kDa protein as gp3 (deduced size 45.3 kDa). This is the putative portal, which not only connects head and tail, but may have important functions as a barrier for DNA entry and exit.
A striking finding was that the two protein bands of approximately 36 kDa and 31 kDa revealed the identical N-terminal sequence EVIAGNGFAG, corresponding to the major capsid components Cps (gp5). From the coding sequence, it could be deduced that the primary polypeptides are post-translationally cleaved between N-81 and E-82, which results in loss of a 19.0 kDa N-terminal fragment during capsid protein processing and maturation. However, this did not explain the apparently different sizes of the two forms of the protein, which were designated as Cps and Cps-L respectively. A similar surprise was the observation that the two protein-bands observed at approximately 25 kDa and 22 kDa also revealed an identical N-terminal sequence (ATIVEDFDAT). This sequence corresponds to the major tail protein Tsh (gp10), of which the N-terminal Met had been removed. The two different forms were designated as Tsh and Tsh-L respectively.
Peptide fingerprints were used to identify two proteins present as relatively faint bands on the SDS gels, which were unsuitable for N-terminal sequence determination. These were the bands corresponding to ∼68 and ∼80 kDa sizes. MALDI-MS permitted identification of both proteins by the masses of their tryptic peptide fragments (data not shown).
The ∼68 kDa protein was identified as Gp14, to which no precise function could be allocated. The encoding gene is, however, located in a cluster which appears to encode components of the tail and base plate. The larger protein species (∼80 kDa) represents gp12, the probable tape measure (see above). However, its observed size differs by about 30 kDa from the deduced size of 112 kDa. MALDI-MS identified fragments corresponding to sequence at both termini (Fig. 4), but did not yield signals corresponding to fragments between amino acid positions 542 and 812. Figure 4 also shows the results of subjecting gp12 to specific structure prediction tools useful for the identification of tape measure proteins (see Discussion).
Different length products result from programmed translational frameshifting
Both the major capsid protein and the major tail protein are represented by two protein species each, having identical N-termini but different sizes and therefore probably different C-termini. One of several possible explanations was that these products result from programmed translational (ribosomal) frameshifts. Both genes feature 3′-overlaps with short open reading frames which were initially not thought to specify any product. However, bioinformatics indicated that, in both cases, either a − 2 or a + 1 shift could result in the synthesis of a larger polypeptide species, corresponding to the sizes observed on the gels. Such recoding events would result in two products of different sizes, sharing the same N-termini but varying in the length of their C-termini. To provide evidence for the actual existence of such products, and to determine the location and type of frameshift involved, mass spectrometry was employed. MALDI-MS peptide fingerprints of Cps-L and Tsh-L were generated, and the determined masses of the individual tryptic polypeptide fragments were compared with the deduced masses for Cps-L (Table 2A) and Tsh-L (Table 3A). The analyses yielded total fragment coverage of 86% for Cps-L, and 97% for Tsh-L. Additional information was also generated by ESI-MS/MS analyses, which allowed determination of the exact amino acid sequence of some of the tryptic fragments (Tables 2B and 3B), and clearly revealed the existence of (C-terminally located) fragments corresponding to the predicted products resulting from either a − 2 or a + 1 frameshift. MALDI-MS enabled identification of the peptides spanning the potential frameshifting sites, and therefore also permitted determination of the location and mode of the shift.
Table 2A. Peptide mass fingerprinting (MALDI-MS) of tryptic fragments of Cps-L.
Table 3A. Peptide mass fingerprinting (MALDI-TOF) of Tsh-L.
Table 2B. Amino acid sequence information obtained by ESI-MS of Cps-L.
a. Expected molecular weight determined from the observed molecular mass of the protonated ions.
b. Molecular weight of the corresponding fragment calculated from the deduced aa sequence.
c. Δ MW [%]: difference between the expected and calculated molecular weights.
d. Amino acid residues resulting from the + 1 frameshift are indicated in bold letters.
Figure 5 shows that the frameshift in cps occurs at a location close to the 3′ end of the gene, at the mRNA sequence ACA′CCC′UCC.G (corresponding to co-ordinates 5122–5131 of the PSA genome). The ribosome apparently slips from the CCC proline codon one nucleotide position into the 3′-direction (underlined), and continues from the overlapping proline triplet (CCU) in the + 1 frame until it reaches the stop codon at position 5290. Thus Cps-L contains most of the sequence of Cps (390 residues), with 53 extra amino acids from the alternate frame added onto the C-terminus. Interestingly, the cps mRNA is capable of forming a putative 3′ pseudoknot structure downstream of the slippage site (Fig. 5A). These secondary structures were previously known to stimulate frameshifting only in eukaryotic cells (see Discussion).
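The effect of the + 1 slip on decoding can be illustrated with a small translation sketch. It applies the standard genetic code to the ten nucleotides quoted above and models the slip simply as skipping one nucleotide after the CCC codon; the function name and the `slip_after` parameter are illustrative assumptions, and the sketch says nothing about the efficiency of the event.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna, slip_after=None):
    """Translate an mRNA string starting at its first nucleotide.

    slip_after is the 0-based index of the codon after which the ribosome
    skips one nucleotide (+1 shift); None means no frameshift. Translation
    ends at the first stop codon or when no complete codon remains.
    """
    peptide = []
    pos = 0
    codon_index = 0
    while pos + 3 <= len(mrna):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":
            break
        peptide.append(residue)
        pos += 3
        if slip_after is not None and codon_index == slip_after:
            pos += 1  # tRNA re-pairs one nucleotide downstream
        codon_index += 1
    return "".join(peptide)


site = "ACACCCUCCG"                      # cps slippage region quoted in the text
print(translate(site))                   # zero frame: ACA CCC UCC
print(translate(site, slip_after=1))     # +1 after CCC: ACA CCC (U) CCG
```

In the zero frame the fragment reads Thr-Pro-Ser, whereas after the slip the codon following the proline is CCG, so the first residue added in the new frame is again a proline.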
With respect to tsh mRNA, the slippery site is also located at the end of the gene. In fact, it overlaps the stop codon of the tsh reading frame (nt 7285–7290) (Fig. 5C), an arrangement which has been designated a ‘shifty stop’ (Weiss et al., 1987). Similar to the situation in cps, the elongated Tsh-L version is produced via a + 1 frameshift, where the ribosome slips from the terminal CCC codon of the tsh coding sequence ACA′CCC′UGA one nucleotide position in the 3′ direction (underlined), and continues translation in the + 1 coding frame ending at the UAG at position 7363. Whereas the major Tsh consists of 192 amino acids, the alternate product Tsh-L consists of 216 aa residues, of which the 24 C-terminal residues are unique to the longer form.
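The numbers of additional residues reported for both long forms can be cross-checked from the genome co-ordinates alone. The sketch below rests on two assumptions that are not stated explicitly in the text: that the quoted stop positions (5290 and 7363) refer to the last nucleotide of the respective stop codon, and that the first codon read in the + 1 frame starts two nucleotides after the last nucleotide of the CCC codon. The function name is hypothetical.

```python
def extra_residues(ccc_last_nt, stop_codon_last_nt):
    """Count sense codons decoded in the +1 frame after the slip.

    ccc_last_nt:         genome co-ordinate of the last C of the CCC codon
    stop_codon_last_nt:  co-ordinate of the last nucleotide of the +1 frame
                         stop codon (assumed meaning of "position" in the text)
    """
    first_codon_start = ccc_last_nt + 2        # one nucleotide is skipped
    stop_codon_start = stop_codon_last_nt - 2
    span = stop_codon_start - first_codon_start
    if span < 0 or span % 3:
        raise ValueError("co-ordinates are not in the same reading frame")
    return span // 3


# cps: ACA'CCC'UCC.G spans 5122-5131, so CCC ends at 5127; stop at 5290
cps_extra = extra_residues(5127, 5290)
# tsh: CCC'UGA spans 7285-7290, so CCC ends at 7287; UAG at 7363
tsh_extra = extra_residues(7287, 7363)

print("Cps-L extra residues:", cps_extra)
print("Tsh-L extra residues:", tsh_extra, "-> total", 192 + tsh_extra)
```

Under these assumptions the co-ordinates give 53 additional residues for Cps-L and 24 for Tsh-L (216 in total), in agreement with the values stated in the text.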
The ratios of the major (normal length) products to the minor (longer) products incorporated into the phage particles were calculated from the relative amounts determined by densitometrical scanning (see Fig. 3). For Tsh, the proportion of the elongated product is about 25%, i.e. a ratio of 1:3. The proportion of Cps-L of total major capsid protein (both Cps species) is 13.2%, which corresponds to a ratio of 55:360. This value exactly reflects the ratio of the two different capsomere subunits required to build an icosahedral virus capsid of T = 7 symmetry (see Discussion).
This is the first detailed description and sequence analysis of a bacteriophage infecting the important L. monocytogenes serovar 4 strains. The PSA chromosome contains 57 coding regions organized in life cycle-specific gene clusters. As frequently observed for members of the Siphoviridae family, the genes responsible for the maintenance of the lysogenic life-cycle are oriented in an opposite direction compared to the genes necessary for a lytic development.
Many significant similarities found were to proteins of other phages infecting low G + C Gram-positive bacteria, from the genera Lactococcus, Staphylococcus, Streptococcus and Listeria, but the strongest homologies existed to the entries derived from the prophages found in the genomes of L. monocytogenes EGDe and L. innocua (Glaser et al., 2001). The existence of only few similarities between PSA and A118 was somewhat surprising, and suggests that the significant differences between Listeria monocytogenes serovar 4 and serovar 1/2 host strain lineages may also be manifested in their specific bacteriophages.
Gp12 was identified as the tape measure protein; it showed limited homologies to a known Tmp (TP901-1; Pedersen et al., 2000) and to several putative Tmp from other phages (see Results). Additional support for this identification is the location of the gene, in a ‘standard order’ which is followed by most tailed phages fairly exactly: [major tail subunit]-[analogue of λ gpG and gpT]-[Tmp] (Pedulla et al., 2003). Moreover, gp12 fulfils the criteria typical of Tmp: it is predicted to be highly α-helical and scores highly for coiled-coil potential (Fig. 4) (R. Hendrix, pers. comm.). Tape measure proteins are thought to determine the length of the phage tail by a ruler-mechanism, with the tail length being directly proportional to the size of Tmp (Katsura, 1987). Our findings are in line with this theory: the designated Tmp of PSA (gp12, 1026 aa) is about 20% larger than the λ protein (gpH, 853 aa) and the PSA tail is approximately 20% longer than the λ tail (180 nm and 150 nm, respectively). Similar observations were made with respect to the Tmp proteins of A118 (Loessner et al., 2000), Clostridium perfringens phage φ3626 (Zimmer et al., 2002a) and of 13 phages infecting Mycobacterium (Pedulla et al., 2003).
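The proportionality argument can be checked with the four numbers quoted above. The following is only an arithmetic sketch using the full-length protein sizes; the variable names are illustrative.

```python
# Values quoted in the text (full-length Tmp sizes and tail lengths)
tmp_length_aa = {"PSA gp12": 1026, "lambda gpH": 853}
tail_length_nm = {"PSA gp12": 180, "lambda gpH": 150}

size_ratio = tmp_length_aa["PSA gp12"] / tmp_length_aa["lambda gpH"]
tail_ratio = tail_length_nm["PSA gp12"] / tail_length_nm["lambda gpH"]

# Tail length per Tmp residue; a ruler model predicts similar values
nm_per_residue = {
    name: tail_length_nm[name] / tmp_length_aa[name] for name in tmp_length_aa
}

print(f"Tmp size ratio (PSA / lambda):    {size_ratio:.2f}")
print(f"Tail length ratio (PSA / lambda): {tail_ratio:.2f}")
for name, value in nm_per_residue.items():
    print(f"{name}: {value:.3f} nm of tail per residue")
```

Both ratios come out at about 1.2, and both phages give roughly 0.175 nm of tail per Tmp residue, which is what a direct proportionality between Tmp size and tail length would predict.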
We have shown here that the PSA Tmp appears as a structural component of the mature phage particle, where it makes up roughly 2% of the total virus protein. However, its apparent size observed on SDS gels (∼84 kDa) is about 25% smaller than expected (112.1 kDa). This finding correlates with a report on a lactococcal phage Tmp (Pedersen et al., 2000), which occurs as a 70 kDa protein suspected to represent the processed form of a 100 kDa primary product. Interestingly, this protein shows some sequence relatedness to the PSA tape measure. Our findings are also in line with the situation in Tmp of λ, where GpH is processed at the C-terminus to form GpH+, and also remains part of the virion (Katsura and Hendrix, 1984). With respect to PSA Gp12, we were surprised to find that MALDI-MS identified tryptic fragments corresponding to peptides at both termini (see Fig. 4), but yielded no signal corresponding to a 270 aa region between co-ordinates 542 and 812. The mass of these 270 residues is 29.7 kDa, and, as estimated by SDS-PAGE, the protein present in the virion is approximately 30 kDa smaller than deduced from its coding region. The experimental correlation between apparent size and fingerprint pattern suggests some form of ‘internal’ proteolytic processing, which may have removed this polypeptide region. It may also be noteworthy that bioinformatics (TmHMM) identified four potential transmembrane domains in exactly this portion of the protein which appears to be missing from the mature form. However, the possible role of these predicted domains remains to be investigated. It may be that Gp12 is post-translationally processed in a way similar to proteins harbouring inteins, by excision of an intervening protein sequence (Paulus, 2000). However, Gp12 did not reveal any similarity to any of the inteins listed in the InBase database (http://www.neb.com/inteins/intein_intro.html; Perler, 2002).
In conclusion, it is tempting to speculate that Gp12 may be processed in a novel way, by cleavage and removal of internal sequence during phage tail morphogenesis. Additional evidence would be required to confirm this exciting hypothesis, such as amino acid sequence information spanning the implicated portion of the processed protein.
Identification of the PSA endolysin enzyme was confirmed by a bioassay. The particularly interesting domain structure of this enzyme was revealed by sequence alignments. The C-terminal cell wall-binding domain (CBD), responsible for specific targeting to the Listeria cell wall, is highly homologous to the CBD500 domain (Loessner et al., 2002) and to the corresponding regions from L. innocua putative and confirmed prophage endolysins (Zink et al., 1995; Glaser et al., 2001). This was not surprising, because these phages multiply on serovar 4 and serovar 6 host cells, which feature highly specific ligands recognized by these polypeptides (Loessner et al., 2002). In contrast, however, the N-terminal portion showed no similarity to the known enzymatically active domains of Listeria phage endolysins (Loessner et al., 1995b; Zink et al., 1995). Instead, it appears to be closely related to amidases from phages infecting Bacillus and Clostridium. Hence, PlyPSA can be considered as a natural chimera: it combines a highly specific, apparently conserved cell wall recognition domain with a novel enzymatic domain of other hydrolytic activity (N-acetylmuramoyl-l-alanine amidase, instead of l-alanine-d-glutamate peptidase). The protein may have assembled by domain shuffling between the genes encoding the different enzymes, possibly as a result of horizontal gene transfer in a lysogenic cell. Whatever its origin may be, PlyPSA will certainly have some useful applications, because its amidase activity nicely complements the present set of enzymes used for the rapid and highly specific lysis of L. monocytogenes in various applications (Loessner et al., 1995a; Dietrich et al., 1998; Gaeng et al., 2000).
Both major structural proteins are represented by two protein species of different length, having identical N-termini but different C-termini. We have employed product analysis by mass spectrometry in order to demonstrate that the modified forms result from programmed + 1 ribosomal frameshifting on the mRNA transcripts. After a certain fraction of ribosomes perform the + 1 shift, they continue decoding in the overlapping frame, resulting in different length C-termini. The + 1 shifts were identified by the demonstration of specific tryptic peptide fragments. The masses were within less than a proton mass of the predicted (calculated) masses for a + 1 shift. Although possible products of − 2 shifts would feature the same C-terminal sequences, the fragments overlapping the slippery sequence would contain one additional amino acid (the first residue introduced into the chain after shifting back), and would therefore have a mass more than 100 Da different.
In both cps and tsh mRNA, the ribosome encounters an identical slippery sequence located at the 3′ ends of the reading frames, which likely triggers the + 1 shift. Within this ACA′CCC′U motif, the proline-encoding CCC codon appears to be the point where the ribosome stalls and slips one nucleotide position forward in the 3′ direction, facing the overlapping CCU proline codon. Whereas CCC is a rare codon, representing only 5% of the possible proline codons in the low G + C Listeria genome (http://www.kazusa.or.jp/codon/), CCU features a codon usage of 23%. It is therefore reasonable to assume that the ‘hungry’ CCC codon negatively influences ribosomal processivity, enhances stalling of the ribosome at the slippage site (Curran and Yarus, 1988), and thereby favours frameshifting, i.e. reassociation of the dislodged tRNAPro with the + 1 triplet CCU.
In addition to the above, however, two essentially different cis-acting factors seem to be involved in the two individual frameshifting events. The cps mRNA is capable of forming a secondary structure known as a pseudoknot, just seven nucleotides downstream of the slippage site and encompassing the UAA stop codon (see Fig. 5A). Structures such as pseudoknots or stem–loops adjacent to slippage sites have been shown to positively influence the frameshifting efficiency (Matsufuji et al., 1995; Larsen et al., 1997). It has been reasoned that this might be achieved by a ribosome stalling effect, allowing an extended time frame necessary for the anticodon:mRNA realignment in the frameshifting event (Alam et al., 1999).
It is interesting to note that the CCC′UGA ‘shifty stop’ at the 3′ end of tsh, featuring overlapping proline codons, was previously found to cause + 1 frameshifting in an E. coli plasmid system (de Smit et al., 1994). Another example of a high-frequency shifty stop is the peptide chain release factor 2 mRNA, at CUU′UGA (Craigen and Caskey, 1986). Modifying this site to CCC′UGA still permitted a + 1 shift, although with lower frequency (Curran, 1993). We have no information on the actual frequency of the frameshifting event during translation of the two PSA mRNAs, and therefore cannot estimate the precise proportion of the products before they are assembled into the virions. However, the relatively high abundance of the C-terminally modified proteins in the phage particles (see below) suggests that it should be a rather frequent event.
An open question is the possible role of the C-terminally modified proteins in the function, maturation or stabilization of the PSA virion. With respect to the capsid, the elongated Cps-L protein resulting from the + 1 shift contributes about 13.2% to the total amount of Cps. Considering the icosahedral symmetry of a phage capsid, this proportion agrees well with the following hypothesis, which explains not only the presence but also the ratio of the two Cps species. PSA almost certainly features a capsid structure of the triangulation number T = 7 (Caspar and Klug, 1962), similar to many Siphoviridae with an icosahedral capsid. These capsids consist of a total of 420 protein subunits, which are organized in 12 pentameric and 60 hexameric ring-structures, the capsomeres (Wikoff et al., 2000). One of the pentamers (five subunits) is removed for insertion of the portal, which leaves 11 pentamers (55 protein subunits) and 60 hexamers (360 subunits). This ratio of 55:360 subunits (55 of 415, or approximately 13%) is in very good agreement with the observed proportion of Cps-L in total Cps. Therefore, it is conceivable that the + 1 frameshift in Cps produces the 55 subunits needed to build the 11 pentamers. The hypothesis that the long form of Cps might be located in the pentamers is reminiscent of the case of T4, where the hexamers are made of gp23 whereas the pentamers are made of a different protein, gp24 (Olson et al., 2001).
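The subunit arithmetic behind this hypothesis can be checked with a short sketch. It uses the standard Caspar and Klug relations for an icosahedral capsid (60T subunits, 12 pentamers, 10(T − 1) hexamers); the function and variable names are illustrative only.

```python
def capsomere_counts(t_number):
    """Return (pentamers, hexamers) for triangulation number T."""
    pentamers = 12
    hexamers = 10 * (t_number - 1)
    return pentamers, hexamers


pentamers, hexamers = capsomere_counts(7)        # 12 pentamers, 60 hexamers
total_subunits = pentamers * 5 + hexamers * 6    # 420, i.e. 60 * T

# One pentamer is replaced by the portal, as described in the text.
pentamer_subunits = (pentamers - 1) * 5          # 55
hexamer_subunits = hexamers * 6                  # 360

# Share of pentamer subunits among all remaining capsid subunits.
fraction = pentamer_subunits / (pentamer_subunits + hexamer_subunits)
print(f"{fraction:.4f}")
```

The resulting fraction, 55/415, is the value to compare with the proportion of Cps-L measured by densitometry.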
Frameshifting in synthesis of major capsid proteins was also reported for gene 10 of phage T7 (Condron et al., 1991). Similar to PSA, both possible products (10A and 10B) of this shift (−1 type) are present in the mature phage particle, at a ratio of approximately 9:1 in favour of the shorter product. Complementation experiments revealed that the frameshift is not required for the formation of functional phage particles, and viruses devoid of 10B showed only small phenotypic changes, such as lower plating efficiency and smaller plaque size (Condron et al., 1991). With respect to PSA, no information is yet available on the actual requirement of the virion for the cps and tsh frameshift products. Although Tsh-L constitutes approximately 25% of the total Tsh species in the mature virion, its function is unclear. It can safely be assumed that it forms a part of the PSA tail shaft, most probably interacting with the major product Tsh. However, our present knowledge of the morphogenesis and structure of the complex multidisk siphoviral phage tails is insufficient to assign a possible function to Tsh-L. It would certainly be interesting to determine whether the two PSA frameshift products are essential for multiplication or stability.
PSA appears to be the first example of the utilization of + 1 frameshifting by either bacteriophages or bacterial IS elements. Another novelty is that the + 1 shift in cps may be stimulated by a 3′ pseudoknot. The only other case known so far in which a pseudoknot stimulates + 1 shifts is that of antizyme 1 and 2 in higher organisms (Matsufuji et al., 1995; Ivanov et al., 1998). Pseudoknots were long thought to be absent from prokaryotes; the only known example is in − 1 frameshifting in an E. coli IS element (Sekine et al., 1994). The PSA case seems to be the first description of a pseudoknot stimulating + 1 frameshifting in a virus.
Organisms, plasmids, media and culture conditions
Listeria monocytogenes WSLC1042 (ATCC23074, serotype 4b) was used for propagation of PSA. Listeria strains were grown in tryptose broth (Merck) at 30°C. Escherichia coli DH5αMCR (Invitrogen) in combination with pBluescript II SK- (Stratagene) was used as plasmid host for cloning of phage DNA, and E. coli JM109 was used for cloning and expression of the endolysin gene plyPSA. E. coli cultures were grown in Luria–Bertani (LB) medium at 37°C. For the selection of plasmid-bearing cells, ampicillin was added at 100 µg ml−1.
DNA purification, cloning and nucleotide sequencing
PSA was grown and purified as described earlier (Zink and Loessner, 1992; Loessner et al., 2000). The phage DNA was extracted and purified using standard techniques (Sambrook and Russell, 2001), and the genomic library of PSA was constructed after partial digestion with Tsp509I (New England Biolabs) as described previously (Loessner et al., 2000). Plasmids from small-scale cultures were digested with PauI (MBI Fermentas), and 58 clones carrying inserts varying in size were identified by agarose gel electrophoresis. The plasmid inserts were sequenced using IRD-800-labelled primers complementary to sequences flanking the multiple cloning site. Sequencing was performed using a heat-stable polymerase (SequiTherm EXCEL II; Epicentre Technologies) on an automated DNA sequencer (4200-IR2; LI-COR). The obtained sequences were edited, aligned and assembled using the software dnasis (Hitachi). Gaps between the contigs were closed by primer walking using PSA DNA as the template, with the aid of primers designed as sequences became available. This was performed on an ABI 373A automated sequencer with dye-terminator technology (Applied Biosystems). Distinct chain termination signals were generated at the ends of the molecule, i.e. the single-stranded 3′-overhangs (cos-sites). The genome sequence was finalized by determination of the sequence of the cos-site overlaps, using a PCR product from DNA from the lysogenic host L. monocytogenes PSA, with primers complementary to sequences upstream and downstream of the cos-site.
The program dnasis and the Husar Analysis Package (version 4.0; http://www.genome.dkfz-heidelberg.de) were used for analysis of the nucleotide and amino acid sequences. The blast algorithms (Altschul et al., 1997) were used for similarity searches in the databases available through the NCBI (http://www.ncbi.nlm.nih.gov), or the Husar Analysis Package. Protein domains were searched with the bundled-task software domainsweep provided by the Husar Analysis Package, which includes blimps (BLocks IMProved Searcher), motifs and hmmscan. For the prediction of potential transmembrane domains in the polypeptide chains, the transmembrane hidden Markov model (tmhmm) was used (Sonnhammer et al., 1998). Protein secondary structure was searched with the Husar software task 2dsweep, which includes tmhmm, coilscan and psipred helix.
Physical structure of the PSA DNA molecule
DNA molecules released from phage particles were partially denatured and mapped as described earlier (Loessner et al., 2000), using a modified cytochrome c spreading technique (Inman and Schnös, 1979). Electron micrographs of the partially denatured molecules were measured and processed in comparison with a standard (M13mp18 DNA) following a procedure already described (Littlewood and Inman, 1982). The running average AT content was calculated with an average segment width of 400 nt (Funnell and Inman, 1979).
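A running average of AT content over 400 nt segments can be sketched as follows. This is a plain sliding-window calculation and only an assumption about the general approach; it does not reproduce the specific procedure of Funnell and Inman (1979). The function name and the test sequence are hypothetical.

```python
def running_at_content(seq, window=400):
    """Return the AT fraction of every window-sized segment of seq.

    The window advances one nucleotide at a time, so entry k of the
    result describes seq[k:k + window].
    """
    seq = seq.upper()
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    is_at = [1 if base in "AT" else 0 for base in seq]
    count = sum(is_at[:window])
    values = [count / window]
    for i in range(window, len(seq)):
        # Add the base entering the window, drop the one leaving it.
        count += is_at[i] - is_at[i - window]
        values.append(count / window)
    return values


# Hypothetical 800 nt sequence: an AT-rich half followed by a GC-rich half.
profile = running_at_content("AT" * 200 + "GC" * 200)
print(len(profile), profile[0], profile[200], profile[-1])
```

AT-rich stretches in such a profile correspond to the regions expected to melt first in partial denaturation maps.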
In order to check whether the DNA molecules carry cohesive (sticky) ends, they were diluted in 100 mM NaCl, 10 mM Tris to a concentration of 0.2 µg ml−1, heated to 70°C, cooled and left at 4°C for 60 h. After preparation for electron microscopy, a total of 103 molecules were screened for the formation of circles caused by annealing of the single-stranded ends.
Cloning and expression of the endolysin gene
The proposed PSA endolysin gene plyPSA was PCR-amplified from phage DNA, using primers PlyF (ATCAGGATCCATGAGTAATTATAGTATGTCGCGAGGTCAC) and PlyR (ATCAGTCGACTTATTTTAAGAAGTAGTTAGCAGTGTAATA), which carry a BamHI (GGATCC) and a SalI (GTCGAC) restriction site, respectively. The product was then digested with BamHI and SalI, purified, inserted into pQE-30 (Qiagen), and cloned into E. coli JM109. Activity testing of the clones carrying plasmids with the desired inserts was performed by IPTG induction on replica plates as previously described (Loessner et al., 1995b).
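The position of the restriction sites in the two primers can be verified with a short sketch. The primer sequences are taken from the text, on the assumption that the single blank inside each printed sequence is a typesetting artefact; the recognition sequences for BamHI and SalI are the standard ones, and the function name is illustrative.

```python
PRIMERS = {
    "PlyF": "ATCAGGATCCATGAGTAATTATAGTATGTCGCGAGGTCAC",
    "PlyR": "ATCAGTCGACTTATTTTAAGAAGTAGTTAGCAGTGTAATA",
}

# Standard recognition sequences of the two enzymes used for cloning.
SITES = {"BamHI": "GGATCC", "SalI": "GTCGAC"}


def find_sites(primer):
    """Map each enzyme whose site occurs in primer to its 0-based offset."""
    return {
        enzyme: primer.find(site)
        for enzyme, site in SITES.items()
        if site in primer
    }


for name, sequence in PRIMERS.items():
    print(name, find_sites(sequence))
```

In both primers the site follows a four-nucleotide 5′ extension (ATCA). In PlyF it is directly followed by ATG, and in PlyR by TTA, the reverse complement of a TAA stop codon.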
Amino-terminal protein sequencing
The isolation and purification of phage structural proteins was performed as described earlier (Loessner et al., 1994b). Proteins were separated by SDS-PAGE on ultrathin horizontal gels (Excel Gel SDS, Amersham Biosciences), followed by densitometric scanning (Imagemaster 1D, Amersham Biosciences) in order to determine the relative amounts of protein in each band. For microsequencing, proteins were transferred onto a PVDF membrane, stained with Coomassie, and the five major bands (see Fig. 3) excised from the membrane. The first 10 amino acids (aa) of each individual protein were determined on an automated sequencer (Applied Biosystems Procise 492–01).
Peptide mass fingerprinting
For these experiments, the proteins were recovered directly from the gels. Before digestion, the proteins in the gel were modified using DTT as reducing and iodoacetamide as alkylating reagent. Each modification step was carried out in 50 mM ammonium bicarbonate for 15 min at 37°C. The in-gel digestion with trypsin was carried out overnight at an enzyme concentration of 12.5 ng µl−1, in 50 mM ammonium bicarbonate buffer (pH 8.0) at 37°C. Subsequent elution was performed in two steps with 0.1% trifluoroacetic acid/acetonitrile (2:3) over 20 h. The supernatants of the elution were collected, pooled and dried in a vacuum centrifuge. Afterwards, the peptides were dissolved in 5% formic acid.
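The peptide sets expected from such a digest can be predicted in silico. The sketch below assumes the conventional simplified trypsin rule (cleavage C-terminal to K or R, except when the next residue is P), which is not stated in the text. The two protein sequences are hypothetical and merely mimic a short and a long form with identical N-termini.

```python
def tryptic_digest(protein):
    """Cleave after K or R unless the following residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


# Hypothetical short and long forms sharing the same N-terminal part.
short_form = "MSKRPLTRAAK"
long_form = "MSKRPLTRAAKGESLNRVW"

short_peptides = tryptic_digest(short_form)
long_peptides = tryptic_digest(long_form)

# Peptides found only in the long form mark its extended C-terminus.
unique_to_long = [p for p in long_peptides if p not in short_peptides]
print(short_peptides)
print(unique_to_long)
```

Peptides unique to the longer species are the ones whose measured masses and fragment spectra would reveal a frameshifted C-terminus.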
MALDI-MS and ESI-MS mass spectrometry
Matrix-assisted laser-desorption ionization mass spectrometry (MALDI-MS) was carried out on a Bruker Reflex III TOF mass spectrometer equipped with a 26-sample SCOUT source and video system, a nitrogen UV laser (λmax = 337 nm) and a dual-channel plate detector (Bruker Daltonik, Bremen, Germany). One µl of the sample solution was placed on the target, and 1 µl of a freshly prepared saturated solution of α-cyano-4-hydroxy-cinnamic acid (CHCA) in acetonitrile/H2O (2:1) with 0.1% trifluoroacetic acid was added. For recording of the spectra an acceleration voltage of 20 kV was used and the detector voltage was adjusted to 1.7 kV. Between 50 and 500 single laser shots were summed into an accumulated spectrum. External calibration was carried out using a mixture of six synthetic peptides with molecular masses between 1046 and 2466 Da as well as the protonated dimer of the matrix (379 Da).
Electrospray ionization mass spectrometry (ESI-MS) was carried out with a Micromass QTOF II instrument (Micromass, Manchester, UK). The samples were introduced by nano-ESI. Before the analysis, the samples were desalted and concentrated using home-built microcolumns consisting of approximately 2.5 µl C18-reversed phase material (ODS-AQ, 120 Å, 50 µm; YMC Europe, Schermbeck, Germany) in a gel-loader tip (Biozym, Germany). The adsorbed peptides were washed with 5% formic acid and subsequently eluted with 3 µl 5% formic acid/methanol (1:1) directly into a gold-coated nanospray capillary (Protana A/S, Odense, DK). Spray voltage was adjusted between 2700 and 3000 V and block temperature was set to 30°C. For MS experiments, collision energy was set to 12 V, whereas for MS/MS experiments this was raised to a value between 25 and 40 V depending on size and charge state of the precursor ion to obtain optimal fragment ion spectra. Calibration was carried out between m/z 400 and 2500 using 40 mM H3PO4.
Special thanks to Roger Hendrix for his tremendous help in ‘phagobioinformatics’ and many useful suggestions and to John Atkins for helpful discussions about ribosomal frameshifting. We also wish to thank Ingo Krause for help in densitometrical scanning, Stefan Müller for help in performing mass spectrometry, Maria Schnös and Patrick Schiwek for their expert technical assistance, and Siegfried Scherer for his continuous support.