Crystal structure of the read-through domain from bacteriophage Qβ A1 protein



Bacteriophage Qβ is a small RNA virus that infects Escherichia coli. The virus particle contains a few copies of the minor coat protein A1, a C-terminally prolonged version of the coat protein, which is formed when ribosomes occasionally read through the leaky stop codon of the coat protein. The crystal structure of the read-through domain from bacteriophage Qβ A1 protein was determined at a resolution of 1.8 Å. The domain consists of a heavily deformed five-stranded β-barrel on one side of the protein and a β-hairpin and a three-stranded β-sheet on the other. Several short helices and well-ordered loops are also present throughout the protein. The N-terminal part of the read-through domain contains a prominent polyproline type II helix. The overall fold of the domain is not similar to any published structure in the Protein Data Bank.


Bacteriophages of the Leviviridae family are among the smallest and simplest known viruses. They have a single-stranded, positive-sense RNA genome, which is about 3500–4200 nucleotides long and encodes a maturation protein, a coat protein, and a subunit of the replicase complex.1 The capsid is built from 90 dimers of coat protein that assemble in an icosahedral shell with T = 3 symmetry.2 In addition to the coat protein, each virion contains a single copy of the maturation protein.3 The maturation protein is bound to the genomic RNA4 and mediates the attachment of the phage to the sides of bacterial pili,5 which are the cellular receptors for all known Leviviridae phages. After attachment to the pili, the RNA-maturation protein complex leaves the capsid and enters the cell through an unknown mechanism.
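
The subunit count quoted above follows from standard Caspar-Klug quasi-equivalence arithmetic, in which a shell with triangulation number T contains 60T subunits. The short Python sketch below is only an illustration of that bookkeeping; the function names are hypothetical and the lattice indices (h, k) = (1, 1) are the conventional choice for T = 3, not something stated in the text.

```python
def triangulation_number(h: int, k: int) -> int:
    """Caspar-Klug triangulation number T = h^2 + h*k + k^2."""
    return h * h + h * k + k * k


def capsid_subunits(t: int) -> int:
    """An icosahedral shell with triangulation number T holds 60*T subunits."""
    return 60 * t


# (h, k) = (1, 1) is the conventional lattice choice that gives T = 3.
t = triangulation_number(1, 1)
subunits = capsid_subunits(t)   # individual coat protein chains
dimers = subunits // 2          # coat protein dimers in the shell

print(f"T = {t}: {subunits} subunits, {dimers} dimers")
```

With T = 3 this gives 180 coat protein chains, that is, the 90 dimers mentioned above.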

Many of the known Leviviridae phages are further divided into two genera, leviviruses and alloleviviruses. A marked difference between the two genera is how the phages achieve cell lysis: leviviruses encode a small lysis protein that overlaps with coat and replicase genes in a different reading frame, whereas alloleviviruses mediate lysis using the maturation protein.6, 7 The other unique feature of alloleviviruses is the presence of a minor coat protein species A1 in their capsid. The A1 protein is produced when ribosomes occasionally read through the leaky UGA termination codon of the coat protein gene8 and translation continues for another 600 nucleotides, resulting in a C-terminal extension of the coat protein. The A1 protein is incorporated in 3–10 copies per virion1 and is essential for producing infectious virus particles,9 but its precise function is not known. To gain new insights into this protein, we solved the crystal structure of the read-through extension from bacteriophage Qβ A1 protein.

Results and Discussion

Structure determination and quality of the models

Because of their low number and presumed random orientation in the capsid, the read-through extensions were not visible in the crystal structure of bacteriophage Qβ.10 The A1 protein alone is insoluble and cannot assemble into particles without the assistance of the coat protein,11 and the amount of A1 protein that can be incorporated into the particles seems to be limited to about 15%.1 To make the A1 protein amenable to structural analysis, we expressed the read-through domain separately. The complete read-through extension starting from the end of the coat protein was largely insoluble (data not shown), but a hexahistidine-tagged variant starting 11 amino acids away from the coat protein part (residues 144–328 of full-length A1 protein) was highly soluble, could be readily purified, and was chosen to proceed with crystallization. The protein was crystallized in two crystal forms, monoclinic and hexagonal, which diffracted to 1.8 and 2.9 Å resolution, respectively. The structure of the monoclinic form was solved by multiple isomorphous replacement with anomalous scattering using two derivatives. Except for the expression tag and the first two residues of the crystallized domain, the polypeptide chain could be traced unambiguously, without breaks, from residue 146 (residues are numbered as in the full-length A1 protein) to the end of the chain. In the hexagonal form, another seven N-terminal residues could not be located in the electron density, and the chain was traced starting from residue 153. The domain adopts an almost identical conformation in the two crystal forms, with an rms deviation of 0.76 Å for the main chain atoms.
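
For readers unfamiliar with the rms deviation quoted above, the following Python sketch shows how such a value is computed from two sets of matched atomic coordinates. It is a minimal illustration only: it assumes the two models have already been superposed and the atoms paired one to one, the coordinates are made-up toy values rather than data from the deposited structures, and the text does not state which program was used for the actual comparison.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)
    tuples. The two sets are assumed to be superposed and paired already."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical main chain atom positions (in Å) for two crystal forms.
form_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
form_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]

print(f"rmsd = {rmsd(form_a, form_b):.2f} Å")
```

In practice the superposition itself (for example, a least-squares fit) is done first; the sketch deliberately leaves that step out.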

Overall structure

The overall fold of the read-through domain [Fig. 1(A)] is not similar to any other published structure in the Protein Data Bank, according to the DALI server.13 Except for the N-terminal region, the domain has a compact, roughly globular shape with a mixed α/β architecture. The core of the domain is built of β-sheets: strands β2, β3, β6, β7, and β8 form a heavily deformed, five-stranded β-barrel on one side of the protein, whereas β1 and β4 and β5, β9, and β10 form two antiparallel sheets on the other side. There are three α-helices and two 3₁₀-helices in the protein, which are all short and are located predominantly on the surface. A remarkably long loop (23 residues) connects the first 3₁₀-helix and strand β5, but it is well ordered and kept in place by extensive hydrogen bonding involving main chain and side chain atoms. Eight of the first 15 residues that are visible in the electron density map are prolines. These residues form a polyproline type II helix that stretches for about 45 Å before turning 90° toward the rest of the protein [Fig. 1(B)]. The polyproline helix is held in position by two crystal contacts with the globular part of neighboring molecules in the monoclinic crystal form but not in the hexagonal form. Consequently, the distant part of the helix is not visible in the hexagonal form, which suggests that it is flexible in solution. It should be noted that, although polyproline type II helices are not uncommon in proteins, the vast majority of them are shorter than six residues14 and long helices are rare. The polyproline helix in A1 is quite remarkable in this respect, since, according to a statistical survey of polyproline helices in protein structures in 2006,14 the longest such helix observed in a crystal structure was that of the benzoylformate decarboxylase from Pseudomonas putida15 (PDB ID 1BFD), which is 14 residues long and contains three prolines. The helix connects two subdomains of the enzyme but otherwise does not seem to have a specific function.
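
The reported length of the polyproline helix can be checked against the geometry of an ideal polyproline type II helix. The sketch below assumes a rise of about 3.1 Å per residue, which is the commonly cited textbook value for this conformation and is not taken from the structure itself; the function and variable names are illustrative.

```python
# Assumed axial rise per residue of an ideal polyproline type II helix, in Å.
# This is a textbook value, not a measurement from the A1 structure.
PPII_RISE_PER_RESIDUE = 3.1


def helix_length(n_residues: int, rise: float = PPII_RISE_PER_RESIDUE) -> float:
    """Approximate end-to-end length (Å) of a straight helix of n residues."""
    return n_residues * rise


n_residues = 15   # first visible residues of the domain
n_prolines = 8    # prolines among them, as stated in the text

proline_fraction = n_prolines / n_residues
print(f"estimated length: {helix_length(n_residues):.1f} Å")
print(f"proline content:  {proline_fraction:.0%}")
```

Under this assumption, 15 residues span roughly 46 to 47 Å, in line with the approximately 45 Å observed in the crystal structure.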

Figure 1.

Structure of the read-through domain. (A) Overall structure of the domain. The protein is represented as a cartoon model rainbow-colored blue (N-terminus) to red (C-terminus) and overlaid with a surface representation of the domain (light grey). (B) A detailed view of the polyproline helix. In the first 16 residues of the model, prolines are represented in cyan and other residues in deep blue. Figures 1(A,B) and 2(B) were prepared using PyMol.12

Currently, there is no structural information about residues 133–145, which separate the coat and read-through domains. Secondary structure prediction by JPred16 suggests that this region is unstructured except for the coat protein-proximal six residues, which, together with the last three residues of the coat protein, may be involved in a short α-helix.

Conserved regions

On the basis of phylogenetic and serological criteria, alloleviviruses cluster into two groups denoted III and IV.17, 18 To date, there are 15 allolevivirus genome sequences available; of these, eight are from Group III and seven from Group IV. When all of the sequences are aligned, coat proteins are the most conserved (∼64% sequence identity), followed by the replicase (∼44% identity) and maturation proteins (∼29% identity). When sequences of all of the known A1 extensions are aligned, the total identity is only 26%, making them the most divergent part of all phage proteins. However, in a sequence alignment of A1 extensions from representative phages from Group III (Qβ and MX1) and Group IV (FI and SP), several conserved regions emerge [Fig. 2(A)]. First, in the N-terminal part (residues 146–159), ∼50% of the residues are prolines in all alloleviviruses, suggesting that the polyproline helix is present in all allolevivirus A1 proteins and is probably important for their function. A short stretch of amino acids immediately following the helix is also conserved. The most prominent conserved regions are located at residues 207–219 and 228–238, which form part of the long loop between 3₁₀-helix 1 and β5 and extend to strand β5 and the beginning of helix α2. The C-terminal region of the domain is also relatively conserved. Interestingly, the majority of conserved residues cluster on one side of the protein closer to the polyproline helix [Fig. 2(B)], suggesting that this part of the domain is the most critical for performing its function.
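
As an illustration of what an overall identity figure of this kind means, the Python sketch below counts the alignment columns in which every sequence carries the same residue. It is a simplified sketch under stated assumptions: the text does not describe how the percentages were calculated, conventions differ (for example, in how gap columns and the denominator are treated), and the three short sequences are invented toy data, not the real A1 extensions.

```python
def percent_identity(aligned):
    """Percentage of alignment columns in which all sequences share the same
    residue. Columns containing a gap ('-') are counted as non-identical."""
    if not aligned:
        raise ValueError("alignment must contain at least one sequence")
    length = len(aligned[0])
    if length == 0 or any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must be non-empty and equally long")
    identical = sum(
        1
        for column in zip(*aligned)
        if column[0] != "-" and len(set(column)) == 1
    )
    return 100.0 * identical / length


# Hypothetical pre-aligned fragments, for illustration only.
toy_alignment = [
    "PDPVIPDPP",
    "PEPVIPDAP",
    "PDP-IPDPP",
]

print(f"{percent_identity(toy_alignment):.1f}% identical columns")
```

Here six of the nine columns are fully conserved. Adding more divergent sequences to an alignment can only lower this number, which is one reason why an alignment of all 15 genomes gives a lower identity than a pairwise comparison would.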

Figure 2.

Conserved regions of the read-through domains. (A) Sequence alignment of the read-through domains from different alloleviviruses. Conserved residues are colored red; of these, identical residues are shaded yellow and nonidentical light yellow. Assigned secondary structure elements are presented below the alignment. A dashed line represents the portion for which no experimental data are available; the α-helix from secondary structure prediction is drawn as a pale blue cylinder. (B) Mapping of the conserved regions on the three-dimensional structure of the read-through domain. Identical and nonidentical but conserved residues as of Figure 2(A) are colored red and yellow-orange, respectively.

Possible function of the A1 protein

The actual function of the read-through domain has remained enigmatic. The amino acid sequence and the three-dimensional (3D) structure of the A1 extension are not similar to other known proteins, leaving no clues about its evolutionary origin. The A1 protein is a hallmark of the rather small group of alloleviviruses, which all infect Escherichia coli, whereas all other known Leviviridae phages make do with just the coat and maturation proteins in the virion. However, the A1 protein is essential for producing viable phage particles, as shown by in vitro virus reassembly assays9 and in vivo plasmid complementation studies.19 The C-termini of coat proteins with some minor structural rearrangements could reach both the inner and outer surface of the capsid. However, current evidence suggests that as structural components of the virion, the read-through extensions are located on the exterior of the capsid. First, Qβ virions form a diffuse band in native polyacrylamide gel electrophoresis,20 and their mobility in the gel and in sucrose density gradients depends on how many copies of the A1 protein are present in the capsid.21 Additionally, when recombinant Qβ capsids contained A1 extensions with an engineered internal epitope tag from hepatitis B virus preS1 region, the tags were accessible to antibodies in an ELISA, and immunogold electron microscopy confirmed that the antibodies were indeed bound to the capsid surface.11 The five-residue tag was inserted after residue 204, which is now known to be located in the short 3₁₀-helix 1 on the surface of the protein, so the insertion likely did not disturb the structure of the domain.

An interesting feature of the A1 protein undoubtedly is the long polyproline type II helix at the N-terminal part of the read-through domain. Polyproline helices and proline-rich regions in general are relatively abundant in proteins and have different functions,22 but they frequently serve as ligands for various protein–protein interaction domains, resulting in formation of protein complexes that are often involved in signaling and regulatory pathways in eukaryotic cells (reviewed in Ref. 23). In other proteins, proline-rich regions have a structural role and act as relatively rigid spacers to keep protein domains apart. For example, a 68-residue-long proline-rich segment of the bacterial protein TonB was recently shown to adopt a polyproline II conformation that spans the periplasm.24

The linker between the coat and read-through domains would stretch for an estimated 35 Å and is then followed by the 45-Å-long polyproline helix, which is apparently also somewhat flexible. The logical explanation for such a long linker is that the read-through domain in the virion is positioned far away from the viral quasi-threefold symmetry axis (relating the three quasi-equivalent subunits A, B, and C) where the C-termini of coat proteins are located. A recent study localized the maturation protein from the distantly related phage MS2 on one of the viral fivefold symmetry axes and suggested that the Qβ maturation protein is localized similarly.25 Because both maturation and A1 proteins are required for infectivity, it seems possible that the two proteins might interact with each other and that the long linker would allow the read-through domain to reach viral fivefold and threefold symmetry axes that are ∼45 Å away from the C-termini of coat proteins. Experiments to test the association of the read-through domain with the maturation protein are underway in our laboratory.


We have shown that the read-through domain of Qβ A1 protein adopts a previously unseen protein fold and has some intriguing structural features, such as a 15-residue-long polyproline type II helix, which is one of the best examples of this kind of helix in globular proteins for which the 3D structures have been determined. Although the structure of the read-through domain does not provide immediate answers about its function, it gives a good starting point for further studies that could eventually lead to the understanding of the molecular mechanism by which the small RNA phages infect the bacterial host.

Materials and Methods

Cloning, expression, and purification

The coding sequence of Qβ A1 extension was amplified from plasmid pQβ1026 using forward primer 5′-TACCATGGGGCACCATCATCACCATCATTCAAAC CCGATCCGGTTATTCC-3′ and reverse primer 5′-ATCTGCAGTTAAGCACGAGGAACGACTATCACG-3′. The resulting fragment, encoding an N-terminally 6xHis-tagged A1 extension (denoted His-A1 hereafter), was cloned into a modified pBAD/Thio vector (Invitrogen). For protein production, the plasmid was transformed into E. coli strain TOP10, and cells were grown in LB medium containing 50 μg/mL ampicillin at 37°C until OD590 of the culture reached 1.0. Arabinose was then added to a final concentration of 0.2%, and cells were grown for another 4 h and harvested by centrifugation.

Cells were resuspended in a lysis buffer containing 40 mM Tris-HCl pH 8.0, 200 mM NaCl, 20 mM MgSO4, 0.1% Triton X-100, 0.1 mg/mL DNase, and 1 mg/mL lysozyme and lysed by three freeze-thaw cycles. The lysate was centrifuged, the supernatant was loaded on a HIS-Select cartridge (Sigma), and bound His-A1 protein was eluted with buffer containing 40 mM Tris-HCl pH 8.0, 300 mM NaCl, and 300 mM imidazole. The sample buffer was then exchanged to 20 mM Tris-HCl pH 8.0 and 50 mM NaCl using Amicon spin filters (Millipore), and the preparation was applied to a HiPrep 16/10 Q FF ion exchange column (GE Healthcare), which was equilibrated with the same buffer. The His-A1 protein did not bind to the column under these conditions, whereas the majority of contaminants did. Finally, fractions containing His-A1 were pooled, concentrated, and loaded on a Superdex 200 10/300 GL gel filtration column (GE Healthcare), which was equilibrated with 20 mM Tris-HCl pH 8.0. Fractions containing His-A1 were pooled, concentrated to 10 mg/mL, and stored at −20°C until use.

Crystallization and data collection

The His-A1 protein was initially crystallized using the sitting drop vapor diffusion technique by mixing 1 μL of the protein solution (10 mg/mL) with 1 μL of the well solution (0.1 M Tris-HCl pH 8.5, 40% PEG 300). Plate-shaped crystals (the monoclinic form) appeared after 3–6 days at room temperature (298 K) and reached maximum dimensions of 0.3 × 0.1 mm. For data collection, crystals were flash-frozen in liquid nitrogen without additional cryoprotectant.

After optimization of crystallization conditions, slightly thicker crystals were obtained using 0.1 M Tris-HCl pH 8.5, 20% PEG 300, and 10% PEG 2000 MME as the well solution. These crystals were less fragile, showed less anisotropic diffraction, and were used for heavy atom compound soaks. To prepare the mercury derivative, crystals were soaked in a mother liquor containing 20 mM Hg(NO3)2 for 30 min, followed by backsoaking in the original mother liquor for 10 s. For iodine derivatization, mother liquor containing 0.1 M I2 in 0.1 M KI was prepared, the undissolved iodine was removed by centrifugation, and the resulting iodine-saturated solution was used for soaking the crystals overnight. Crystals were flash-frozen without backsoaking.

After the structure of the A1 domain in the monoclinic crystal form had been solved, a hexagonal crystal form was discovered when the buffer was omitted from the crystallization drop (40% PEG 300 in water). Crystals appeared after 2–3 days at room temperature and continued to grow for about 1 week, reaching a maximum size of 0.2 mm.

Crystal diffraction data were collected at the European Synchrotron Radiation Facility (ESRF) and MAX-Lab as indicated in Table I and Supporting Information Table 1.

Table I. Crystallographic data collection, scaling, and refinement statistics

                                       Monoclinic form    Hexagonal form
Data collection and scaling
 Beamline                              ESRF ID29          MAX-Lab I911–2
 Cell parameters                       a = 44.01 Å        a = 69.11 Å
                                       b = 49.12 Å        c = 167.30 Å
                                       c = 44.26 Å
                                       β = 118.41°
 Wavelength (Å)                        0.9762             1.0387
 Resolution (Å)                        49.15–1.76         40.79–2.90
 Highest resolution bin (Å)            1.86–1.76          3.06–2.90
 Rmerge                                0.078 (0.334)      0.098 (0.543)
 Total number of observations          55321              54994
 Number of unique reflections          16414              5773
 I/σ(I)                                10.0 (3.2)         17.5 (3.8)
 Completeness (%)                      99.2 (99.5)        100.0 (100.0)
 Multiplicity                          3.4 (3.3)          9.5 (10.0)
 Average B factor (Å²)                 17.153             41.116
 Number of atoms
 RMS deviations from ideal
  Bond lengths (Å)                     0.023              0.012
  Bond angles (°)                      1.925              1.368
 Ramachandran plot27
  Residues in favored regions (%)      97.8               95.4
  Residues in allowed regions (%)      100.0              99.4

Values in parentheses are given for the highest resolution shell.
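
The multiplicity values in Table I can be cross-checked against the reflection counts, since multiplicity is simply the total number of observations divided by the number of unique reflections. The Python sketch below performs that check; it assumes that the first column of values in Table I refers to the monoclinic form and the second to the hexagonal form, as the resolution limits and cell parameters suggest, and the dictionary layout is purely illustrative.

```python
# Values copied from Table I; the assignment of columns to crystal forms
# is an assumption based on the resolution limits and cell parameters.
table_i = {
    "monoclinic": {"observations": 55321, "unique": 16414, "reported": 3.4},
    "hexagonal": {"observations": 54994, "unique": 5773, "reported": 9.5},
}


def multiplicity(observations: int, unique: int) -> float:
    """Average number of times each unique reflection was measured."""
    return observations / unique


for form, row in table_i.items():
    value = multiplicity(row["observations"], row["unique"])
    print(f"{form}: calculated {value:.2f}, reported {row['reported']}")
```

Both calculated values round to the overall multiplicities listed in the table.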

Structure determination

Data were indexed with MOSFLM28 and scaled using SCALA.29 For the monoclinic crystal form, native and derivative datasets were scaled with SCALEIT30 and merged using CAD from the CCP4 suite.31 The position of the first mercury atom was calculated manually from the strongest peak in the Harker section of the isomorphous difference Patterson map. The coordinates of the mercury atom were input into MLPHARE32 and used to locate the remaining mercury and iodine atoms. Heavy atom refinement and phasing were performed in SHARP33 and were followed by solvent flattening in SOLOMON.34 From the resulting map, a partial model was built by BUCCANEER35 that was included to provide extra phase information in a second SHARP and SOLOMON run. The resulting map was used to build an improved model with BUCCANEER that served as a starting point for manual model building in COOT36 using the high-resolution native data. Refinement was performed using REFMAC.37 The structure of the hexagonal form was solved by molecular replacement with MOLREP38 using coordinates of the A1 domain in the monoclinic crystal form as a search model, followed by model building in COOT and refinement with REFMAC. Scaling and refinement statistics for native datasets are presented in Table I; detailed phasing statistics are given in Supporting Information Table 1.
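
For reference, the Rmerge statistic reported in Table I is conventionally defined as follows. The expression below is the standard textbook form, given here as an aid to the reader; the text itself does not spell out the definition used.

```latex
R_{\mathrm{merge}} =
  \frac{\sum_{hkl} \sum_{i} \left| I_i(hkl) - \langle I(hkl) \rangle \right|}
       {\sum_{hkl} \sum_{i} I_i(hkl)}
```

Here I_i(hkl) is the i-th measurement of the intensity of reflection hkl (including symmetry-related measurements), and ⟨I(hkl)⟩ is the mean of all measurements of that reflection.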

Atomic coordinates were deposited in the Protein Data Bank under accession codes 3RLK (monoclinic crystal form) and 3RLC (hexagonal crystal form).


The authors thank the staff at ESRF and MAX-Lab for their help during data collection and Anna Janson for collecting the mercury derivative datasets.