Beta-helix model for the filamentous haemagglutinin adhesin of Bordetella pertussis and related bacterial secretory proteins


  • Andrey V. Kajava,

    1. Center for Molecular Modeling, Center for Information Technology, and
  • Naiqian Cheng,

    1. Laboratory of Structural Biology, National Institute of Arthritis, Musculoskeletal and
      Skin Diseases, Bldg 6, Room B2-34, MSC 2717, National Institutes of Health, Bethesda, MD 20892-2717, USA.
  • Ryan Cleaver,

    1. Laboratory of Structural Biology, National Institute of Arthritis, Musculoskeletal and
      Skin Diseases, Bldg 6, Room B2-34, MSC 2717, National Institutes of Health, Bethesda, MD 20892-2717, USA.
  • Martin Kessel,

    1. Laboratory of Structural Biology, National Institute of Arthritis, Musculoskeletal and
      Skin Diseases, Bldg 6, Room B2-34, MSC 2717, National Institutes of Health, Bethesda, MD 20892-2717, USA.
    • Present address: Laboratory of Biochemistry, National Cancer Institute, Bethesda, MD 20892, USA.

  • Martha N. Simon,

    1. Department of Biology, Brookhaven National Laboratory, Upton, NY 11973, USA.
  • Eve Willery,

    1. INSERM U447, Institut Pasteur de Lille, 1, rue du Prof. Calmette, 59019 Lille, France.
  • Francoise Jacob-Dubuisson,

    1. INSERM U447, Institut Pasteur de Lille, 1, rue du Prof. Calmette, 59019 Lille, France.
  • Camille Locht,

    1. INSERM U447, Institut Pasteur de Lille, 1, rue du Prof. Calmette, 59019 Lille, France.
  • Alasdair C. Steven

    Corresponding author
    1. Laboratory of Structural Biology, National Institute of Arthritis, Musculoskeletal and
      Skin Diseases, Bldg 6, Room B2-34, MSC 2717, National Institutes of Health, Bethesda, MD 20892-2717, USA.


Bordetella pertussis establishes infection by attaching to epithelial cells of the respiratory tract. One of its adhesins is filamentous haemagglutinin (FHA), a 500-Å-long secreted protein that is rich in β-structure and contains two regions, R1 and R2, of tandem 19-residue repeats. Two models have been proposed in which the central shaft is (i) a hairpin made up of a pairing of two long antiparallel β-sheets; or (ii) a β-helix in which the polypeptide chain is coiled to form three long parallel β-sheets. We have analysed a truncated variant of FHA by electron microscopy (negative staining, shadowing and scanning transmission electron microscopy of unstained specimens): these observations support the latter model. Further support comes from detailed sequence analysis and molecular modelling studies. We applied a profile search method to the sequences adjacent to and between R1 and R2 and found additional ‘covert’ copies of the same motifs that may be recognized in overt form in the R1 and R2 sequence repeats. Their total number is sufficient to support the tenet of the β-helix model that the shaft domain – a 350 Å rod – should consist of a continuous run of these motifs, apart from loop inserts. The N-terminus, which does not contain such repeats, was found to be weakly homologous to cyclodextrin transferase, a protein of known immunoglobulin-like structure. Drawing on crystal structures of known β-helical proteins, we developed structural models of the coil motifs putatively formed by the R1 and R2 repeats. Finally, we applied the same profile search method to the sequence database and found several other proteins – all large secreted proteins of bacterial provenance – that have similar repeats and probably also similar structures.


Bordetella pertussis, the aetiological agent of whooping cough, establishes infection by attaching to ciliated epithelial cells of the respiratory tract (Weiss and Hewlett, 1986). One of its major adhesins is filamentous haemagglutinin (FHA), a surface-associated protein of ≈ 220 kDa, containing at least four distinct sites that bind specifically to different receptors on susceptible cells (Liu et al., 1997). The amino-terminal domain is involved in carbohydrate-dependent haemagglutination activity (Sandros and Tuomanen, 1993). The region that contains the RGD tripeptide motif interacts with the leucocyte integrin complement receptor 3 (CR3) (Locht et al., 1993; Ishibashi et al., 1994). A heparin binding site mediates the binding of FHA to the sulphated glycolipids on the epithelial cell surface (Menozzi et al., 1994). FHA also has a carbohydrate recognition domain (Liu et al., 1997).

Cloning of the FHA gene (Brown and Parker, 1987; Relman et al., 1989; Delisse-Gathoye et al., 1990) led to the identification of one of the longest open reading frames (ORFs) so far identified in prokaryotes, corresponding to a polypeptide chain of 367 kDa (Domenighini et al., 1990). This precursor protein, called FhaB, undergoes multiple proteolytic events in the course of secretion: its C-terminal third is removed (Domenighini et al., 1990; Arico et al., 1993), as are 71 residues from its N-terminus (Lambert-Buisine et al., 1998). FHA is one of the most efficiently secreted proteins of Gram-negative organisms, a property that may be exploited for secretion or surface display of passenger proteins (Renauld-Mongenie et al., 1996). In addition to its importance in bacterial adherence, FHA is a prominent immunogen (Kimura et al., 1990; Di Tommaso et al., 1991; Shahin et al., 1992). Acellular pertussis vaccines containing FHA have been in use since 1981 (Sato and Sato, 1984; Edwards, 1993).

These properties and the involvement of FHA in the pathogenesis of pertussis provide an incentive for seeking information about its molecular structure. The prospects for X-ray crystallography are hampered by both the poor solubility of FHA and its unwieldy dimensions, ≈ 500 Å long by ≈ 40 Å wide. However, it has been shown by electron microscopy that the FHA molecule has three domains – a globular head, a 350-Å-long shaft and a small flexible tail (Makhov et al., 1994). FHA is monomeric, and circular dichroism (CD) spectroscopy revealed a high content of β-structure. Sequence repeats in FHA were first detected by Relman et al. (1989) and Delisse-Gathoye et al. (1990). Makhov et al. (1994) extended these observations to cover two tracts of tandem 19-residue pseudorepeats, called R1 (residues 344–1065) and R2 (residues 1440–1688) respectively. Taken together, these data suggested two models: (i) a hairpin model (Fig. 1, right), in which two long, narrow, antiparallel β-sheets interact through their hydrophobic surfaces formed by non-polar side-chains; and (ii) a β-helix model (Fig. 1, left), in which the shaft is a single β-helix, as in pectate lyase (Yoder et al., 1993), containing three long, narrow, parallel β-sheets arranged around the long axis of the molecule. In both cases, the β-sheets consist of many short transverse β-strands. In the hairpin model, both termini are at the same end of the molecule, whereas in the β-helical model, they are at opposite ends.

Figure 1.

A. Schematic representation of two models – β-helix (left) and hairpin (right) – for the 500-Å-long FHA molecule (Makhov et al., 1994). The positions occupied by the N- and C-termini and the R1 and R2 repeat regions are marked.

B. Cross-sections of the models: β-strands are shown as bars connected by loops.

Here, we report studies that support the latter model. We examined a truncated form of FHA, called FHA44, consisting of approximately its N-terminal half, by electron microscopy of specimens prepared by both negative staining and rotary shadowing. The protein was also observed without heavy metal contrast enhancement by scanning transmission electron microscopy after freeze-drying, allowing molecular mass measurements (Wall et al., 1998). The results concur with the structural prediction of the β-helix model but are not readily explained by the hairpin model. Next, we searched for additional copies of the repeat motifs, whose sequences are recognizable in the R1 and R2 regions but are not well conserved. (In known β-helical proteins, structural repeats – the coils – are usually not recognizable as sequence repeats.) By systematic analysis of the kinds of amino acid residues that occupy key positions and applying a generalized sequence profile search method, we were able to identify additional copies of the repeat motifs. We then formulated molecular models of both motifs, drawing on the growing body of information derived from crystal structures of β-helical proteins (Emsley et al., 1996; Heffron et al., 1998; Jenkins et al., 1998). Finally, we applied the same profile search method to the protein sequence database and identified similar sets of tandem repeats in several other bacterial secretory proteins.

Results and discussion

Electron microscopy of FHA44 after negative staining and rotary shadowing

FHA44 is an N-terminal fragment of FHA of ≈ 82 kDa (Jacob-Dubuisson et al., 1996). According to the β-helix model, FHA44 should form a rod of the same width as wild-type FHA, but shorter in proportion to the respective molecular weights (assuming a similar mass per unit length for the non-β-helical N-terminus; see below). The prediction of the hairpin model is less definite: the molecule might not fold into a rod, as the hydrophobic face of the predicted proximal β-sheet would not be sequestered from solvent by pairing with the distal β-sheet. Alternatively, such sequestration might be accomplished by pairing two monomers in register to give a dimeric molecule of about the same length as wild-type FHA.

SDS–PAGE of purified FHA44 yields a closely spaced doublet of bands migrating at ≈ 80 kDa (Fig. 2A; cf. Jacob-Dubuisson et al., 1996). A field of negatively stained FHA44 molecules is shown in Fig. 2B. The molecules are rod-like, about 40 Å in width and somewhat variable in length, with the majority ranging between 180 Å and 360 Å (average 276 Å; n = 146; Fig. 2C). This preparation was also contrasted by rotary shadowing after glycerol spraying, a technique that facilitates visualization of filamentous proteins (Tyler and Branton, 1980). Thus prepared (Fig. 2D), the molecules again appear as rods of uniform width (≈ 50 Å) and variable length (190–380 Å, averaging 293 Å; n = 145; Fig. 2E). These dimensions are somewhat larger than those obtained by negative staining and are presumably exaggerated by the metal decoration effect.

Figure 2.

A. SDS–PAGE of the FHA44 preparation (right lane) stained with Coomassie blue.

B. Negatively stained electron micrograph of FHA44.

C. Length distribution of negatively stained molecules.

D. The same preparation of FHA44 prepared by low-angle rotary shadowing.

E. Length distribution of shadowed molecules.

Bars (B and D) = 1000 Å.

Scanning transmission electron microscopy (STEM) of FHA44

Darkfield STEM of unstained specimens provides a means of measuring the masses of individual molecules, in addition to their lengths (Thomas et al., 1994). The molecules were adsorbed to thin carbon films and washed extensively to eliminate non-volatile salts that might otherwise compromise the mass measurements (FHA44 was stored in 0.5 M NaCl to promote solubility). A typical micrograph is shown in Fig. 3A. A set of mass and length measurements is plotted in two dimensions in Fig. 3B. The majority of measurements (≈ 80%) represent a population of molecules that average 260 ± 22 Å (SD) and 98 ± 10 kDa. We infer that they represent FHA44 monomers. There is also a significant minority (10–15%) of molecules that are more massive (150–170 kDa) and somewhat variable in length, but generally longer. Finally, there are a few shorter molecules with lower masses that presumably represent contaminants or breakdown products.

Figure 3.

A. Darkfield STEM micrograph of unstained, freeze-dried FHA44. Bar = 500 Å. The large dense rods are tobacco mosaic virus particles used as a mass standard.

B. Two-dimensional distribution of mass and length measurements of individual FHA44 molecules. The measurements were made on freestanding molecules for which both ends were clearly visible. The sequence-derived masses of the FHA44 monomer (81.6 kDa) and dimer are marked (arrows). The horizontal line marks the approximate demarcation between these two populations.

Interpretation of electron microscopic data

Our micrographs consistently depict FHA44 molecules as rods of uniform width (≈ 40 Å, according to the negatively stained data), essentially the same width as full-length FHA and about half as long (Makhov et al., 1994). According to the STEM mass analysis, the majority species is a monomer (predicted to be 81.6 kDa from the DNA sequence). The length variability exceeds the expected experimental uncertainty, and we suspect that several factors contribute to the spread. First, the preparation contains a minor proteolytically derived component (despite our best efforts to minimize this effect), and the lower molecular weight species is presumably shorter. Secondly, it may be that the β-helix is not fully folded for all molecules: if several turns were unfolded, the disordered portion would probably be invisible in negative stain and STEM, resulting in an apparent shortening or, if it extended axially, the molecule might appear longer by rotary shadowing. A third source of variability emerges from the STEM analysis, which detected a minority species of what appear to be dimers, consisting of two monomers in side-by-side association, apparently with somewhat variable overlap (Fig. 3B).

We conclude that the EM data on FHA44 are in good agreement with the β-helix model. On the other hand, the hairpin model does not readily explain the STEM data, as it predicts either a non-folded, presumably globular, molecule or a 160 kDa rod about 500 Å long (see above). Although we observed a few molecules with the latter characteristics (Fig. 3B), this prediction does not account for the majority species of monomeric 260 Å rods.

Domainal and subdomainal divisions within the FHA molecule

FHA consists of three domains – the head, shaft and tail – which we differentiate into seven subdomains (Fig. 4) on the basis of sequence considerations (see below). The shaft is a rigid rod of approximately uniform cross-section for ≈ 350 Å (Makhov et al., 1994). As the pitch of a β-helix is 4.8 Å (Yoder et al., 1993), it should contain ≈ 73 coils, whereas R1 contains only 39 repeats (i.e. 39 coils of the helix) and R2 contains 13 repeats. The hairpin model also requires an extension of R2 to make it the same length as R1, so that the hydrophobic face of the R1 sheet would be fully covered. Thus, both models (hairpin and β-helix) anticipate the existence of additional copies of the structural motif(s) formed by the R1 and R2 repeats. Accordingly, we examined the amino acid sequence of FHA in search of additional copies, which we call ‘covert’ because they are not evident as sequence repeats.

Figure 4.

Left. The complete amino acid sequence of FHA with inferred domainal divisions marked, according to the β-helix model. There are five regions assigned to β-helical coils (B0, R1, B1, R2 and B2). In R1 and R2, each coil is equated with an overt 19-residue sequence repeat. B0, B1 and B2 are inferred to be made up of covert copies of the same motif, although its length appears to be less well conserved. The sequences of the R1 and R2 repeats and part of the B2 repeats are aligned within the respective domains, whereas the putative motifs in B0 and B1 are not so aligned; however, potential loops (so identified by a computer program using rules described in the text) are denoted by ‘–’. In the FHA sequence, glycine residues are shown in green, apolar residues are in red, and the remaining residues are in black.

Right. Ribbon representation of the whole FHA molecule generated by the molscript program (Kraulis, 1991). The R1 and R2 domains were drawn using the co-ordinates of the modelled structures. Detailed structures are not proposed for the remaining portions of the molecule: however, for the purpose of illustration, we approximated the other putative segments of β-helix by fragments of P.69 pertactin (blue) and Ig-like domains of cyclodextrin glycosyltransferase (N; purple). C is unknown.

In general, sequences with a propensity to form β-helices are harder to identify than, say, heptad-containing sequences that form α-helical coiled coils (Lupas, 1997). First, in a given β-helix, the repeats tend to vary in length because additional residues may be inserted at turn sites without disrupting the pattern of β-strands that stabilizes the structure (Yoder et al., 1993; Steinbacher et al., 1994). When it occurs, such variability will subvert attempts to detect repeats by Fourier analysis of the sequence. Secondly, the repeat motif varies considerably in length among known structures (Yoder et al., 1993; Steinbacher et al., 1994; Kobe, 1996; Jenkins et al., 1998): many are about 26 residues long, but motifs as short as 18–19 residues have been observed in pertactin (Emsley et al., 1996) and UDP-N-acetyl-glucosamine acyltransferase (Raetz and Roderick, 1995). This variability reflects a range of related structures, further complicating attempts to detect the motifs by pattern recognition methods. Nevertheless, FHA represents a relatively tractable case because its arrays of tandem 19-residue repeats are clearly recognizable in the R1 and R2 regions.

Identification of additional copies of the 19-residue motifs

Recognition by local sequence similarity

On re-examining R1, we found one more repeat at its N-terminal end (Fig. 4). Thus, R1 now covers residues 324–1065 and consists of two copies of a 20-residue repeat followed by 37 copies of a 19-residue cycle. Re-examination of R2 confirmed the previous number of repeats (13; Makhov et al., 1994). However, we noticed that the last residues of the repeats in the previous alignment were always non-polar. Modifying their indexation according to this position suggested another place for insertion/deletions within the repeats (Fig. 4) and a consequent revision of the boundaries of this region. R2 now runs from residue 1431 to 1680 and contains 13 copies of 19–20 residue repeats with a somewhat different consensus pattern from that of R1.

Recognition by sequence profile searching

The profile method of Gribskov et al. (1987) provides a sensitive tool for detecting distantly related members of protein families. A profile combines the information content of a family of sequences. For each position, it contains scoring information for every residue in the alignment, as well as for gap creation and gap extension. To identify additional R1- and R2-like repeats in the rest of the sequence, we applied the generalized sequence profile method (Bucher et al., 1996). Profiles were constructed for both R1 and R2 repeats. The R2 profile produced two matches, each corresponding to three tandem repeats, with a significance of P < 0.01, namely residues 1219–1284 and 1716–1779. They are located, respectively, in the B1 region between R1 and R2 and in the B2 region, immediately after R2.
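The idea of a position-specific profile can be illustrated with a minimal Python sketch. This is not the generalized profile method of Bucher et al. (1996): gap creation and extension penalties are omitted, the scores are plain log-odds values against an assumed uniform background with a pseudocount, and the aligned "repeats" and the query string are invented toy data, not FHA sequence. The function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log-odds score.

    Assumption: uniform background frequencies and no gap handling.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_window(profile, window):
    """Sum the per-column scores of an ungapped window."""
    return sum(column[aa] for column, aa in zip(profile, window))


def best_match(profile, sequence):
    """Slide the profile along the sequence; return (best score, start index)."""
    width = len(profile)
    scored = [
        (score_window(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)


# Toy data (hypothetical): three aligned 5-residue "repeats" and a query.
toy_repeats = ["GAVLS", "GAVIS", "GSVLT"]
toy_profile = build_profile(toy_repeats)
toy_score, toy_start = best_match(toy_profile, "PPKGAVLSDD")
print(toy_start, round(toy_score, 2))
```

A real search would also need a significance estimate for each match, which this sketch does not attempt.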

By systematically examining several proteins known to have β-helical structures – pectate lyase (Yoder et al., 1993), phage P22 tail spike protein (Steinbacher et al., 1994), UDP-N-acetylglucosamine acyltransferase (Raetz and Roderick, 1995) and pertactin (Emsley et al., 1996) – we were able to identify several trends, which we then used to identify additional copies of repeat motifs in B1 and B2. First, the basic repeat (helical coil), consisting of three β-strands connected by short loops, is about 19 residues long. Secondly, a typical β-strand has an xJxJx or xJxJxJx sequence pattern (here, J denotes an apolar residue, and x is mostly polar but can be any residue except Pro). Furthermore, the middle position in xJxJxJx usually has a bulky apolar residue, whereas the remaining J residues, being close to turns, are usually Ala, Gly, Ser or Thr. They may also be occupied by Asn or Gln residues that stack to form ladders inside the β-helix. Thirdly, interruptions of the xJxJxJx pattern should be loops with polar residues, Pro being especially abundant in long loops. These loops are quite variable in length and may even exceed the average repeat. Applying these rules, we predict 15 helical coils in B1 (Fig. 4).
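The xJxJx rule can be sketched as a simple pattern scan. The Python below is only an illustration of the rule as worded above, not the computer program used for Fig. 4: the set of residues counted as apolar (J) is an assumption chosen for the example, the function name is hypothetical, and the test string is invented.

```python
import re

# Assumed apolar set for J positions; the text does not list it explicitly.
APOLAR = "AVILMFWC"


def strand_candidates(sequence, apolar=APOLAR):
    """Return (start, segment) for every xJxJx window in the sequence.

    J is a residue from `apolar`; x is any residue except Pro.
    A lookahead is used so that overlapping windows are all reported.
    """
    j = f"[{apolar}]"
    x = "[^P]"
    pattern = re.compile(f"(?=({x}{j}{x}{j}{x}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


# Invented toy string: one xJxJx window followed by a Pro-containing stretch.
print(strand_candidates("SVTLDPGG"))
```

Extending the pattern to xJxJxJx, or letting Ala, Gly, Ser, Thr, Asn and Gln occupy the J positions next to turns, would only require changing the residue classes passed to the function.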

This region contains an Arg–Gly–Asp (RGD) tripeptide (Relman et al., 1989), which is the cell attachment motif of many mammalian adhesion proteins (D'Souza et al., 1991). In our model of FHA, its RGD peptide is located in a loop, like that of P.69 pertactin (Emsley et al., 1996). B1 also has an Arg-rich site, Arg–Arg–Ala–Arg–Arg, starting at residue 1069, that is vulnerable to trypsin (Domenighini et al., 1990). In the predicted coils of B1, this peptide is assigned to a long loop that should protrude from the body of the β-helix (Fig. 4). Cleavage in this peripheral site would not be expected to destabilize the helix significantly, which explains the observation that FHA remains essentially intact, i.e. the fragments did not separate, after cleavage at this site (Makhov et al., 1994).

Applying similar reasoning to the B2 region, eight possible repeats may be identified (Fig. 4), of which the six central ones can be aligned in order to comply with the R2 consensus pattern. As noted above, both the β-helix model and the hairpin model require additional (covert) copies of the repeat motifs. In the helical model, these copies should run continuously through B1, whereas the hairpin model (cf. Fig. 1) anticipates up to 26 copies in B2, with most or all of B1 in some other conformation(s). Thus, the results of the profile search favour the β-helix model.

Resemblance of an N-terminal segment of FHA to cyclodextrin transferase

The 252-residue N-terminal domain (residues 72–323) is similar in sequence to the N-terminal domains of several haemolysins (Makhov et al., 1994). We updated the set of such domains by searching the current database with the PSI-BLAST program (Altschul et al., 1997). Alignment of these sequences yielded a better template for seeking other similar sequences. As a result of a profile search (Bucher et al., 1996) against the latest release (January 2000) of the PDB database (Berman et al., 2000), we found a weak similarity between the N-terminal region of FHA (and hence also the similar haemagglutinins) and the cyclodextrin glycosyltransferase (CDGT) of Bacillus licheniformis (Fig. 5A). The latter protein is known to have an immunoglobulin-like fold (Lawson et al., 1994). The probability that this match is a product of chance alone is P < 0.1, calculated according to the method of Hofmann and Bucher (1995). Although the sequence identity between residues 126–231 of FHA and CDGT is low (10%), the chemical similarity is higher at 20% (Fig. 5A), and the plausibility of this relationship was verified by comparing the aligned N-terminal segments of the haemagglutinins and the CDGT structure. This comparison revealed conserved residues in core hydrophobic elements and several insertions and deletions in loops. Interestingly, the last two β-strands of the Ig-like domain of CDGT resemble part of a β-helical coil, so that this feature could provide a natural connection between the N-terminal domain of FHA and the β-helix of its shaft domain. According to this hypothesis, the CDGT-like domain of FHA should terminate around residue 231, with the β-helix extending directly out of it. In this case, four coils would precede the first one previously recognized as the first R1 repeat, which starts at residue 324. We call this region B0 (Fig. 4).

Figure 5.

A. Alignment of representative members of the N-terminal FHA homology domain family (FHAB_BORPE, filamentous hemagglutinin; PSP.NEIME, putative secreted protein from Neisseria meningitidis; HECA.ERWCU, HecA protein from Erwinia chrysanthemi; LSP1.HAEDU and LSP2.HAEDU, large supernatant proteins 1 and 2 from Haemophilus ducreyi; HEMO.EDWTA, HLYA_SERMA and HLYA_PROMI, α-chains of haemolysins from Edwardsiella tarda, Serratia marcescens and Proteus mirabilis; HHDA.HAEDU, HhdA protein from H. ducreyi). Residues identical or conservatively substituted in at least 50% of the sequences shown are printed on a black or grey background respectively. The numbering on the left refers to the position of the domain within the sequence. The bottom line indicates the secondary structure assignments of an Ig-like domain of cyclodextrin glycosyltransferase from Bacillus licheniformis. Structure types are abbreviated: e, extended conformation/β-strand; h, 3₁₀ or α-helix. Residues in the hydrophobic core of cyclodextrin transferase are marked with asterisks.

B. Consensus sequences of putative β-helical repeats. Upper case letters indicate more than 60% occurrence of a given residue in the specified position in the multiple alignment of the repeats. Lower case letters indicate between 60% and 40% identity of a given residue, with the remaining residues in this position having similar properties. x, any residue. Sites of insertions are denoted by [⇆]. Accession numbers: 1, Q03155; 2, P39180 and 3, P16466; 4, AF057695 and AF057696.

C. Domain schemes for some proteins that contain regions with FHA-like repeats, shown as hatched. The repeat region of LSPA1 may be larger despite the absence of overt sequence repeats. Three of the proteins also contain N-terminal FHA homology domains (black boxes). Protein abbreviations are as introduced in the text. The numbering refers to the positions of residues that demarcate these domains.
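The consensus convention stated for Fig. 5B (upper case above 60% occurrence, lower case between 40% and 60%, x otherwise) can be sketched in a few lines of Python. This is an illustrative simplification: it ignores the additional requirement that the remaining residues have similar properties, it does not handle insertion sites, the function name is hypothetical, and the four aligned strings are invented toy data rather than FHA repeats.

```python
from collections import Counter


def consensus(aligned, upper=0.60, lower=0.40):
    """Derive a consensus string from equal-length aligned repeats.

    Upper case: most common residue occurs in more than `upper` of rows.
    Lower case: it occurs in at least `lower` of rows.
    Otherwise the position is written as 'x' (any residue).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for col in range(length):
        residue, count = Counter(seq[col] for seq in aligned).most_common(1)[0]
        freq = count / len(aligned)
        if freq > upper:
            out.append(residue.upper())
        elif freq >= lower:
            out.append(residue.lower())
        else:
            out.append("x")
    return "".join(out)


# Invented toy alignment: column frequencies are 100%, 75%, 50% and 25%.
toy_alignment = ["GAVK", "GAVD", "GALE", "GSTN"]
print(consensus(toy_alignment))
```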

Extent of shaft domain and number of β-helical coils: a summary

The length of the FHA shaft is not known exactly. An estimate of 350 Å was obtained by electron microscopy but, on account of limited resolution, the demarcation between the shaft and the terminal domains was not sharply defined, and a reasonable margin of error would allow a range of 320–380 Å. Thus, the β-helical model requires that the shaft should consist of 67–79 coils. It is critical to this model that B1 should consist of an unbroken run of coils. The conclusion of our sequence profile search meets this requirement: B1 should contain 15 covert copies of the repeat motif, albeit more variable in length than those of R1 and R2. Thus, the central part of the shaft should consist of 39 (R1) + 15 (B1) + 13 (R2) = 67 coils. As discussed above, the regions proximal to R1 and distal to R2 may contain four (B0) and eight (B2) more coils respectively. Thus, the total may be as high as 79. Although the total number of coils/repeats is not defined precisely by either method, we conclude that the inferences proceeding from electron microscopy and sequence analysis, respectively, are compatible.
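The coil bookkeeping in this summary can be checked with a short Python sketch. All numbers are taken from the text (a 4.8 Å pitch, a shaft length of 320–380 Å, and the per-region coil counts); only the variable and function names are invented for the illustration.

```python
PITCH_ANGSTROM = 4.8  # rise per coil of a β-helix, as quoted in the text


def coils_for_length(length_angstrom, pitch=PITCH_ANGSTROM):
    """Number of coils needed to span a rod of the given length."""
    return round(length_angstrom / pitch)


# Coil counts per region, as inferred in the text.
region_coils = {"B0": 4, "R1": 39, "B1": 15, "R2": 13, "B2": 8}

central = region_coils["R1"] + region_coils["B1"] + region_coils["R2"]
total = sum(region_coils.values())

for length in (320, 350, 380):
    print(length, "Å ->", coils_for_length(length), "coils")
print("central (R1 + B1 + R2):", central)
print("with B0 and B2:", total)
```

The sequence-based count (67 to 79 coils) spans the same range as the count required by the microscopy-based length estimate, which is the sense in which the two lines of evidence are compatible.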

Modelling the three-dimensional structures of the R1 and R2 motifs

For globular proteins, structural prediction remains risky in the absence of strong sequence similarity to protein(s) of known structure (Moult et al., 1997). However, for proteins with repetitive substructures, a priori predictions can be quite reliable. Examples include silk fibroin and collagen (Fraser and MacRae, 1973); α-helical coiled coils (Cohen and Parry, 1990; Lupas, 1997); superhelical proteins with repeats of 20–30 residues, such as LRR proteins (Kajava et al., 1995; Kajava, 2001); and even bead-like proteins with 30–40 residue periodicities, such as Zn-finger proteins (Miller et al., 1985). In all cases, predictions have been facilitated by assuming repetitive spatial arrangements within the tandem repeats, by identifying conserved (and presumably structurally important) sites in the repeats and by incorporating constraints from indirect lines of evidence. As FHA fulfils these requirements, we considered it would be worthwhile to elaborate the β-helix model further by structural prediction.

R1 and R2 are the best regions to test the relationship between the observed sequence repeats and the putative β-helical coils. Alignments of both sets of repeats (Fig. 6) distinguish two kinds of positions within them: positions with conserved and mostly non-polar residues; and positions occupied by variable, mostly polar residues. This pattern may be readily explained in terms of β-helical structure. The H-bonded β-strands form a boundary between the interior of the helix and its solvent-accessible exterior, so that inside residues should be non-polar, whereas exterior residues should be mostly polar. This correlation allows us to predict the interior/exterior residues within the repeats (Fig. 6B). Furthermore, the distribution of conserved residues within the repeats is a potential source of information about the disposition of strands and turns. Indeed, in known β-helices, the stretches of the alternating conserved/variable positions correspond to strands (Yoder et al., 1993; Jenkins et al., 1998): conversely, interruptions of this alternating pattern should be turns. On this basis, we identified likely candidates for the three strands and turns in both repeats (Fig. 6B). R1 has a pair of three-residue turns (first and third) and one two-residue turn (second). R2 has a two-residue turn followed by two three-residue turns. It is important to note that several insertion/deletion sites identified in the R1 and R2 alignments are assigned to turn regions, consistent with the suggested arrangement. Furthermore, the putative loop positions are frequently occupied by Gly, the residue most typical of turns (Efimov, 1993).

Figure 6.

A. Sequence alignment of several R1 and R2 repeats with consensus patterns and position numbering. The conserved residue positions are in red. Positions frequently occupied by glycine are in green.

B. Schematic arrangement of the R1 and R2 coils. The curved arrow line shows the course of the polypeptide chain. The colours of the spheres correspond to the sequence alignments given in (A).

C. Space-filling model, showing typical cross-sections through R1 and R2 perpendicular to the helix axis, with the internal residues coloured red.

D. Ribbon diagrams prepared with molscript (Kraulis, 1991), showing four-coil fragments of R1 and R2.

The chirality of the putative β-helix of FHA is not evident from its sequence. Both right- and left-handed β-helices have been observed, and their sequence properties and conformations are similar (Jenkins et al., 1998). We chose the right-handed topology for FHA, based on the following considerations: (i) the majority of known β-helices are right-handed (Yoder et al., 1993; Steinbacher et al., 1994; Jenkins et al., 1998; Kobe and Kajava, 2000); (ii) P.69 pertactin, a protein relevant to FHA in terms of origin (B. pertussis), secretion mechanism and bacterial surface location, is known to have a right-handed helix (Emsley et al., 1996).

The triangular arrangement shown in Fig. 6B was used to model the R1 and R2 motifs. A search among known β-helices for a structural template did not reveal an exact replica for either R1 or R2. However, we found several fragments, which, in combination, provided an initial template. Among two-residue turns connecting β-strands, the most typical conformation is ββαLβ (e.g. Yoder et al., 1993; Emsley et al., 1996), with the symbols α, αL, β and βp denoting, in order, residue backbone conformations close to the right- and left-handed α-helical conformations, the β- and polyproline conformations (underlined positions in loops denote conformations assigned to β-strand fragments). The other plausible conformation for such a two-residue turn is βαββ (e.g. in P.69 pertactin from position 396 to 399, 414 to 417 or 441 to 444). The path of the β-strands is redirected by ≈ 90° by both turns. Based on modelling (see Experimental procedures), we selected the ββαLβ conformation for the two-residue turn of R1, whereas the βαββ conformation was more appropriate for R2.

Another type of turn, with three residues between the strands, makes an almost complete (180°) turn. Its possible conformations are ββpβpαLβ (e.g. phage P22 tail spike protein from position 184 to 189 or P.69 pertactin from position 400 to 404), ββαLαLβp (e.g. P.69 pertactin from 353 to 357) or βααββ (phage P22 tail spike protein from position 436 to 440 or P.69 pertactin from position 466 to 470). Stereochemical analysis suggests another possible turn, βαβαLβ. Although this last variant has not been found in known β-helices, we cannot rule it out.

Comparative analysis revealed that the last internal residues of the ββpβpαLβ and ββαLαLβp loops are directed towards the polypeptide backbone and thus cannot have bulky side-chains (Fig. 7, left): in known β-helices, this position is occupied by residues with small side-chains – Gly, Ala or Ser. In contrast, bulky Val, Ile or Leu residues tend to occur in the first internal position of the loops. For the βααββ and βαβαLβ loops, the situation is reversed; the first internal residue is close to the backbone atoms of the turn and is therefore small, whereas the last side-chain has more space available and is bulky and non-polar (Fig. 7, right). The fact that the corresponding loops of the R1 motif have small residues in the first internal position and bulky ones in the last position suggests that their conformation is βααββ or βαβαLβ. The βααββ conformation was chosen for the three-residue turns of the R1 template because of its frequent occurrence in known β-helices. The combination of three-residue turns that best fits the R2 arrangement was different: βααββ for the second and βαβαLβ for the third turn. The selected turn and strand fragments were combined to form the R1 and R2 templates (see Experimental procedures), and these models were refined by energy minimization.
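The residue-size rule used above to narrow down the three-residue turn conformations can be written as a small lookup. This is only a minimal sketch of the reasoning in the text: the function name `candidate_turns` and the one-letter residue sets are illustrative assumptions, and residues outside the classes named in the text are simply left unclassified.

```python
# Side-chain classes exactly as named in the text (one-letter codes).
SMALL = {"G", "A", "S"}   # Gly, Ala, Ser
BULKY = {"V", "I", "L"}   # Val, Ile, Leu


def candidate_turns(first_internal: str, last_internal: str) -> list[str]:
    """Return the three-residue turn conformations compatible with the
    first and last internal residues of a loop (hypothetical helper)."""
    first, last = first_internal.upper(), last_internal.upper()
    if first in SMALL and last in BULKY:
        # First internal residue lies close to the backbone of the turn.
        return ["βααββ", "βαβαLβ"]
    if first in BULKY and last in SMALL:
        # Last internal residue points towards the polypeptide backbone.
        return ["ββpβpαLβ", "ββαLαLβp"]
    return []  # pattern not covered by the rule described in the text


print(candidate_turns("G", "V"))
print(candidate_turns("L", "A"))
```

The rule only shortlists conformations; the final choice in the text also relies on how often each conformation occurs in known β-helices and on fit to the template.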

Figure 7.

Two representative loops with ββpβpαLβ (left) and βααββ (right) conformations. The first loop was taken from the P.69 pertactin structure (residues 131–135; Emsley et al., 1996) and the second from the phage P22 tail spike protein (436–440; Steinbacher et al., 1994). For simplicity, only Cβ atoms of the outside side-chains are shown. The side-chains of the first and last residues of the β-strands connected by loops are in grey.

Structural models for the R1 and R2 β-helices

Energy minimization of both templates resulted in compact structures without steric tension and with all donors and acceptors of H-bonds interacting with each other or having the possibility of binding water molecules. According to the PROCHECK output (Laskowski et al., 1993), all residues of both models have backbone conformations from allowed regions of the Ramachandran plot, and the G-factors of the polypeptide stereochemistry (a log-odds score based on the observed distribution of the covalent geometry) are −0.15 and −0.52 respectively. The overall average G-factors for the R1 and R2 models are −0.50 and −0.49, values that would be expected for good-quality models. One coil of the modelled β-helices corresponds to one sequence repeat.

The R1 coil conformation (starting from position one in Fig. 6) is βββββααβββββαLβpββααβ, where underlined elements denote β-strands. This model accounts for residue conservation among the repeats. According to the scheme in Fig. 6B, conserved non-polar residues point into the non-polar interior. The smallest of them (Gly, Ala and Ser) are located near the two U-turns (positions 5 and 16), where there is insufficient room for bulky side-chains. The H-bonding of two adjacent coils is not interrupted by the two-residue turn (positions 12–13). The peptide groups of the three-residue turns (positions 6–8 and 17–19) do not make ideal H-bonds with the turns of neighbouring strands and may need assistance from the donors/acceptors of the side-chains. This consideration may explain the presence of Ser in positions 5 and 16. Moreover, the frequent occurrence of Gly in the three-residue turns is consistent with the need to optimize the intercoil packing. The conformation of the R2 motif (starting from position one in Fig. 6) is βpββββαββββααββββαβαL. Its most distinctive feature is the conserved Asn in position 10, which forms an H-bonded network in the non-polar interior, similar to the Asn ladder of pectate lyase (Yoder et al., 1993). The Asns from adjacent coils form H-bonds with each other; in addition, the NH group of each Asn bonds with the backbone CO (position 13) from the previous coil. Position 8 is frequently occupied by bulky Phe residues, which are directed towards the centre of the triangular prism and fill up the non-polar interior. As in R1, Gly occurs frequently in the loop positions, allowing a better adjustment of interloop packing. In both models (R1 and R2; cf. Fig. 6D), the polypeptide chain follows a right-handed helix. The β-strands assemble into almost flat parallel β-sheets, which form the sides of a triangular prism. As with other β-helical proteins, the strands have a shallow right-handed twist (Fig. 6).
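The conformation strings quoted above can be split into one symbol per residue to check the coil length and to locate the non-β positions. The following is a minimal sketch. The helper names `tokenize` and `non_beta_positions` are illustrative, and the strings are copied from the text without the underlining that marks β-strands.

```python
import re

# Two-character symbols are listed first so that "αL" and "βp" are not
# split into "α" or "β" followed by a stray letter.
TOKEN = re.compile(r"αL|βp|α|β")

R1 = "βββββααβββββαLβpββααβ"
R2 = "βpββββαββββααββββαβαL"


def tokenize(conformation: str) -> list[str]:
    """Split a conformation string into per-residue symbols."""
    tokens = TOKEN.findall(conformation)
    if "".join(tokens) != conformation:
        raise ValueError("unrecognised symbol in conformation string")
    return tokens


def non_beta_positions(conformation: str) -> list[int]:
    """1-based positions whose backbone conformation is not plain β."""
    return [i for i, t in enumerate(tokenize(conformation), start=1) if t != "β"]


for name, coil in (("R1", R1), ("R2", R2)):
    print(name, len(tokenize(coil)), non_beta_positions(coil))
```

For R1 the non-β symbols fall inside the turns named in the text (positions 6–8, 12–13 and 17–19), and both strings contain 19 symbols, one per residue of the repeat.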

Proteins with similar repeats

To identify similar proteins, we used the generalized sequence profile method (Bucher et al., 1996) to search the sequence database with the R1 and R2 repeat profiles and found matches with several bacterial secretory proteins. Although their significance was relatively low, the presence of repeats in these proteins was confirmed by means of a dot-matrix analysis. New profiles were then constructed for each protein, spanning three of its repeats, and used to estimate the total number of repeats.

  • AIDA-I, an Escherichia coli protein involved in diffuse adherence to HeLa cells (Benz and Schmidt, 1992). This molecule has at least 35 well-recognized (P < 10⁻⁴) copies of 18–19 residue repeats (Fig. 5B) in the N-terminal part of an 846-residue mature chain.

  • The α-chain of antigen 43 from E. coli (Owen et al., 1996). Its 499-residue chain contains about 18 repeats of 16–19 residues whose consensus sequence is similar to that of AIDA-I (P < 10⁻³).

  • The α-chain of haemolysins from Edwardsiella tarda, Serratia marcescens and Proteus mirabilis. The central part of their sequences (about 1600 residues) contains long tandem arrays of ≈ 21-residue repeats (Fig. 5B and C). Although less faithfully repeated (P < 10⁻²) than R1 or R2, this motif is clearly recognizable and has β-helical features.

  • Large supernatant proteins 1 (LSPA1) and 2 (LSPA2) from Haemophilus ducreyi (Ward et al., 1998). We identified about ten 21-residue repeats (P < 10⁻⁴) in the N-terminal halves of these products of very long ORFs (4152 and 4919 residues respectively). The consensus repeat sequence differs slightly from that of FHA (Fig. 5B and C), but preserves features important for the β-helix.

  • Several R2-like repeats were found in the 2273-residue-long putative secreted protein (PSP) from Neisseria meningitidis (GenBank accession number AF030941), although they are not organized in a well-defined tandem array.

By analogy with FHA, these proteins may all have β-helical domains. In particular, they are expected to have rod domains of ≈ 165 Å (AIDA-I), 85 Å (Ag43), 100 Å (α-haemolysin) and 50 Å (LSPA1–2) in length. The predicted rods may, in fact, be longer, because these proteins may also contain covert copies of their repeat motifs, as we believe to be the case with FHA. Moreover, their motifs should be amenable to modelling by applying the same precepts as we have applied to FHA (above). Although these proteins certainly vary, all are very large secreted proteins of Gram-negative bacteria and have been implicated as virulence factors. Moreover, three of the five N-terminal domains are similar to the corresponding domain of FHA (Fig. 5C), providing additional support for the inferred structural commonality.
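The predicted rod lengths follow from the repeat counts if each repeat forms one coil of the helix. The sketch below makes that arithmetic explicit. The axial rise of 4.8 Å per coil is an assumption (the usual spacing between adjacent strands in a β-sheet) and is not stated in this section; the repeat counts are the approximate values listed above.

```python
RISE_PER_COIL_A = 4.8  # assumed axial rise per coil in Å, not taken from this section


def rod_length(n_coils: float, rise: float = RISE_PER_COIL_A) -> float:
    """Length in Å of a β-helical rod with one sequence repeat per coil."""
    return n_coils * rise


def coils_for_length(length_a: float, rise: float = RISE_PER_COIL_A) -> float:
    """Number of coils implied by a given rod length in Å."""
    return length_a / rise


# Approximate repeat counts quoted in the list above.
repeat_counts = {"AIDA-I": 35, "Ag43": 18, "LSPA1-2": 10}
for name, n in repeat_counts.items():
    print(f"{name}: {n} coils, about {rod_length(n):.0f} Å")

# No repeat count is quoted for the haemolysins; a 100 Å rod would imply:
print(f"haemolysin: about {coils_for_length(100):.0f} coils")
```

Under this assumption the computed lengths agree with the rounded estimates in the text to within a few ångströms.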

Implications for adhesin functions of FHA and vaccine development

The β-helix model has implications as to how FHA may fulfil its role as a multivalent adhesin. The proposed β-helix, which is much longer than in any such structure currently on record, should provide a rigid platform on which the various adhesive sites are presented. Current information on their distribution is summarized in Fig. 4. Given the dimensions of FHA, a mode of binding in which its long axis is oriented approximately parallel to the cell surface would facilitate co-ordinated binding of a single molecule to more than one site. Such interactions would require the sites in question to be on the same side of the helix. Conversely, the sites that maintain contact with B. pertussis (FHA is a secreted protein) should be on the other side of the helix. Interestingly, as the model places the RGD integrin binding site near the middle of the molecule, ≈ 250 Å from either end, this configuration would be sterically compatible with the integrin model proposed by Chiu et al. (1999) for adenovirus receptor binding, in which the integrin dimer is envisaged to extend from the host cell membrane, with the RGD binding pocket on its top (outer) surface.

One property that is common to many β-helical proteins is carbohydrate binding. In crystallographic studies of P22 tail spike protein (Steinbacher et al., 1996) and pectate lyase (Scavetta et al., 1999), the carbohydrates were seen to lie along the wall of the helix, with their binding stabilized by interactions with charged amino acid residues in neighbouring loops. These examples suggest a possible prototype for the carbohydrate recognition activity of FHA (Liu et al., 1997). It is also possible that such loops may present immunoprotective epitopes. Study of the interaction of FHA with its receptors at the experimental level is a desirable goal for future work. In this context, the suggested fold allows the design of experiments not only to test the model but also to explore the interactions of FHA and to probe the portions of the molecule that are involved in the generation of protective immunity.

Experimental procedures

Electron microscopy

FHA44 was prepared and purified as described previously (Jacob-Dubuisson et al., 1996) and stored at 0.57 mg ml⁻¹ protein in 500 mM NaCl, 10 mM Tris, pH 7.0, at −80°C until use. For negative staining, the sample was diluted to ≈ 20 µg ml⁻¹ and adsorbed either to a thin carbon film supported on an EM grid or to a thin carbon film on a mica substrate that was then picked up on an EM grid bearing a thick holey carbon film. In either case, the specimen was stained with 1% uranyl acetate. Rotary shadowing was performed essentially according to the method of Tyler and Branton (1980). Briefly, drops containing 25 µg ml⁻¹ protein in 50% glycerol and PBS buffer were nebulized on to freshly cleaved mica, dried under vacuum at room temperature and rotary shadowed with platinum at an elevation angle of 8° in a Baltec 160 freeze-fracture machine (Technotrade). Finally, a stabilizing layer of evaporated carbon was applied. Specimens were observed in either a Zeiss EM902 (Leica) or a Philips CM120 (FEI) transmission electron microscope.

STEM was performed on the NIH Biotechnology Resource instrument at Brookhaven National Laboratory (Wall et al., 1998). Briefly, the sample was diluted to 20 µg ml⁻¹, adsorbed from a 5 µl drop to an EM grid bearing a thin carbon film substrate, washed 10 times with 50 mM ammonium acetate, blotted to a thin film and freeze-dried under high vacuum at constant sublimation rate. Tobacco mosaic virus particles preadsorbed to the EM grids were used as an internal mass standard. Digital micrographs of 512 × 512 pixels were recorded, using magnifications such that 1 pixel step corresponded to 10 Å. Mass and length measurements were performed interactively using the pic-III program (Trus et al., 1996) to analyse micrographs displayed on the monitor of an Alpha XP1000 workstation (Compaq).
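The use of an internal mass standard can be illustrated with a short calculation. This is only a sketch of the general principle and not the procedure implemented in pic-III: the function names are hypothetical, the intensity values are invented for illustration, and the TMV mass per unit length of 13.1 kDa per Å is the commonly quoted value, assumed here because it is not given in the text.

```python
PIXEL_SIZE_A = 10.0          # Å per pixel step, as stated in the text
IMAGE_SIDE_PX = 512          # pixels per image side, as stated in the text
TMV_MASS_PER_LENGTH = 13.1   # kDa per Å, commonly quoted value (assumption)


def field_width_a() -> float:
    """Width in Å of the specimen area covered by one micrograph."""
    return IMAGE_SIDE_PX * PIXEL_SIZE_A


def calibration_factor(tmv_intensity: float, tmv_length_px: float) -> float:
    """kDa per unit of background-subtracted integrated intensity,
    derived from a TMV segment of known length (hypothetical helper)."""
    tmv_mass_kda = TMV_MASS_PER_LENGTH * tmv_length_px * PIXEL_SIZE_A
    return tmv_mass_kda / tmv_intensity


def particle_mass(intensity: float, factor: float) -> float:
    """Mass in kDa of a particle from its integrated intensity."""
    return intensity * factor


# Invented intensities, for illustration only.
factor = calibration_factor(tmv_intensity=39300.0, tmv_length_px=30)
print(field_width_a())
print(particle_mass(1500.0, factor))
```

Because the standard lies on the same grid as the specimen, the calibration factor is obtained under the same imaging conditions as the particles being measured.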

Sequence profile search

The generalized sequence profile method and the pftools package (Bucher et al., 1996) were used. As a single repeat would be unlikely to form a stable structure on its own, we limited the search to proteins containing at least three tandem repeats, thus increasing the selectivity of the search. Two such profiles, each spanning three repeats, were constructed – one each for R1 and R2. The initial profiles were edited manually to increase the variability of those positions that are located externally according to the R1 and R2 models. The profiles for the other proteins analysed were constructed based on the alignment of several fragments, each containing three tandem repeats, initially selected and aligned using the dot-matrix program dotter (Sonnhammer and Durbin, 1995). The profiles were matched against the latest release of the GENPEPT protein sequence database (Benton, 1990). The probability that the matches are a product of chance alone was calculated by analysing the score distribution obtained from a profile search against a regionally randomized version of the protein database, assuming an extreme value distribution (Hofmann and Bucher, 1995). The repetitive structures of the detected sequences were then examined individually.
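The idea of estimating chance probabilities from the scores of a randomized database can be sketched as follows. This is not the pftools implementation. It is a generic illustration that fits a Gumbel (type I extreme value) distribution by the method of moments, which is an assumed fitting choice, and the null scores are synthetic stand-ins for real search output.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(scores):
    """Method-of-moments estimate of the Gumbel location and scale."""
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.fmean(scores) - EULER_GAMMA * beta
    return mu, beta


def p_value(score, mu, beta):
    """Probability that a single chance match scores at least `score`."""
    # 1 - exp(-exp(-z)), written with expm1 for accuracy at small values.
    return -math.expm1(-math.exp(-(score - mu) / beta))


# Synthetic stand-in for scores against a randomized database:
# evenly spaced quantiles of a Gumbel distribution with mu=10, beta=2.
n = 2000
null_scores = [10.0 - 2.0 * math.log(-math.log((i + 0.5) / n)) for i in range(n)]

mu, beta = fit_gumbel(null_scores)
print(f"mu = {mu:.2f}, beta = {beta:.2f}")
print(f"P(score >= 30) = {p_value(30.0, mu, beta):.1e}")
```

The value returned is a per-comparison probability; a database-wide estimate would additionally have to account for the number of sequences searched.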

Molecular modelling

The initial templates for R1 and R2 were constructed from several pieces (see Results) using the insight II program (Dayring et al., 1986). First, a β-coil of similar length and with the same triangular cross-section as residues 397–415 of P.69 pertactin was chosen to serve as a scaffold in constructing the R1 template. Then, three β-strand segments corresponding to the β1, β2 and β3 regions of R1 were superimposed on this coil. After this, several selected turns were tried to find the best accommodation with the provisional β-strand arrangement. During this analysis, the main chain atoms of the terminal residues of the turns were grafted into the corresponding positions of the β-strands by means of a root-mean-square (RMS) algorithm. Then, the superposition was improved by manual translation of the β-strands with reference to the target pertactin coil and by manual variation of the torsion angles of the turns. Next, the turns and strands were linked covalently. The resulting template of one complete coil was replicated and superimposed on a four-coil fragment from P.69 pertactin (positions 378–451). Finally, these four fragments were fused to form a four-coil segment of the R1 or R2 helix. The amino acid sequence of the template was edited in accordance with the FHA sequence (positions 364–422 of R1; positions 1587–1669 of R2) using the homology modelling option of insight II (Dayring et al., 1986). The structure was refined further by the energy minimization procedure, based on the steepest descent algorithm implemented in the Discovery subroutine of insight II, and tethering heavy backbone atoms to their starting conformations with force constant K = 100. The 300 steps of minimization led to a maximum RMS derivative of 0.36 kcal/(mol·Å). The next stage of minimization was 500 steps of the conjugate gradients algorithm, tethering the backbone atoms with lower force (K = 50) and then 300 steps with K = 25. The tethering was accompanied by setting the distance constraints at K = 50, in order to improve the geometry of structural H bonds. To allay the concern that these constraints generated significant tensions in the minimized structure, the last calculation was performed without any restrictions to an RMS derivative of 0.3 kcal/(mol·Å). The CVFF force field (Dauber-Osguthorpe et al., 1988) and the distance-dependent dielectric constant were used for the energy calculations.
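The staged refinement described above, in which harmonic tethers are progressively weakened and finally removed, can be illustrated on a toy system. The sketch below is not the insight II protocol. It uses a one-dimensional chain of five points with a simple harmonic bond energy in place of the CVFF force field, arbitrary units, and steepest descent at every stage (the text uses conjugate gradients after the first stage). Only the tether constants and step counts of the first three stages are taken from the text.

```python
import math


def gradient(x, x_ref, k_tether, k_bond=10.0, spacing=1.0):
    """Gradient of a toy energy: harmonic bonds between neighbours plus
    harmonic tethers of each point to its reference coordinate."""
    g = [2.0 * k_tether * (xi - ri) for xi, ri in zip(x, x_ref)]
    for i in range(len(x) - 1):
        stretch = x[i + 1] - x[i] - spacing
        g[i] -= 2.0 * k_bond * stretch
        g[i + 1] += 2.0 * k_bond * stretch
    return g


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def minimise(x, x_ref, k_tether, max_steps, step=1e-3, tol=0.0):
    """Fixed-step steepest descent, stopping early once the RMS
    derivative falls to `tol` or below."""
    x = list(x)
    for _ in range(max_steps):
        g = gradient(x, x_ref, k_tether)
        if rms(g) <= tol:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


start = [0.0, 1.3, 1.9, 3.2, 4.0]  # distorted toy starting coordinates
# (tether constant, maximum steps, RMS-derivative tolerance) per stage;
# the final unrestrained stage and its tolerance are toy choices.
schedule = [(100.0, 300, 0.0), (50.0, 500, 0.0), (25.0, 300, 0.0), (0.0, 20000, 1e-4)]

x = start
for k_tether, max_steps, tol in schedule:
    x = minimise(x, start, k_tether, max_steps, tol=tol)

print([round(b - a, 3) for a, b in zip(x, x[1:])])
```

Tethering keeps the early stages close to the starting geometry while the worst strain is relieved, and the final unrestrained stage shows whether the restraints were holding the structure in a strained state.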


Acknowledgements

We thank J. Wall for support at the Brookhaven National Laboratory STEM facility, which is an NIH-supported Resource Center NIH 41-RR01777, with additional support from the Department of Energy and the Office of Biological and Environmental Research. We also thank Dr B. Trus for suggesting the modelling study, Dr M. Cerritelli for gel electrophoresis, Drs P. Wingfield and K. Miroshnikov for helpful discussions, and Dr D. Burns for critical reading of the manuscript. F.J.-D. is a researcher from the CNRS.