Crystal structure of the global regulator FlhD from Escherichia coli at 1.8 Å resolution


  • Andrés Campos,

    Corresponding author
    1. Department of Microbiology and Immunology (M/C 790), College of Medicine, University of Illinois at Chicago, 835 S. Wolcott Ave., MSB E-603, Chicago, IL 60612-7344, USA.
    • *For correspondence. E-mail; Tel. (+1) 312 413 0288; Fax (+1) 312 996 6415.

  • RongGuang Zhang,

    1. Structural Biology Center, Argonne National Laboratory, Argonne, IL 60439, USA.
  • Randal W. Alkire,

    1. Structural Biology Center, Argonne National Laboratory, Argonne, IL 60439, USA.
  • Philip Matsumura,

    1. Department of Microbiology and Immunology (M/C 790), College of Medicine, University of Illinois at Chicago, 835 S. Wolcott Ave., MSB E-603, Chicago, IL 60612-7344, USA.
    2. Molecular Biology Consortium, 2201 W. Campbell Drive, Chicago, IL 60439, USA.
  • Edwin M. Westbrook

    1. Molecular Biology Consortium, 2201 W. Campbell Drive, Chicago, IL 60439, USA.


FlhD is a 13.3 kDa transcriptional activator protein of flagellar genes and a global regulator. FlhD activates the transcription of class II operons in the flagellar regulon when complexed with a second protein FlhC (21.5 kDa). FlhD also regulates other expression systems in Escherichia coli. We are seeking to understand this plasticity of FlhD's DNA-binding specificity and, to this end, we have determined the crystal structure of the isolated FlhD protein. The structure was solved by substituting seleno-methionine for natural sulphur-methionine in FlhD, crystallizing the protein and determining the structure factor phases by the method of multiple-energy anomalous dispersion (MAD). The FlhD protein is dimeric. The dimer is tightly coupled, with an intimate contact surface, implying that the dimer does not easily dissociate. The FlhD monomer is predominantly α-helical. The C-termini of both FlhD monomers (residues 83–116) are completely disrupted by crystal packing, implying that this region of FlhD is highly flexible. However, part of the C-terminus structure in chain A (residues 83–98) was modelled using a native FlhD crystal. What is seen in chain A suggests a classic DNA-binding, helix–turn–helix (HTH) motif. FlhD does not bind DNA by itself, so it may be that the DNA-binding HTH motif becomes rigidly defined only when FlhD forms a complex with some other protein, such as FlhC. If this were true, it might explain how FlhD exhibits plasticity in its DNA-binding specificity, as each partner protein with which it forms a complex could allosterically affect the binding specificity of its HTH motif. A disulphide bridge is seen between the unique cysteine residues (Cys-65) of FlhD native homodimers. Alanine substitution at Cys-65 does not affect FlhD transcription activator activity, suggesting that the disulphide bond is not necessary for either dimer stability or this function of FlhD. Electrostatic potential analysis indicates that dimeric FlhD has a negatively charged surface.


In biological systems, structure defines function at many levels and in complex ways. Interactions among macromolecules – proteins with proteins and proteins with DNA – are ubiquitous in a living cell and are responsible for the control of all biochemical processes, such as enzyme catalysis, signal transduction and transcriptional regulation (Harrison, 1991; Jones and Thornton, 1995; 1996; Jones et al., 1999). Macromolecular interactions are complicated, and there is much that we do not know about the rules that govern them. Nevertheless, the study of macromolecular interactions remains a key to the understanding of much that is important in biology.

Proteins FlhD and FlhC in Escherichia coli form a tight complex that is able to recognize and bind DNA with high specificity. FlhD by itself cannot bind DNA. The FlhD/FlhC complex activates the transcription of class II flagellar genes. The physical interactions within the FlhD/FlhC complex and between this complex and DNA remain obscure and are the subject of our current studies.

The flagella-chemotaxis regulon in E. coli consists of at least 14 operons with more than 50 genes. The expression of these flagellar genes is hierarchically regulated (Macnab, 1996). The flhD master operon, located at the top of this hierarchy, consists of two genes, flhD and flhC. It has been demonstrated that the products of these genes, FlhD and FlhC, form a D2C2 heterotetramer complex that binds to the class II promoter regions of the flagellar regulon, thereby activating transcription (Bartlett et al., 1988; Liu and Matsumura, 1994). The D2C2 complex controls and co-ordinates flagellar and chemotactic gene expression during the cell cycle. Transcription of the flhD master operon peaks at mid-log, whereas the transcription of flhB (a class II flagellar gene) peaks at the end of the cell cycle (Prüß and Matsumura, 1997).

FlhC has no known function other than activation of flagellar genes, but FlhD does have other known functions. There is evidence suggesting that this protein is able to change its specificity. For example, it is known that FlhD, but not FlhC, regulates the expression of cadBA, a non-flagellar operon, thereby negatively regulating the cell division rate in E. coli (Prüß et al., 1997). flhD mutants continue dividing as they enter stationary phase, whereas wild-type cells and flhC mutants reduce their cell division rate in the late-exponential phase (Prüß and Matsumura, 1996). It is not known whether expression of cadBA is regulated directly by FlhD alone, by FlhD together with one or more other factors, or indirectly.

Genetic studies of the flhD operon in other species have shown that it participates in the regulation of other systems inside the cell. The flhD operon is transcribed in mid-log phase. Although it is also presumably translated, the FlhD/FlhC activity peaks between late-log and stationary phases (Prüß and Matsumura, 1997). FlhD may be present at mid-log to interact with other factors regulating non-flagellar expression. FlhD is the first known example in prokaryotes of a global regulator protein capable of changing its specificity in order to control unrelated targets. This protein is involved in several cellular processes at the transcriptional regulation level (B. M. Prüß et al., submitted).

The flhD gene sequence (Bartlett et al., 1988) and the mass spectrometry experiments (Soutourina et al., 1999) both indicate that FlhD contains 116 amino acid residues. FlhD by itself forms a homodimer, D2.

In order to understand better the mechanism by which FlhD activates the transcription of flagellar genes and to see what features of FlhD impart combinatorial specificity with DNA, we began a crystallographic study of this protein. We previously overexpressed FlhD in E. coli, purified the protein to homogeneity and crystallized it (Campos et al., 1998). We now report the crystal structure of seleno-methionyl FlhD at 1.8 Å resolution, determined by the multiple-wavelength anomalous dispersion (MAD) phasing method (Hendrickson, 1991; Doublie, 1997). Sulphur-methionyl and seleno-methionyl FlhD crystallize in the same crystal form (space group I4, a = b = 88.7 Å, c = 42.4 Å) containing two molecules per asymmetric unit, arranged as a dimer.

The FlhD structure shows a homodimer with a disulphide bridge at cysteine 65 (Cys-65A:Cys-65B, where A and B indicate the monomer of the FlhD dimer to which the residue belongs). The C-terminus of each monomer is a highly flexible domain and presents a putative helix–turn–helix (HTH) motif (Harrison, 1991). Hence, the topology of the crystal structure suggests that FlhD is the DNA-binding component of the heterotetrameric FlhD/FlhC complex, in which FlhD is an allosteric structure with FlhC as the activator.


Crystallization and data collection

Full-length E. coli FlhD was successfully expressed in bacterial strain MC1000 flhD::kan as described in Experimental procedures. Single crystals grew within a week to dimensions as large as 0.20 × 0.11 × 0.15 mm. The crystal form is unusually dense (Westbrook, 2000) with a Matthews number of 1.57 Å3/Da and a solvent content of ≈ 22%. Attempts to prepare standard heavy-atom derivatives of these crystals failed, so we resorted to the MAD phasing method (Hendrickson, 1991). We overexpressed, purified and crystallized FlhD in which all methionines were replaced with seleno-methionine (Leahy et al., 1994; Doublie, 1997). Se-Met-FlhD crystals diffracted to a minimum Bragg spacing of 1.8 Å. The crystal space group in both native and seleno-methionyl FlhD (Se-Met-FlhD) crystals is I4 (a = b = 88.7 Å, c = 42.4 Å), and there are two monomers (one dimer) per asymmetric unit. Successful incorporation of seleno-methionine in the protein crystals was demonstrated by amino acid analysis of previously dissolved Se-Met-FlhD crystals (data not shown). The observation of a strong X-ray fluorescence spectrum, characteristic of selenium, from an energy scan of APS beamline 19ID confirmed the presence of selenium in these FlhD crystals and permitted the selection of four energies at which data were to be collected. Diffraction data were collected at four different energies (see Experimental procedures: Data collection). Data collection statistics are shown in Table 1. The data were essentially complete to 2.1 Å resolution and exceeded threefold redundancy. At 1.80 Å resolution cut-off, 90% of the data were recorded at all four energies and thus contributed to phasing. The Rmerge values for all data sets were better than 4%, suggesting that the measurements were strong and accurate. The ‘edge’ energy data (those with the most negative f′ values) were used as the reference data for initial phasing. The most complete (97.4%) data set, recorded at 13.1 keV (‘high energy’), was subsequently used for refining the structure.
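
The Matthews number and solvent content quoted above can be checked from the cell dimensions in Table 1. The following is a minimal sketch rather than the authors' calculation: it assumes eight asymmetric units per I4 unit cell (four rotations times body centring), the 13.3 kDa monomer mass given in the abstract, and the common approximation that the protein fraction is 1.23/Vm; the function names are illustrative.

```python
# Matthews coefficient and approximate solvent content for the FlhD crystal.
# Cell constants and chain count come from Table 1; the 1.23 constant is the
# usual approximation for a typical protein partial specific volume.

def matthews_coefficient(a, b, c, n_asu, chains_per_asu, mass_da):
    """Return Vm in Å^3/Da for a unit cell whose angles are all 90 degrees."""
    cell_volume = a * b * c
    return cell_volume / (n_asu * chains_per_asu * mass_da)


def solvent_fraction(vm):
    """Approximate solvent fraction from Vm (assumed protein fraction 1.23/Vm)."""
    return 1.0 - 1.23 / vm


A = B = 88.717        # Å
C = 42.423            # Å
N_ASU_I4 = 8          # asymmetric units per cell in I4 (4 rotations x 2 centring)
CHAINS_PER_ASU = 2    # one FlhD dimer per asymmetric unit
MASS_DA = 13_300      # FlhD monomer, 13.3 kDa

vm = matthews_coefficient(A, B, C, N_ASU_I4, CHAINS_PER_ASU, MASS_DA)
solvent = solvent_fraction(vm)
print(f"Vm = {vm:.2f} Å^3/Da, solvent content ≈ {solvent:.0%}")
```

With these inputs the sketch gives a Vm of about 1.57 Å3/Da and a solvent content of about 22%, in line with the values reported in the text.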

Table 1. Crystallographic data summaries.

A. Crystal
Space group                            I4 (#79)
Unit cell                              a = b = 88.717 Å; c = 42.423 Å
Number of peptides/asymmetric unit     2
Matthews number Vm                     1.57 Å3/Da
Number of methionines/peptide          5
Number of selenium atoms found         6

B. Data collection (Se-Met-FlhD) a
Data set name             Low energy    Edge energy    Peak energy    High energy
Energy (keV)              12            12.6595        12.6613        13.1
Wavelength (Å)            1.0332        0.9794         0.9793         0.9464
No. of observations       43 036        47 377         46 797         50 417
No. of unique data        14 017        14 804         14 792         15 124
Complete overall (%)      90.2          95.4           95.3           97.4
Complete to 2.1 Å (%)     98.6          99.2           98.2           98.8
Rmerge (%)
<I> (arbitrary units)     15 134        18 323         17 439         11 067

C. Initial
Phasing power     Acentric             Centric
Low energy        2.68                 1.58
Edge energy       Used as reference    Used as reference
Peak energy       1.46                 0.96
High energy       2.63                 1.45

a. All data collected to 1.8 Å minimum d-spacing with 90 diffraction images, each of 1° rotation range and 4 s exposure.
b. solve: six sites found, overall figure of merit 67.5%. mlphare: figure of merit 0.72 (centric); 0.57 (acentric); 0.58 (all data).
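
The energies and wavelengths in Table 1 are related by λ = hc/E. As a small consistency check, and only as an illustrative sketch, the snippet below converts the four energies using hc ≈ 12.39842 keV·Å; the dictionary of tabulated values is transcribed from Table 1 and the names are hypothetical.

```python
# Convert the four MAD data-collection energies (Table 1) to wavelengths.
HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV·Å


def wavelength_angstrom(energy_kev: float) -> float:
    """Photon wavelength in Å for an energy given in keV."""
    return HC_KEV_ANGSTROM / energy_kev


# data set name -> (energy in keV, wavelength in Å as read from Table 1)
TABLE_1 = {
    "low": (12.0, 1.0332),
    "edge": (12.6595, 0.9794),
    "peak": (12.6613, 0.9793),
    "high": (13.1, 0.9464),
}

computed = {name: wavelength_angstrom(energy) for name, (energy, _) in TABLE_1.items()}

for name, (energy, listed) in TABLE_1.items():
    print(f"{name:>4}: {energy:8.4f} keV -> {computed[name]:.4f} Å (Table 1: {listed:.4f} Å)")
```

The computed wavelengths agree with the tabulated ones to within about 0.0001 Å; small differences of that size can arise from rounding or from a slightly different value of hc.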

Phasing, initial structure interpretation and refinement

Difference anomalous Patterson maps (Glusker, 1981), calculated with data from the ‘peak’ energy data set, clearly indicated six selenium sites in the asymmetric unit. The four data sets were input to program suite solve (Terwilliger and Berendzen, 1999; URL: http://www.solve.), which also identified these sites and developed phases from their anomalous signals. The six sites could be grouped into two sets of three atomic co-ordinates, related by a 180° rotational dyad axis with imperfect (r.m.s. deviation 2.6 Å) superposition. Phases developed in solve (Table 1) were improved by refinement and density modification, yielding an excellent electron density map with high contrast and clearly interpretable protein structural features (Fig. 1).

Figure 1.

Experimental MAD electron density at 1.8 Å resolution for FlhD structure in the area of the anti-parallel α-helices 1 in chain A (red) and chain B (blue). The map contour level is at 1σ. The high quality of the electron density map shown is representative of that part of the FlhD dimer structure that was interpretable. Unless otherwise indicated, all figures were generated with Swiss-pdb viewer v3.6b3 (Guex and Peitsch, 1997) and rendered with povray 3.1.

Electron densities for amino acid side-chains of residues His-2A to Ser-82A in chain A and residues Ser-4B to Ser-82B in chain B were in agreement with the amino acid sequence derived from the E. coli flhD gene (Bartlett et al., 1988). Thus, these segments of the dimeric structure were immediately interpreted. The co-ordinates of selenium atoms in seleno-methionine Se-Met-33, Se-Met-42 and Se-Met-54 in both chains, identified as such in the electron density map interpretation, were consistent with the six selenium sites used for phasing in solve. In contrast, neither Se-Met-1 nor Se-Met-95 of either chain could be identified in the initial map, and no feature in the anomalous difference Fourier map could be assigned to either of these residues.

As stated previously, the relationship between the two monomeric chains of the dimer was easily seen after initial electron density interpretation as an approximate (imperfect) 180° rotational dyad. The unique cysteines of each chain (Cys-65A, Cys-65B) were found to be immediately adjacent, and strong electron density between them clearly implied that they were linked covalently by a disulphide. This had not been anticipated.

In the initial interpretation, Met-1B and His-2B could not be modelled. Met-1A and Thr-3B were poorly defined (Fig. 1), but models of these two residues were built and refined. Residues 83–116 in both chains were not interpretable and were not initially modelled.

The structural model of the FlhD dimer was refined by simulated annealing with program cns (Brünger et al., 1998). Serial stages of refinement followed by molecular modelling permitted us to fit additional residues in chain A, from Arg-83A to Thr-98A. The structural model of the B chain cannot be built beyond residue Ser-82B because, at that point in the three-dimensional model of the crystal structure, the chain runs directly into the N-terminus of an adjacent FlhD molecule in a neighbouring asymmetric unit. Thr-98A is quite close to the crystallographic fourfold axis in the crystal structure, so that four symmetry-related chains A sterically interfere with each other at this point, and the electron density of chain A becomes uninterpretable. Thus, the unusually tight packing of protein in this crystal form prevents both C-termini from adopting a well-defined structure. The implication is that the C-terminus is very flexible: sufficiently flexible that the forces involved in crystal packing are greater than the forces folding this region of the FlhD protein.

Table 2 summarizes the present status of refinement for the FlhD molecular model. All data between 10.0 and 1.80 Å resolution were used for the refinement in cns. A total of 1420 non-hydrogen atoms corresponding to residues Met-1 to Thr-98 in chain A and residues Thr-3 to Ser-82 in chain B have been modelled, as have 117 solvent molecules. The constraints for modelling were very tight, resulting in very small deviations from ideal bond lengths and bond angles. The small (8.1%) difference between crystallographic R-factor and free R-factor suggests that relatively little bias is present in the refinement. Validation tests in procheck (Laskowski et al., 1993) show that the refined model is well within normal limits for a structure refined at this resolution (Table 2). No phi–psi angles are in the disallowed regions of the Ramachandran map, and 93% are in the ‘most favoured’ regions. By all criteria of procheck, this model is conservatively and accurately defined. But, apparently because a relatively large amount of the structure cannot be modelled, the crystallographic R-factor cannot be reduced below 20%. The temperature (‘B’) factors are reasonable, with low (10–25 Å2) values in the interior and high (60–80 Å2) values at either end of each peptide chain.

Table 2. Crystallographic refinement statistics.

                                          Overall         Outer shell
Resolution                                10.0–1.8 Å      1.88–1.80 Å
Reflections                               15 124
Data completeness (%)                     85.9            53.7
σ cut-off                                 None
R-value (%)                               21.8            29.9
Free R-value (%)                          29.9 a          31.8 a
No. of non-hydrogen atoms                 1420 b
No. of solvent molecules                  117
Mean bond length deviation                0.0084 Å
Mean bond angle deviation                 1.169°
Mean B-factor                             35.1 Å2 c
                                          46.3 Å2 d
Residues in core phi–psi regions (%)      92.9
Residues in disallowed regions (%)        0.0

a. 641 reflections.
b. 178 residues.
c. Protein atoms.
d. Solvent atoms.

Description of the FlhD molecular structure

Our molecular model for FlhD is a tightly associated dimer. The core of the FlhD dimer is an ellipsoidal structure of dimensions ≈ 24 Å × 30 Å × 39 Å. Protruding from the core are two flexible arms, each composed of ≈ 34 residues of the C-terminus of each monomer (Figs 2 and 4). Residues 83–98 of only one of these arms (in chain A) can be modelled.

Figure 2.

Stereo schematic diagram of the E. coli FlhD dimer as a ribbon. A and B monomers are colour coded as follows. Chain A: α-helices, orange; β-strand, cyan. Chain B: α-helices, green; β-strand, red. Secondary structural elements (Table 3) are labelled: α, α-helix; β, β-strand. A and B indicate the monomer of the FlhD dimer to which the element belongs. The Cys-65A:Cys-65B disulphide bridge is shown as a stick representation. The putative HTH motif is formed by α-helix 6 (α6A) and α-helix 7 (α7A). The termini of each modelled monomer are indicated: Met-1A and Thr-98A, chain A; Thr-3B and Ser-82B, chain B.

Figure 4.

Electrostatic surface potential for the monomer A (A–C) and dimer (D–F). Regions of negative potential are coloured red; positive, blue; and neutral, white. Bottom view, A and D; front view, B and E; and top view, C and F. The electrostatic potential was computed on the molecular surface using the Coulomb interaction method supplied with the Swiss-pdb viewer v3.6b3 software (Guex and Peitsch, 1997). The calculations were applied to the charged residues. Monomer B is shown as a ribbon diagram in (A–C). Several hydrophobic surfaces, accessible in the monomeric structure, are shielded from solvent in the dimeric structure. The homodimer contact surface area is large and non-planar. Much of the surface area of the dimer (D–F) is negatively charged (bright red). The most intensely negative surface area is located at the top of the molecule (F), as a result of the presence of several acidic residues. Small patches of positive charge are located at the bottom and at each side of the molecule (D and E). It should be noted that the first two residues of chain B are absent from the model (see text). If they were present, they would cover the area in the lower part of (D), as residues Met-1A and His-2A cover a similar region in the upper part of (D).

The sequence and the extent of the secondary structure elements observed in the monomers are listed in Table 3. Overall, 68% of the model structure is α-helical; only 3.4% of the observed residues (three residues/protomer) are in β-strands. The structure has no classic β-turns (Richardson, 1981). The remaining 30% of the model does not fall within a secondary structural classification. FlhD is classified as an α-domain structure (Tables 3 and 4; Fig. 2) (Murzin and Finkelstein, 1988; Michie et al., 1996). It contains seven α-helices in each monomer, the longest (α1) comprising 24 residues. The other helices are each seven to nine residues long. The long helices α1 from either monomer are anti-parallel and contact at a +20° angle, consistent with standard models for helical packing (Chothia et al., 1981; Richardson, 1981). Otherwise, the topology of FlhD is not similar to other known proteins (Murzin and Finkelstein, 1988; Michie et al., 1996).
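
The residue counts behind these percentages can be re-derived from the ranges listed in Table 3. The snippet below is a minimal sketch, not the authors' procedure: it sums the helix and strand spans given in Table 3 and divides by the 178 modelled residues (Met-1A to Thr-98A plus Thr-3B to Ser-82B); all variable names are illustrative.

```python
# Tally secondary-structure content from the residue ranges in Table 3.
# Ranges are inclusive (first residue, last residue).

HELICES = {
    "A": [(4, 27), (29, 36), (40, 47), (51, 58), (72, 78), (83, 91), (94, 97)],
    "B": [(6, 27), (29, 36), (40, 47), (51, 58), (72, 78)],  # α6, α7 not observed
}
STRANDS = {"A": [(65, 67)], "B": [(65, 67)]}
MODELLED = {"A": (1, 98), "B": (3, 82)}  # residues present in the refined model


def span(residue_range):
    """Number of residues in an inclusive range."""
    first, last = residue_range
    return last - first + 1


def total(ranges_by_chain):
    """Total residues over all ranges in all chains."""
    return sum(span(r) for ranges in ranges_by_chain.values() for r in ranges)


helix_residues = total(HELICES)
strand_residues = total(STRANDS)
observed_residues = sum(span(r) for r in MODELLED.values())

helix_percent = 100 * helix_residues / observed_residues
strand_percent = 100 * strand_residues / observed_residues

print(f"helix: {helix_residues} residues ({helix_percent:.1f}%)")
print(f"strand: {strand_residues} residues ({strand_percent:.1f}%)")
print(f"observed: {observed_residues} residues")
```

The counts reproduce the 121 helical and six β-strand residues of Table 3, and over 178 observed residues they correspond to roughly 68% and 3.4%, as stated in the text. The slightly different percentages printed in Table 3 may reflect a different denominator.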

Table 3. FlhD secondary structure.

Element        Chain A                 Chain B
α1             Ser-4A to Gln-27A       Leu-6B to Gln-27B
α2             Lys-29A to Leu-36A      Lys-29B to Leu-36B
α3             Glu-40A to Ala-47A      Glu-40B to Ala-47B
α4             Leu-51A to Ala-58A      Leu-51B to Ala-58B
α5             His-72A to Leu-78A      His-72B to Leu-78B
α6             Arg-83A to His-91A      Not observed
α7             Ile-94A to Ser-97A      Not observed
β1             Cys-65A to Phe-67A      Cys-65B to Phe-67B
% α-helix      121 residues (68.8%) a
% β-strand     Six residues (3.5%) a

a. Percentage was obtained from the observed residues.

Table 4. Characterization of FlhD dimer.

Protein interface parameter               Value, FlhD A     Homodimers a
ΔASA b (Å2)                               2176.21           1685.03 (1101.09)
% ASA                                     33.88
Planarity c (Å)                           5.02              3.46 (1.72)
Length/breadth ratio (circularity) f      0.71              0.71 (0.17)
Interface residue segments                2                 5.22 (2.55)
Gap volume (Å3) g                         3726.65
Gap volume index h                        1.712             2.20 (0.87)
% polar atoms in interface                37.08
% non-polar atoms in interface            62.9
Secondary structure                       Alpha i
Hydrogen bonds                            11 j              11.8 (5.1) k
Salt bridges                              0
Disulphide bonds                          1
Bridging water molecules                  8

FlhD A: chain A interfaced with chain B (C-terminal residues 82–98 were not included in the analysis).
a. From 32 pdb file entries (Jones and Thornton, 1996); monomers range from 10 000 to 50 000; standard deviations in parentheses.
b. Interface accessible surface area.
c. Root mean square deviation (r.m.s.d.) from best plane through interface. The larger the r.m.s. value, the less planar the interface.
d. Length of the first principal axis of the least-squares plane through the atoms in the interface.
e. Length of the second principal axis of the least-squares plane through the atoms in the interface.
f. The smaller the ratio, the more extended the shape of the interface. A ratio of 1.0 indicates that an interface is approximately circular.
g. Gap volume between the two components of the complex. The gap volume gives a measure of the complementarity of the interacting surface.
h. Gap volume/interface ASA.
i. Alpha > 20% and beta < 20%.
j. 0.50 hydrogen bonds per 100 Å2 monomer ASA.
k. 0.70 hydrogen bonds per 100 Å2 monomer ASA.

The N-terminal residues of each chain are flexible, exhibiting large temperature factors (exceeding 60 Å2). Residues Met-1B and His-2B are so disordered that they cannot be fitted to any observed electron density and, thus, are not presently modelled. The electron density is very clear for the remaining polypeptide chains up to residue 82 in each chain. Residues of chain B beyond 82 are impossible to model, because the structure is disrupted by an intervening FlhD peptide in an adjacent asymmetric unit of the crystal, preventing the C-terminus of chain B from adopting any defined structure. Residues 83–98 of chain A can be modelled, but the temperature factors of all these residues exceed 60 Å2 and, in most instances, exceed 70 Å2. Beyond residue Thr-98A, the A-chain approaches the fourfold crystallographic axis so closely that it cannot retain a defined structure and thus is not modelled. Thus, our model suggests that the entire C-terminal sequence (83–116) of each chain is mobile and flexible in the FlhD dimer, so much so that its folding energy is less than the energy necessary to crystallize the protein. This is quite an unusual situation.

Dimeric interface in FlhD

FlhD is a compact homodimer. The greater part of the dimeric contact region of FlhD occurs in its N-terminal half, up to and including the single short β-strand (residues 65–67). The core of the FlhD structure is made up of these N-terminal residues. The flexible C-termini (residues 83–116) extend out from this core (see below). In order to understand the monomer–monomer interaction in FlhD, we performed an analysis of the dimer interface with the software package ‘protein protein interaction server’ (Jones and Thornton, 1995; 1996). As the C-terminus of FlhD is flexible and its structure disordered, we performed the analysis only with residues 1–82 of chain A when interfaced with chain B (residues 3–82). The results of the analysis are shown in Table 4.

In the crystal structure, each monomer intertwines the other, and there is an extensive contact area, burying 2176 Å2 of surface area. This interface accessible surface area (ASA) is significantly greater than the mean of 32 other, non-homologous homodimers, suggesting that FlhD is a very compact dimer (Table 4). Most (84%) protein–protein interfaces in dimeric proteins are flat (Argos, 1988; Jones and Thornton, 1995), but the interface in the FlhD dimer is distinctly non-planar (Figs 2 and 4). This situation is reminiscent in some respects of the dimeric interaction of trp repressor (Lawson et al., 1988).

The contacts between monomers involve a large portion of each polypeptide. Helix α1 and strand β1 both contact their respective helix α1 and strand β1 on the other peptide; no other secondary structure does that. All five helices in the core dimer contact the other polypeptide. Interface contacts feature a mixture of hydrophilic and hydrophobic interactions (Table 4 and Fig. 4A–C). Polar interactions are particularly important for specificity in associations between monomers. The FlhD interface consists of 37% polar and 63% non-polar atoms. Such interface compositions are characteristic of facultative dimers (Jones and Thornton, 1996). The most important elements of the dimer contact region are found in four helices and two β-strands: a cluster formed by helices α1 and α2 and strand β1 from each monomer.

Although the surface interactions between monomers are predominantly hydrophobic contacts, the polar interactions include 11 direct hydrogen bonds and eight bridging water molecules. There is also one disulphide covalent bond (Cys-65A:Cys-65B). The number of hydrogen bonds per unit ASA in FlhD is slightly lower than the mean (Table 4). It has been observed that homodimers and non-transient heterocomplexes (those with components that only occur as complexes) typically contain relatively few intermolecular hydrogen bonds per 100 Å2 of ASA. Hydrogen bonding is more frequent in complexes in which the subunits can be separated without denaturation (Jones and Thornton, 1995; 1996). In fact, homodimers rarely occur or function as monomers (Jones and Thornton, 1996). This finding therefore suggests that FlhD is a non-transient dimer.

The association in the FlhD homodimer is most probably a specific one, because it displays characteristics of known protein–protein interfaces (Table 4). The intimacy of the contacts and the complementarity of its participating surfaces suggest that FlhD dimerization is tight and specific. The interface surface of chain A of FlhD has a comparatively high deviation from planarity (Table 4) and, in this dimer, the two monomers are twisted together across the interface (Figs 2 and 4A–C) (Jones and Thornton, 1996).

It has been found that the shapes of interface regions (circularity) vary little among homodimers, antigens and the enzyme component of enzyme–inhibitor complexes (Jones and Thornton, 1996). Circularity in the FlhD dimer is slightly lower than the mean obtained for other homodimers and is consistent with the fact that the interface surface between each monomer is complex (Table 4). The gap index obtained for FlhD indicates that the interacting surfaces in the FlhD homodimer are complementary (Table 4). Homodimers, enzyme–inhibitor complexes and permanent heterocomplexes are the most complementary, whereas other kinds of complexes, such as non-obligatory heterocomplexes, are the least complementary (see Table 2 from Jones and Thornton, 1996). The high complementarity between FlhD monomers again suggests that FlhD is an obligatory dimer.
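
Two of the derived interface quantities in Table 4 follow directly from the tabulated interface ASA, gap volume and hydrogen-bond count. The sketch below recomputes them as a consistency check; it assumes the definitions given in the table footnotes (gap volume index = gap volume / interface ASA; hydrogen bonds normalized per 100 Å2 of interface ASA), and the variable names are illustrative.

```python
# Recompute derived interface metrics from the FlhD values in Table 4.

INTERFACE_ASA = 2176.21   # Å^2, interface accessible surface area
GAP_VOLUME = 3726.65      # Å^3, gap volume between the two monomers
HYDROGEN_BONDS = 11       # direct intermolecular hydrogen bonds

# Reference means for homodimers quoted in Table 4 (Jones and Thornton, 1996).
HOMODIMER_GAP_INDEX_MEAN = 2.20
HOMODIMER_HBONDS_PER_100_MEAN = 0.70

gap_volume_index = GAP_VOLUME / INTERFACE_ASA
hbonds_per_100 = 100 * HYDROGEN_BONDS / INTERFACE_ASA

print(f"gap volume index: {gap_volume_index:.3f} "
      f"(homodimer mean {HOMODIMER_GAP_INDEX_MEAN:.2f})")
print(f"hydrogen bonds per 100 Å^2: {hbonds_per_100:.2f} "
      f"(homodimer mean {HOMODIMER_HBONDS_PER_100_MEAN:.2f})")
```

Both values fall below the homodimer means, which is the basis for the statements that the FlhD interface is complementary and relatively poor in hydrogen bonds per unit area.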

The C-terminus of FlhD is a putative HTH motif

Primary structure analysis and the refined crystallographic model of residues 83–98 of chain A suggest that residues 85–104 may adopt an HTH structural motif in the D2C2 tetramer. The HTH motif is defined as a 20-residue segment with two α-helices that cross at an angle of about 120° (Harrison, 1991). The most highly conserved residue in the HTH motif is a glycine located in the turn (Pabo and Sauer, 1992). FlhD residue 93 is a glycine, and it is located precisely between helices α6 and α7. The fifth residue from the glycine in an HTH motif is often threonine, and FlhD residue 98 is a threonine.

The distance between Gly-93A and the FlhD dyad axis is 17 Å in our crystallographic model. Therefore, the distance between the two glycine residues in FlhD would be 34 Å: the distance between two adjacent major grooves of B-form DNA. Furthermore, if one superimposes the FlhD dimer dyad axis on the dyad axis of trp repressor (Lawson et al., 1988), helix α6 superimposes precisely upon helix α4 of trp repressor: the first helix of its HTH motif. Helix α7 is not in the canonical orientation and position of an HTH motif but, as crystal packing has disrupted this helix, its crystallographically defined structure is unlikely to represent its structure in solution.
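
The factor of two in this argument follows from the symmetry: a point at perpendicular distance r from a twofold axis and its 180° rotated mate are separated by 2r. The snippet below illustrates this with hypothetical co-ordinates (only the 17 Å distance from the dyad is taken from the text) and compares the result with an idealized B-DNA helical repeat of 10 base pairs at 3.4 Å rise, a textbook approximation that is an assumption here.

```python
import math

# Illustrative geometry only: the dyad is placed along z and Gly-93A is given
# hypothetical co-ordinates 17 Å from that axis.


def rotate_180_about_z(point):
    """Apply a twofold (180 degree) rotation about the z axis."""
    x, y, z = point
    return (-x, -y, z)


DISTANCE_FROM_DYAD = 17.0                  # Å, from the crystallographic model
gly93_a = (DISTANCE_FROM_DYAD, 0.0, 5.0)   # hypothetical position of Gly-93A
gly93_b = rotate_180_about_z(gly93_a)      # dyad-related Gly-93B

separation = math.dist(gly93_a, gly93_b)

# Idealized B-form DNA: about 3.4 Å rise per base pair, about 10 bp per turn.
RISE_PER_BP = 3.4
BP_PER_TURN = 10
pitch = RISE_PER_BP * BP_PER_TURN

print(f"Gly-93A to Gly-93B separation: {separation:.1f} Å")
print(f"idealized B-DNA helical repeat: {pitch:.1f} Å")
```

The separation does not depend on the height chosen along the axis, because a twofold rotation leaves the axial co-ordinate unchanged.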

In view of the observed flexibility of this region of the structure, it is interesting to conjecture whether FlhC binding might force this C-terminal region of FlhD into the appropriate HTH structure necessary for DNA binding. Residues from the observed region in the C-terminus (Arg-83 to Thr-98) of chain A have high temperature factors (exceeding 60 Å2). Also, residues 83–116 in chain B and residues 99–116 in chain A are disrupted by crystal packing and cannot be modelled. Thus, this part of the FlhD dimer must be highly flexible in solution and, therefore, its structure should be malleable.

It has been observed that transcription factors usually present two binding heads and that, in order to match the twofold rotational symmetry of DNA, they are usually dimeric (Jones et al., 1999). In fact, most prokaryotic transcriptional regulatory proteins are dimers with symmetry-related HTH motifs (Harrison, 1991). All these points taken together strongly suggest that the C-terminus of FlhD could be the active part of the FlhD/FlhC D2C2 tetrameric complex that interacts specifically with DNA of the class II promoters of flagellar genes. Alanine scanning mutagenesis also supports this hypothesis (Campos and Matsumura, 2001). However, we cannot exclude the possibility that portions of the FlhC structure might also interact with DNA in the complex.

The FlhD homodimer possesses an interchain disulphide bridge

The FlhD homodimer has a single interchain disulphide bond between the cysteines 65 (Cys-65A:Cys-65B). Cys-65 lies within the single short β-strand of each monomer, and these two strands together form the only β-sheet structure in the FlhD dimer. The distance between the two sulphur atoms is 2.04 Å, consistent with a covalent bond, and the distance between Cα atoms of the pair is 5.44 Å, typical of disulphide bonds. The disulphide is located in the middle of the homodimer core, inaccessible to solvent (Figs 2 and 3). During purification of the FlhD protein, special care was taken to avoid oxidizing the seleno-methionine using degassed solutions and keeping the FlhD solution under an argon gas atmosphere. However, no exogenous reducing agent, such as dithiothreitol (DTT), was used when purifying or otherwise handling FlhD in solution.

Figure 3.

Detailed view (stereodiagram) of the relative position of the Cys-65A:Cys-65B disulphide bridge between chain A and chain B of the FlhD dimer. It is clear that the sulphur atoms (yellow) of both cysteine residues are located deep within the dimer core. Top view (A), bottom view (B). Chains are colour coded as follows: red, chain A; blue, chain B.

Disulphide bridges are not common in cytoplasmic proteins. The interior E. coli environment is generally considered to be reducing (Derman et al., 1993). Very few proteins within E. coli possess disulphide bonds, and those bonds that have been found are usually related to intrinsic properties of these proteins (Bessette et al., 1999). The presence of a disulphide bridge in a global regulator such as the FlhD dimer forces us to consider whether this bond may have specific functions when FlhD regulates the transcription of flagellar or other genes. Therefore, we wanted to know whether the Cys-65A:Cys-65B disulphide bond is necessary for stabilization of the homodimer or for the function of FlhD in any way. To test this possibility, we mutated residue Cys-65 to alanine. Mutagenesis of flhD involved the plasmid pXL27 carrying both genes flhD and flhC. The flhDC65A mutant plasmid (pACC65A) was obtained and transformed into the strain YK4131 (flhD) in order to confirm its phenotype in swarming plates. As FlhD, together with FlhC, is required for the transcription of the class II flagellar genes, the functionality of FlhD was evaluated by its ability to complement the flhD strain. When tested on swarming plates, FlhDC65A was able to complement the strain YK4131 (flhD), demonstrating that the C65A mutation neither negatively affects nor decreases the expression of flagellar genes by FlhD (data not shown). This result demonstrates that the disulphide bridge is not required for either flagellar expression regulation by FlhD or (by implication) for FlhD structural stability.

When intersubunit disulphides do occur in proteins, they often play an important role in structural stabilization. However, in the FlhD system, our observations suggest that the stability of the homodimer is not affected by the removal of the disulphide bond. Recall that the dimer interface is extensive, twisting together the monomers, which share a large area of contact. Hydrophobic forces and hydrogen bonds provide sufficient binding stability to the dimer that the covalent disulphide link is not necessary to stabilize the dyad (Fig. 4A–C). Disulphide bridges are common in extracellular proteins and are frequently required to confer structural stability to oligomers. Disulphide bonds in cytoplasmic proteins are usually involved in catalytic functions (Bessette et al., 1999). At this point, therefore, we do not know whether the FlhD disulphide bridge has an intracellular function or whether FlhD also acts as an extracellular protein.

Electrostatic potential surface of the FlhD homodimer

Both polar and non-polar interactions are seen at the FlhD homodimer interface. To improve our understanding of the force and specificity of interactions that govern the formation of the dimer, we calculated the electrostatic potential surface of chain A and of the whole dimer (Fig. 4).

The hydrophobic surfaces of each monomer are mostly covered once the FlhD dimer is formed, thus avoiding non-specific aggregation that can occur with proteins marked by large hydrophobic surfaces. After dimerization, most of the FlhD dimer surface is negatively charged. This negative charge is more prominent on the protein surface opposite the putative DNA-binding face of FlhD (Fig. 4F). Small patches of positive and neutral charges are found at the bottom and at each side of the core homodimer (Fig. 4D and E). These patches may serve as specific recognition markers in protein–protein and protein–DNA interactions in which FlhD participates.

The extensive negative surface potential of the FlhD dimer surface is consistent with the observation that this cytoplasmic protein is highly soluble. Indeed, it is possible to obtain a concentration of up to 100 mg ml−1 pure FlhD in vitro with no apparent precipitation.


FlhD is a putative global regulator in the prokaryotic cell (B. M. Prüß et al., personal communication), playing a critical role in controlling the transcription of the flagellum hierarchy (Liu and Matsumura, 1994) and in the inhibition of cell division (Prüß and Matsumura, 1996). Therefore, it is very important to understand its molecular structure. We have now determined its structure crystallographically.

The FlhD crystal structure could only be solved by MAD phasing. This crystal form is very dense, preventing the formation of the usual heavy-atom isomorphous derivatives by soaking methods. Intensive efforts to find other crystal forms of this protein were made and were unsuccessful.

The core of FlhD is predominantly helical. Cursory inspection of the databases does not reveal any other protein of known structure with a folding topology similar to FlhD. The core is ≈ 24 Å × 30 Å × 39 Å and is compact. Other than its six helices, the only other secondary structural feature is one three-residue β-strand in each monomer, which complement each other in the dimer to form a small anti-parallel sheet. There are no well-defined reverse turns.

The crystallographic results reported here show that FlhD is dimeric, consistent with previous observations (Liu and Matsumura, 1994; Campos et al., 1998). Topological analysis shows that the interface between FlhD monomers is intimate and highly complementary. The interface is not planar, but rather is twisted and complex. It is a predominantly non-polar interface, and dimerization occludes a large surface area. Taken as a whole, the entire character of the dimeric interface strongly suggests that FlhD is an obligatory dimer.

The C-terminus of each monomer is poorly observed in the crystal because of obstructions from the very tight crystal packing that interrupt the fold of these C-termini. The A-chain can be traced to residue 98; the B-chain only to residue 82. The last few observed residues in each C-terminus exhibit very high temperature factors, suggesting that their folded structures are flexible. Crystal packing forces are very weak compared with the potential energy embodied in most protein folding. The fact that these C-termini can be disrupted so easily implies that their folded structures are not stable.

We have found portions of a putative HTH motif in the C-terminus of FlhD, suggesting that this region of the protein may be the DNA-binding portion of the FlhC/FlhD D2C2 tetramer. Flexible domains are common in transcriptional regulators (Harrison, 1991). What we see of the C-termini of the dimer suggests that each is a flexible arm protruding out of the main FlhD dimer core. The distance between the two reading heads of these putative HTH motifs in the FlhD dimer correlates with other DNA-binding regulators and is commensurate with the distance between two consecutive major grooves of DNA. Consequently, it is possible that the FlhC protein confers rigidity and specificity to the C-terminal HTH motifs in FlhD, in the D2C2 tetramer. If this hypothesis is correct, it may also be true that the detailed conformation of the putative reading heads varies depending on the partner with which FlhD interacts, acquiring different binding modes for different promoters in the cell. This might explain how FlhD appears to exhibit plasticity in DNA-binding specificity for different regulatory activities.

Structural flexibility of DNA reading heads is seen in other prokaryotic transcriptional regulators, such as the trp repressor. It has been suggested that mobility in the flexible domains of the trp repressor may be necessary for the different binding modes that are required for its variable interactions with three different operator sites (Lawson et al., 1988). Analogously, it may be that the FlhD dimer is the protein of the FlhD/FlhC complex that participates in the binding of the promoter region of the class II flagellar genes. Alanine scanning mutagenesis conducted on the surface of this protein strongly supports this hypothesis (Campos and Matsumura, 2001). The HTH motif cannot fold or function by itself, but always occurs as part of a larger DNA-binding domain (Pabo and Sauer, 1992). Thus, we cannot discard the possibility that FlhC also participates in DNA recognition and/or binding.

The electrostatic potential surface of the FlhD dimer (Fig. 4) is strongly negative. The negative charge is more evident on the face opposite the putative HTH DNA-binding motifs. There are some neutral and positively charged patches at the side and at the bottom of the core dimer. These patches may impart binding specificity with regard to the interaction of FlhD with DNA, FlhC and other presumptive partners of FlhD. Alanine scanning mutagenesis has provided some information about the active sites of this dimeric protein, and the results correlate with this hypothesis (Campos and Matsumura, 2001).

The highly polar nature of the dimer surface makes FlhD very soluble as a dimer. High solubility may help the dimer to maintain itself in the cytoplasm before it interacts with other presumptive factors and before it activates the transcription of the flagellar genes in concert with FlhC. This is particularly important because FlhD and FlhC are expressed in the mid-log phase, whereas class II flagellar genes are expressed late at the end of the life cycle (Prüß and Matsumura, 1997).

We have found a disulphide bond between FlhD monomers (Cys-65A:Cys-65B). Given that FlhD is a soluble cytoplasmic protein and that E. coli has a reducing cytoplasmic environment, the discovery of this disulphide bond in the FlhD dimer was surprising. Disulphide bonds are usually associated with structure stabilization in extracellular proteins. However, transient disulphide bonds that are not required for structural stability have been detected in a few cytoplasmic proteins, although they are not common. Disulphides have been seen in cells defective for certain components of reducing pathways (Derman et al., 1993; Prinz et al., 1997), but that is not the case here. The only known disulphide bonds routinely observed in cytoplasmic proteins are formed in enzymes during their catalytic cycles or under oxidative stress in the cell (Stewart et al., 1998; Åslund et al., 1999; Jakob et al., 1999; Kang et al., 1999). When Cys-65 is changed to alanine, it does not affect the ability of FlhD to regulate the transcription of class II flagellar genes in E. coli. It is possible that the disulphide bond is required for another unknown function of FlhD. Perhaps FlhD might be induced in an enzymatic or in an oxidative response to some stimuli we have not yet observed.

Recently, FlhD sequences from other bacteria have been characterized. As FlhD is highly conserved among these species (Campos and Matsumura, 2001), we believe that the three-dimensional structure reported here should be very similar in all of them. Thus, the study of FlhD may help us to understand unique aspects of the regulation of the flagellar operon in each species and to understand how this protein controls other systems. Particularly interesting is the control of the expression of virulence factors in those species that are pathogens.

Its combinatorial specificity makes FlhD a versatile and appropriate model not only for the study of protein–protein and protein–DNA interactions, but also for the study of transcriptional activation by complex formation in a prokaryotic system.

The structure not only gave information about the three-dimensional characteristics of this global regulator, but also suggested that FlhD is the factor of the complex FlhD/FlhC that possesses a DNA-binding capacity. This assumption agrees with the observation that FlhD is a global regulator capable of changing its specificity to regulate different targets in the cell.

Experimental procedures

Bacterial strains, plasmids and media

E. coli strains MC1000 flhD::kan and B834(DE3) [F− ompT hsdSB (rB− mB−) gal dcm met (DE3)] were used to express the FlhD and Se-Met-FlhD proteins respectively. Strain Epicuran Coli XL1-Blue {recA1 endA1 gyrA96 thi-1 hsdR17 supE44 relA1 lac[F′ proAB lacIqZΔM15 Tn10 (Tetr)]} (Stratagene) was used to propagate the plasmid generated by in vitro site-directed mutagenesis. Strain YK4131 (flhD derivative of YK410 [F−, araD139, Δlac(U169), rpsL, thi, pyrC46, nalA, thyA, his]) was used to examine the phenotype of the Cys65Ala mutant on swarming plates.

Luria–Bertani broth (Miller, 1992) was used for all purposes other than Se-Met-FlhD protein expression and phenotypic analysis. The medium for Se-Met-FlhD expression was 2 × M9 minimal salts enriched with 0.4% glucose and supplemented with 2 mM MgSO4, FeSO4.7H2O (25 µg ml−1), riboflavin (1 µg ml−1), niacinamide (1 µg ml−1), pyridoxine monohydrochloride (1 µg ml−1) and thiamine (1 µg ml−1). All amino acids with seleno-methionine (Calbiochem) instead of methionine were added at a concentration of 40 µg ml−1 (Leahy et al., 1994). Antibiotics (Sigma) were added when required at the following concentrations: penicillin G, 100 µg ml−1; kanamycin, 30 µg ml−1; chloramphenicol, 25 µg ml−1. Phenotype assays were performed on tryptone soft agar plates (1% tryptone, 0.5% NaCl and 0.3% bacto-agar).

Protein purification and crystallization

Both native and seleno-methionyl FlhD were overproduced using the two pT7 systems (Tabor, 1990). Native FlhD was overexpressed in MC1000flhD::kan harbouring plasmids pGP1-2cml and pXL25 as described previously (Liu and Matsumura, 1994). pXL25 contains the complete coding sequence of FlhD. The methods used to express, purify and crystallize the native FlhD protein have been described previously (Liu and Matsumura, 1994; Campos et al., 1998). The Se-Met-FlhD protein was overexpressed in the auxotrophic E. coli strain B834(DE3) harbouring the plasmid pXL25. Cells were grown in 2× M9 minimal medium with supplements. flhD expression was induced by adding 0.4 mM IPTG as soon as the cells approached mid-exponential phase. After 6 h of incubation at 30°C, Se-Met-FlhD was purified from lysate to homogeneity according to the same protocol used for the native FlhD (Campos et al., 1998). In order to avoid possible oxidation of seleno-methionine, degassed solutions were used during all the purification steps of the Se-Met-FlhD protein. Each batch of pure native and Se-Met-FlhD protein was brought to a final concentration around 40 mg ml−1 using Centriprep-3 and Centricon-3 centrifugal concentrators (Amicon) in 50 mM Tris (pH 7.9). Soluble native and Se-Met-FlhD were stored at 4°C. Effective incorporation of seleno-methionine in FlhD was confirmed by amino acid analysis (Ausubel et al., 1989).

Native and Se-Met-FlhD tetragonal crystals used for structural analysis grew within 3–8 days with microseeding techniques. They were grown by the hanging-drop vapour diffusion method. Each drop was made by mixing 2 µl of protein stock solution with 2 µl of reservoir solution containing 20–30% PEG 5000, 0.05–0.2 M sodium acetate and 0.1 M Tris-HCl, pH 8.5. The Se-Met-FlhD protein was crystallized under the same conditions with similar results to native FlhD at 20°C (Campos et al., 1998).

Data collection

Owing to the radiation sensitivity of the crystals, data collection from seleno-methionyl FlhD crystals had to be carried out at cryogenic temperatures (≈ 100 K). To prepare each crystal for data collection, it was mounted in a nylon loop directly from the ‘mother liquor’ from which it was grown and flash frozen in liquid nitrogen (Rogers, 1997).

An initial data set was collected on an RU200 Rigaku rotating-anode X-ray source, with a Bruker X-100 multiwire detector system (Durbin et al., 1986), to characterize the crystal and plan for MAD phasing data collection at the synchrotron. These data were processed by xengen (Howard et al., 1987).

MAD data were collected from a single Se-Met-FlhD crystal on beamline 19ID at the Advanced Photon Source at Argonne National Laboratory (Westbrook and Rosenbaum, 1997). All crystallographic diffraction data were obtained in about 2 h of elapsed time (actual data collected in 36 min). All data were recorded on a large-format, modular CCD detector (Westbrook and Naday, 1997) and processed out to 1.8 Å resolution with program d*trek (Pflugrath, 1997). The four X-ray energies at which data were collected were identified with an X-ray fluorescence scan near the nominal K-absorption edge of selenium of 12.658 keV. The following energies were selected: low-energy remote, 12.0000 keV (λ = 1.0332 Å); inflection, 12.6595 keV (λ = 0.97934 Å); peak, 12.6613 keV (λ = 0.9792 Å); and high-energy remote, 13.10 keV (λ = 0.94641 Å).
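The quoted wavelengths follow from the photon energies through λ = hc/E. As a minimal sketch, the Python snippet below performs that conversion, assuming the rounded constant hc = 12.398 keV Å; this value is an assumption, chosen because it reproduces the wavelengths listed above to the precision given. The dictionary labels are illustrative.

```python
# Assumed rounded value of hc in keV * angstrom; it reproduces the
# wavelengths quoted in the text to the precision given there.
HC_KEV_ANGSTROM = 12.398

def wavelength_angstrom(energy_kev):
    """Convert an X-ray photon energy in keV to a wavelength in angstroms."""
    return HC_KEV_ANGSTROM / energy_kev

# The four energies used for MAD data collection (labels are illustrative).
ENERGIES_KEV = {
    "low remote": 12.0000,
    "inflection": 12.6595,
    "peak": 12.6613,
    "high remote": 13.10,
}

for label, energy in ENERGIES_KEV.items():
    print(f"{label:12s} {energy:8.4f} keV  {wavelength_angstrom(energy):.5f} A")
```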

Structure determination and refinement

A Patterson map (Glusker, 1981) was calculated with anomalous difference amplitudes from the peak energy data set:

$$P(u,v,w) = \frac{1}{V}\sum_{h,k,l}\left|\Delta F_{\mathrm{ano}}(h,k,l)\right|^{2}\cos\left[2\pi(hu + kv + lw)\right]$$

where $\Delta F_{\mathrm{ano}}(h,k,l) = |F(h,k,l)| - |F(-h,-k,-l)|$, called the anomalous difference amplitude, is the difference in amplitude between anomalous pairs of Bragg reflections indexed as (h, k, l) and (–h, –k, –l). The Patterson map clearly identified six selenium sites within the crystallographic asymmetric unit.
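As an illustration of this calculation, the following Python sketch forms anomalous difference amplitudes from Friedel pairs and evaluates an unscaled Patterson sum at a fractional position (u, v, w). It is a toy under stated assumptions, not the procedure used here: the reflection amplitudes are invented, the function names are hypothetical, and the 1/V scale factor and symmetry handling are omitted.

```python
import math

def anomalous_differences(amplitudes):
    """Return |F(h,k,l)| - |F(-h,-k,-l)| for each Friedel pair present.

    `amplitudes` maps (h, k, l) to a structure factor amplitude. Each pair
    is reported once, under the tuple-wise larger of its two indices.
    """
    diffs = {}
    for (h, k, l), f_plus in amplitudes.items():
        mate = (-h, -k, -l)
        if mate in amplitudes and (h, k, l) > mate:
            diffs[(h, k, l)] = f_plus - amplitudes[mate]
    return diffs

def patterson(diffs, u, v, w):
    """Unscaled Patterson sum of squared differences at fractional (u, v, w)."""
    return sum(d * d * math.cos(2.0 * math.pi * (h * u + k * v + l * w))
               for (h, k, l), d in diffs.items())

# Invented amplitudes for illustration only; (1, 1, 0) has no mate and is skipped.
toy = {(1, 0, 0): 10.0, (-1, 0, 0): 9.0,
       (0, 1, 0): 5.0, (0, -1, 0): 5.5,
       (1, 1, 0): 3.0}
toy_diffs = anomalous_differences(toy)
print(toy_diffs)
print(patterson(toy_diffs, 0.0, 0.0, 0.0), patterson(toy_diffs, 0.5, 0.0, 0.0))
```

In a real map the sum is evaluated on a grid over the unit cell, and peaks away from the origin correspond to vectors between anomalous scatterers, here the selenium atoms.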

The four data sets (low, inflection, peak and high energies) were input to the program solve (Terwilliger and Berendzen, 1999), which corroborated the selenium atom sites first identified in the Patterson map analysis and developed the phases of all Bragg reflections using an automated statistical procedure. The phases calculated by solve were improved through three rounds of refinement by CCP4 program mlphare, followed by CCP4 program dm (CCP4, 1994; Cowtan, 1999). The entire phasing development process closely follows that described by Ramakrishnan and Biou (1997).

The electron density map of the FlhD crystal was initially displayed on an Evans and Sutherland ESV10 graphics workstation with frodo (Jones, 1985). All molecular model building was carried out either with frodo on the ESV10 or with the Swiss-pdb viewer (Guex and Peitsch, 1997) on either an IBM personal computer or a Macintosh G4.

Structure refinement of the molecular model was carried out on a Silicon Graphics R10000 workstation running cns (Brünger et al., 1998) using torsion-angle molecular dynamics and combined simulated annealing/maximum-likelihood model refinement. Solvent molecules were identified with the program waterpick within the cns package. Validation of the molecular model was carried out with program procheck (Laskowski et al., 1993). Data for refinement were derived from the ‘high energy’ data set collected as part of the MAD set. All data between resolution limits 10.0 Å and 1.80 Å (15, 124) were used without any sigma cut-off. Four per cent of all diffraction data were selected at random for calculation of the free-R cross-validation test and were not used in the actual structure refinement.
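The free-R cross-validation described above can be sketched in Python as follows. This is a generic illustration, not the cns implementation: the function names, the fixed random seed and the example amplitudes are all assumptions, and the R-factor shown is the conventional unweighted form Σ||Fobs| − |Fcalc|| / Σ|Fobs| without scaling.

```python
import random

def split_free_set(reflections, fraction=0.04, seed=0):
    """Randomly set aside `fraction` of reflections as the free-R test set.

    Returns (working_set, test_set); the test set is never used in refinement.
    """
    rng = random.Random(seed)
    n_free = round(len(reflections) * fraction)
    free_idx = set(rng.sample(range(len(reflections)), n_free))
    work = [r for i, r in enumerate(reflections) if i not in free_idx]
    test = [r for i, r in enumerate(reflections) if i in free_idx]
    return work, test

def r_factor(pairs):
    """Unweighted R-factor from a list of (F_obs, F_calc) amplitude pairs."""
    numerator = sum(abs(f_obs - f_calc) for f_obs, f_calc in pairs)
    denominator = sum(f_obs for f_obs, _ in pairs)
    return numerator / denominator

# Illustrative use with placeholder reflection identifiers.
work, test = split_free_set(list(range(1000)))
print(len(work), len(test))
print(r_factor([(10.0, 9.0), (20.0, 22.0)]))
```

Computing `r_factor` on the working set gives the conventional R, and computing it on the held-out test set gives the free R.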


Co-ordinates for the structure have been deposited in the Protein Data Bank with accession number 1G8E.

Site-directed mutagenesis

In vitro site-directed mutagenesis of flhD was performed in accordance with the QuikChange site-directed mutagenesis kit (Stratagene) protocol. Plasmid pXL27, which carries the complete flhD and flhC genes, was used for each flhD mutagenesis experiment according to the instructions of the manufacturer. Changes were introduced through individual and internal mutagenic oligonucleotide primers complementary to flhD except at the position of the desired mutation. The changes were located in the middle of the primer with ≈ 14–16 bases of correct sequence on both sides. Oligonucleotides FlhDC65A_a (5′-CCAATCAACTGGTTGCTCACTTCCGTTTTGAC-3′) and FlhDC65A_b (5′-GTCAAAACGGAAGTGAGCAACCAGTTGATTGG-3′) were used as primers. Bases in bold and underlined indicate the changes introduced in the original gene sequence of flhD to obtain an alanine-encoding codon.
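A quick consistency check on the primer pair can be sketched in Python: the two oligonucleotides should be exact reverse complements of each other. The sketch also measures the flanking lengths under the assumption that the introduced alanine codon is the GCT triplet near the centre of FlhDC65A_a; that assignment is inferred here and is not marked in the text.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

PRIMER_A = "CCAATCAACTGGTTGCTCACTTCCGTTTTGAC"  # FlhDC65A_a
PRIMER_B = "GTCAAAACGGAAGTGAGCAACCAGTTGATTGG"  # FlhDC65A_b

# Assumption: the first GCT triplet is the alanine codon introduced at position 65.
codon_start = PRIMER_A.find("GCT")
left_flank = codon_start
right_flank = len(PRIMER_A) - (codon_start + 3)

print("complementary pair:", reverse_complement(PRIMER_A) == PRIMER_B)
print("flanking bases:", left_flank, right_flank)
```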

The in vitro site-directed mutagenesis reaction was performed using 50 ng of pXL27 as template. Twenty cycles were completed with the following parameters: 95°C for 30 s, 55°C for 1 min and 70°C for 8 min. After 1 h of digestion with DpnI, the plasmid carrying the flhDC65A mutation was transformed into Epicuran Coli XL1-Blue competent cells. The flhD mutant plasmid (pACC65A) was then purified using the QIAprep spin miniprep kit (Qiagen), sequenced (Sanger et al., 1997) using the CircumVent thermal cycle dideoxy DNA sequencing kit (New England Biolabs) and transformed into the flhD strain YK4131. In order to verify the changes, the whole sequence of the flhD mutant gene was obtained. The upstream region from the pT7 promoter to the start codon of flhD was also sequenced to exclude any random change that might have affected the expression level of the flhD gene. Phenotypic analyses were carried out on swarming plates at 30°C in a humid box for 5–6 h.


The authors thank Azucena Rosas for her help in the characterization of the flhDC65A mutant. We also acknowledge Peggy O'Neill for critical reading of the manuscript. This work was supported by National Institutes of Health grant GM59484 to P.M. and by the US Department of Energy, Office of Biological and Environmental Research, under contract W31-109-ENG-38 to E.M.W. Beamline 19ID of the Advanced Photon Source is operated by Argonne National Laboratory's Structural Biology Center through support from the US Department of Energy, Office of Biological and Environmental Research, under contract W31-109-ENG-38.