Synechocystis PCC6803 possesses several open reading frames encoding putative WD-repeat proteins. One of these, the Hat protein, is involved in the control of a high-affinity transport system for inorganic carbon that is active when the cells are grown under a limiting concentration of this substrate. The protein is composed of two major domains separated by a hydrophobic linker region of 20 amino acid residues. The N-terminal domain of Hat has no homolog in standard databases and displays no particular structural features. Eleven WD repeats have been identified in the C-terminal moiety. The region encompassing the four C-terminal WD repeats is essential for growth under a limiting inorganic carbon regime, and the region encompassing the two most C-terminal WD repeats is required for the activity of the high-affinity transport system. However, because the Hat protein is located in the thylakoids, it is unlikely to be itself a component of the transport system. The structural organization of the WD-containing domain of Hat was modeled from the crystal structures of the G protein β subunit (seven WD repeats) and of hemopexin (a structural analog with four blades). Functional and structural data argue in favor of an organization of the Hat WD moiety into two subdomains of seven and four WD repeats. The C-terminal 4-mer subdomain might interact with another, as yet unknown, protein or peptide. This interaction could be essential in modulating the stability of the 4-mer structure and, thus, the accessibility of this subdomain, or at least of the region encompassing the last two WD repeats.
More than 150 WD-repeat proteins have been identified, most of them in eukaryotic organisms (Neer et al. 1994; Neer and Smith 1996; Garcia-Higuera et al. 1998; Smith et al. 1999). Their functions are often poorly defined, although most of them are involved in the regulation of various metabolic processes. Their mechanism of action presumably involves interaction with other protein(s), but no associated enzymatic activity has been found in any of the eukaryotic examples. The number of WD repeats per known protein ranges from 4 to 12, although the identification of some repeats remains uncertain. A canonical WD repeat displays a strictly conserved organization and structure. It is 44 to 60 amino acids long and comprises a core region and a more variable region (Neer et al. 1994; Neer and Smith 1996; Smith et al. 1999). The core region is in most cases delimited by the dipeptides Gly-His (GH) and Trp-Asp (WD) at its N and C termini, respectively. It can be described by a “regular expression,” which defines the most probable amino acid at each position within this core. The structure of one member of the family, the β subunit of the G protein, has been determined by X-ray crystallography (Wall et al. 1995; Sondek et al. 1996). This protein contains an α-helical domain followed by seven WD repeats that fold into a β propeller composed of seven β blades. The β propeller is a highly symmetrical structure, with its units (the blades) arranged around a central axis of pseudosymmetry. Each blade is a four-stranded antiparallel β sheet. There is a strict relationship between WD sequence repeats and β blades: each blade is composed of the first three β strands of one repeat and the fourth strand of the next WD repeat in the protein sequence. This arrangement allows formation of the propeller and ensures the rigidity and stability of the protein (Smith et al. 1999).
The variable region is usually 11–24 residues long, although it may be up to 94 amino acids.
Several proteins that do not contain WD repeats show a similar β-propeller structure, with four blades (hemopexin [PDB #1hxn], porcine collagenase [PDB #1fbl], gelatinase A [PDB #1gen]), six blades (neuraminidase [PDB #6nn9], galactose oxidase [PDB #1gof]), seven blades (methylamine dehydrogenase [PDB #2bbk]), or eight blades (methanol dehydrogenase [PDB #3aah] and cytochrome cd1 nitrite reductase [PDB #1aof]).
Searches of the sequence databases indicate that WD-repeat proteins are not widely distributed among prokaryotes and often display nonstandard characteristics. The actinomycete Thermomonospora curvata contains one potential WD-repeat protein that also carries a domain homologous to known protein Ser/Thr kinases (Janda et al. 1996). One putative WD-repeat protein (gi4982429) is found in Thermotoga maritima, a member of the Thermotogales. Two cyanobacteria, Anabaena PCC7120 and Synechocystis PCC6803, possess several open reading frames (ORFs) encoding putative WD-repeat proteins. Among the five such potential proteins present in the latter strain, one (Sll0163 in Cyanobase [Kaneko et al. 1996]) is predicted to contain 16 WD repeats, the highest number known (Smith et al. 1999). The protein encoded by another locus of this strain, slr0143, is predicted to contain 11 WD repeats (Bédu et al. 1997). This latter protein is implicated in the regulation of the high-affinity uptake system for inorganic carbon (Ci) that is active when the cells are grown under limiting Ci (LC) conditions, and it was therefore named Hat (high-affinity transport; Bédu et al. 1995). The existence of these WD-repeat proteins in bacteria raises questions regarding their frequency and distribution among prokaryotes, their function, and their evolution. The present work describes the possible organization and function of the Hat protein from Synechocystis PCC6803, as deduced from the analysis of deletion mutants.
Results and discussion
Evidence for a unique ORF in the hat locus
Based on partial sequencing of the hat locus and on the distinct phenotypes of mutants deleted for various regions, it was originally proposed that this locus comprised at least two ORFs (Bédu et al. 1995). However, the complete genome sequence of Synechocystis PCC6803 (Kaneko et al. 1996) revealed that the putative hat ORFs were almost identical to a single ORF, slr0143. Our own verification (data not shown) identified two short regions of error in the earlier sequence and, after correction, clearly indicated that only one protein, 1191 amino acids long, is encoded. The name Hat was retained for this protein.
Definition of the various potential domains of the Hat protein
The complete protein (Fig. 1) displays two main domains separated by a hydrophobic linker sequence. The N-terminal domain, 453 amino acids long, is hydrophilic and does not show significant similarity with other proteins. The hydrophobic sequence (residues 454–473), with a hydrophobicity index of 2 (Kyte and Doolittle 1982), may be a transmembrane region. The C-terminal domain of the protein contains the potential WD repeats.
The seventh putative WD repeat overlaps a 22–amino acid hydrophobic region (residues 802–824; hydrophobicity index 1.3 according to Kyte and Doolittle 1982) that lies within a structure resembling several signal peptides. This structure is particularly similar to the signal peptide of plastocyanin, a protein translocated across the thylakoid membranes in cyanobacteria and higher plants (Mackle and Zilinskas 1994). A putative cleavage site is located at residue 823.
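The hydrophobicity indices quoted above follow Kyte and Doolittle (1982), whose method averages a per-residue hydropathy scale over a sliding window. The Python sketch below illustrates that calculation. The window of 19 residues and the threshold of 1.6 are the rule-of-thumb values commonly used with this scale for transmembrane-segment prediction, not parameters reported for Hat, and the test sequence is a hypothetical toy rather than part of the Hat sequence.

```python
from __future__ import annotations

# Kyte & Doolittle (1982) hydropathy scale for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Average Kyte-Doolittle value of a peptide segment."""
    return sum(KD_SCALE[aa] for aa in segment) / len(segment)


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Sliding-window means; element i averages seq[i:i + window]."""
    return [mean_hydropathy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


def hydrophobic_window_starts(seq: str, window: int = 19,
                              threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean exceeds the threshold."""
    return [i + 1 for i, h in enumerate(hydropathy_profile(seq, window))
            if h > threshold]


if __name__ == "__main__":
    # Hypothetical toy sequence: hydrophilic flanks around a hydrophobic core
    # occupying positions 21-40.
    toy = "DKEN" * 5 + "LIVAF" * 4 + "DKEN" * 5
    print(hydrophobic_window_starts(toy))
```

Only windows lying mostly within the hydrophobic core pass the threshold; a real scan would use the full Hat sequence and report the highest-scoring stretch.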
With its 159 residues, the variable region separating the seventh and eighth repeats is much longer than the longest such region (94 residues) found in any other currently known WD-repeat protein. Its presence supports an organization of the WD repeats into two groups of seven and four, respectively. As described below, this hypothesis fits both the functional and the structural data.
The function of Hat: specificity of different domains of the protein in the response to a Ci-limited regime
Phenotype of Hat mutants carrying various deletions
Previous analyses of two Hat mutants established a role for this protein in the activity of the high-affinity Ci transport system that is induced in cells adapted to a limiting supply of this substrate (Bédu et al. 1995). In mutant KBC, the C-terminal 125 amino acids of Hat were removed; this region includes the last two putative WD repeats and was previously identified as the HatR peptide (Fig. 1; Bédu et al. 1995). The growth rate of KBC was reduced under all Ci regimes. In contrast to this limited alteration of growth capacity, the high-affinity Ci transport activity induced in wild-type cells during growth under LC conditions was undetectable in the mutant. In mutant SNH, the hat gene carried an insertion that resulted in the deletion of most of the variable region between the seventh and eighth WD repeats, together with the downstream sequence (Fig. 1). This mutant was no longer able to grow under LC conditions.
To further define the putative roles of the different domains of Hat, another mutant, KBBA, was constructed in which the C-terminal part of Hat, encompassing the transmembrane region and the entire WD-repeat–containing domain, was deleted (Fig. 1; see Materials and Methods). No signal was detected in the subcellular fractions of mutant cells with the appropriate polyclonal antibodies (data not shown), suggesting that the remaining N-terminal portion of Hat is rapidly degraded. This mutant behaved like the wild type when grown under high-Ci conditions but had totally lost the ability to adapt to LC conditions (Fig. 2), as in the case of mutant SNH.
Probing a possible response regulator function of the HatR region
According to its linear sequence, the region previously referred to as HatR (see above and Fig. 1) includes three conserved amino acids characteristic of the response regulators of two-component regulatory systems, such as CheY from Escherichia coli (Stock et al. 1990). Two of these are invariant: an aspartate that is the site of the phosphorylation driving activation, and a lysine required for activation (D57 and K109, respectively, in CheY). In Hat, the corresponding residues would be D1083, D1127, and K1175, with D1127 as the putative site of phosphorylation. Because the association of such an activity with a WD-repeat protein would be unprecedented, this possibility was studied further.
Secondary-structure predictions for HatR (using the GOR1, GOR4, DPM, Predator, and HNN programs [ICBP, University of Lyon] and the PHD and DSM prediction programs [BMERC, Boston University]) did not reveal the structural characteristics of receiver proteins or domains, namely five repeats of a β-strand/α-helix unit. Moreover, a consensus sequence for response regulators, obtained from an improved alignment of 163 sequences, could not be found in the HatR sequence (data not shown). These results argued strongly against a regulator function for this region of Hat. However, because the overlap of WD repeats with a receiver domain would represent a unique finding, a functional analysis was performed.
The in vitro phosphorylation characteristics of the HatR peptide were studied as described by Reyrat et al. (1993) and Lukat et al. (1992). HatR was overproduced and purified to 85% homogeneity, and 32P-acetyl-phosphate was used as the phosphodonor (see Materials and Methods). Phosphorylation was observed, but its characteristics were not typical of known response regulators (Fig. 3A). The reaction was not inhibited by EDTA (10 mM), indicating that it did not depend on Mg2+, whereas phosphorylation of CheY and FixJ is Mg2+-dependent. The protein-bound phosphate was, however, sensitive to high and low pH (data not shown), a characteristic of phosphoryl-aspartate residues.
To assess the specificity of this phosphorylation reaction, we replaced residue D1127 with a glutamate (see Materials and Methods). If phosphorylation were specific for this aspartate residue, the replacement should lower its level (Bourret et al. 1993; Volz 1993). The phosphorylation characteristics of HatR-D1127E were identical to those of HatR with regard to acetyl-phosphate concentration, absence of inhibition by EDTA, and reaction time (Fig. 3B). The phosphorylation observed for HatR therefore appeared to be nonspecific.
All available structural and biochemical criteria thus led to the rejection of the hypothesis of a functional response regulator located at the C-terminal extremity of Hat.
Cellular localization of Hat
The presence of a potential transmembrane segment between the hydrophilic N-terminal region and the WD-repeat–containing part of Hat (Fig. 1) strongly suggested a transmembrane organization, with the N- and C-terminal extremities in different compartments separated by a membrane. This raised the question of the nature of the membranes (plasma vs. thylakoid) in which the protein is localized. The membranous and soluble fractions from wild-type cells were separated and assayed by immunoblotting with antibodies raised against part of the N-terminal hydrophilic domain of Hat (Hat228–453) or against the C-terminal HatR peptide. A weak signal was observed in the thylakoids with both antibodies (Fig. 4). As expected, no signal was found in the soluble fraction, and only a very faint signal could be detected in the plasma membrane fraction. The latter fraction is usually contaminated by thylakoids, because thylakoid membranes are 50- to 100-fold more abundant than plasma membranes. The level of contamination was evaluated from the relative levels of chlorophyll a in the two fractions. In the assay shown in Figure 4, about 5% of the total chlorophyll was detected in the plasma membrane fraction, a level that could account for the observed signal. It is thus most probable that Hat is localized in the thylakoid membranes. The weak intensity of the immunodetection signal suggested that the cellular content of Hat is extremely low, although poor antigenicity of the test peptides could not be excluded.
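The contamination argument rests on simple proportionality. Assuming that chlorophyll a is confined to thylakoids and that immunoblot signals are compared for whole fractions, the amount of thylakoid material carried into the plasma membrane fraction, relative to the thylakoid fraction, equals the ratio of chlorophyll in the two fractions (absorbance at 665 nm can serve as the chlorophyll measure), and the Hat signal it brings along scales by the same ratio. The function names and the numbers in the example are hypothetical and only illustrate the arithmetic.

```python
def contamination_ratio(chl_pm: float, chl_thylakoid: float) -> float:
    """Thylakoid material in the plasma membrane (PM) fraction relative to the
    thylakoid fraction, assuming chlorophyll a marks thylakoids only."""
    return chl_pm / chl_thylakoid


def expected_pm_signal(signal_thylakoid: float, chl_pm: float,
                       chl_thylakoid: float) -> float:
    """Hat signal expected in the PM fraction if it came only from
    contaminating thylakoids (signals expressed per whole fraction)."""
    return signal_thylakoid * contamination_ratio(chl_pm, chl_thylakoid)


if __name__ == "__main__":
    # Hypothetical values: 5% of total chlorophyll recovered in the PM fraction,
    # and a thylakoid-fraction signal set to 100 arbitrary units.
    print(round(expected_pm_signal(100.0, 5.0, 95.0), 2))  # 5.26
```

An observed plasma membrane signal of roughly this relative size is what contamination alone would predict, which is the reasoning used above to assign Hat to the thylakoids.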
Predictions for a structural organization of Hat
Assessment of the WD repeats
Alignment of the C-terminal part of Hat with the WD-repeat regular expression allowed the identification of 11 putative WD repeats (Fig. 5). The regions between the GH and WD dipeptides all have a consensus length of ∼40 residues, and the highly conserved amino acids defined for the motif (Neer and Smith 1996; Fig. 5A) are present at frequencies of 54%–100%. The predicted secondary structures of the cores generally agreed with that of the model. However, when the 11 repeats were compared with the second WD repeat of the yeast Gβ protein (using the BLAST program), their degrees of conservation appeared unequal; in particular, the 8th and 11th putative repeats were highly divergent. Independent searches for motifs (with the ProfileScan program [ISREC, Lausanne University, Switzerland]) or for secondary structure (with the GOR1, GOR4, DPM, Predator, and HNN programs [ICBP, University of Lyon]) led to similar results. In the latter case, the organization into four β strands was not found at all in the 8th and 11th repeats. This called into question their identification as WD repeats and thus their role in the three-dimensional structure of the whole protein.
The variable regions at the N terminus of each WD repeat have lengths (10 or 11 residues) corresponding to those of the canonical model, with the exception of the region between the seventh and eighth WD repeats, which is much longer. Analysis of this region revealed no homology with any reported protein.
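The search for GH-to-WD cores and the measurement of the variable regions between them can be illustrated with a deliberately simplified pattern. The Python sketch below assumes a core that starts with GH, ends with WD, and has 30 to 45 residues in between; this is a hypothetical stand-in, not the full WD-repeat regular expression of Neer and Smith (1996), which also scores conserved internal positions. The test sequence is artificial.

```python
from __future__ import annotations

import re

# Hypothetical, simplified WD-core pattern: GH, then 30-45 residues (shortest
# possible match), then WD. Internal conserved positions are not scored.
WD_CORE = re.compile(r"GH(?P<core>[A-Z]{30,45}?)WD")


def find_wd_cores(seq: str) -> list[tuple[int, int, int]]:
    """(start, end, inner_length) per non-overlapping match; 1-based, inclusive."""
    return [(m.start() + 1, m.end(), len(m.group("core")))
            for m in WD_CORE.finditer(seq)]


def variable_region_lengths(cores: list[tuple[int, int, int]]) -> list[int]:
    """Residues between the WD of one core and the GH of the next core."""
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(cores, cores[1:])]


if __name__ == "__main__":
    # Artificial sequence: three identical units of core plus a 10-residue spacer.
    unit = "GH" + "A" * 36 + "WD" + "S" * 10
    cores = find_wd_cores(unit * 3)
    print(cores)                           # [(1, 40, 36), (51, 90, 36), (101, 140, 36)]
    print(variable_region_lengths(cores))  # [10, 10]
```

On a real sequence, a spacer far longer than its neighbors (such as the 159-residue insert between the seventh and eighth repeats) would stand out immediately in the second list.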
Possible three-dimensional structures for the WD-containing moiety of Hat
The presence in Hat of the unusually long variable region suggested that the first seven blades are separated from the last four. Hat was therefore modeled according to two strategies corresponding to the two possible predictions: an organization into a 7-mer and a 4-mer β propeller, or into a single 11-mer structure. Two crystallized β-propeller–forming proteins, the WD-containing Gβ protein (Wall et al. 1995; Sondek et al. 1996) and the non-WD-repeat protein hemopexin (Faber et al. 1995), were used as models. The Gβ protein contains seven repeats forming a seven–β-blade propeller, whereas hemopexin forms a propeller with four β blades.
The 7-mer model.
The model of the first seven WD repeats of Hat, based on the fourth WD repeat of the Gβ1 protein (see Materials and Methods), was analyzed for intrablade hydrogen bonding and interblade hydrophobic interactions. These interactions were found sufficient to ensure good stability and compactness of the molecule, so such a structure appeared quite probable (Fig. 6A). Detailed analysis of the seventh blade in this propeller model indicated that the hydrophobic region spanning residues 802–824 is likely embedded inside the structure.
The 4-mer model.
The existence of an independent 4-mer structure encompassing the last four potential WD repeats of Hat was questionable because of the short length (10 residues) of the variable regions of these repeats; a minimum of 14 residues is more typical for closing a 4-mer structure into a β-propeller ring (Smith et al. 1999). Nevertheless, the possibility of folding this region of Hat into a 4-mer structure was examined with a similar strategy, using the structural coordinates of hemopexin as a model. This strategy was justified by theoretical studies (Neer and Smith 1996): superposition of the α-carbon atoms of the a, b, and c β strands of the WD repeats of Gβ onto those of the β blades of two non-WD-containing proteins (porcine collagenase and methylamine dehydrogenase) could be achieved with deviations of only 0.9–2.2 Å.
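Deviations between superposed α-carbon atoms are root-mean-square deviations (rmsd) computed after optimal rigid-body superposition. One standard way to obtain them is the Kabsch algorithm, sketched below in Python with NumPy; it is chosen here for illustration, since the text does not name the superposition method used. The sketch assumes two equally long, already paired sets of Cα coordinates in Å.

```python
import numpy as np


def kabsch_rmsd(p, q) -> float:
    """RMSD between paired (N, 3) coordinate sets after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Correct for a possible reflection so that r is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ r.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ca = rng.normal(scale=5.0, size=(20, 3))  # hypothetical Cα coordinates
    t = np.deg2rad(40.0)
    rot = np.array([[np.cos(t), -np.sin(t), 0.0],
                    [np.sin(t), np.cos(t), 0.0],
                    [0.0, 0.0, 1.0]])
    moved = ca @ rot.T + np.array([3.0, -1.0, 2.0])
    print(kabsch_rmsd(ca, moved))  # essentially zero: same shape, different pose
```

Because rotation and translation are optimized away, the residual rmsd reflects only differences in shape, which is what makes values such as 0.9–2.2 Å comparable across unrelated proteins.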
A β-blade structure could be obtained for each of the last four WD repeats of Hat (Fig. 6B). Only two of these, however, showed internal hydrogen bonding consistent with the model. An organization into a β propeller might still be envisaged if the structure were stabilized through association with the 7-mer propeller or with the long variable region separating the 4-mer region from the 7-mer region. The crystal structure of a 4-mer WD-repeat protein would help to test such hypothetical organizations.
The 11-mer model.
Considering that each of the 11 WD repeats could be organized into a β blade, the possibility of an 11-mer β-propeller structure, although never encountered so far, remained open. From an artificial template protein composed of 11 copies of the fourth WD repeat of Gβ (see Materials and Methods), a β-propeller structure with 11 blades corresponding to the 11 WD repeats of Hat was constructed (Fig. 6C). The compactness of the molecule could be maintained by hydrogen bonding inside each blade, within the limits mentioned above. However, even the large number of minimization steps applied (see Materials and Methods) did not allow efficient bonding between the blades. Estimating the distance between the center of mass of the whole structure and that of each blade gave a diameter of 13 Å for the central channel, compared with 8.5 Å for an eight-blade β propeller. Such a wide channel might not be compatible with good stability of the structure.
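The blade-to-center measurement described above can be sketched as follows: compute the centroid of all atoms, the centroid of each blade, and the distance between the two. This Python/NumPy sketch uses unweighted centroids (an assumption; the text refers to centers of mass, which would weight atoms by mass) and takes per-atom blade labels as a hypothetical input. It does not reproduce how the 13 Å channel diameter was derived from these distances, which the text does not specify.

```python
from __future__ import annotations

import numpy as np


def blade_centroid_distances(coords, blade_ids) -> dict[int, float]:
    """Distance (same units as coords) from the overall centroid to each blade centroid."""
    coords = np.asarray(coords, dtype=float)
    blade_ids = np.asarray(blade_ids)
    center = coords.mean(axis=0)
    return {
        int(b): float(np.linalg.norm(coords[blade_ids == b].mean(axis=0) - center))
        for b in np.unique(blade_ids)
    }


if __name__ == "__main__":
    # Hypothetical 11-blade toy: each "blade" is three atoms stacked along z,
    # centered 20 Å from the axis at evenly spaced angles.
    n, radius = 11, 20.0
    coords, ids = [], []
    for k in range(n):
        a = 2.0 * np.pi * k / n
        c = np.array([radius * np.cos(a), radius * np.sin(a), 0.0])
        for dz in (-1.0, 0.0, 1.0):
            coords.append(c + np.array([0.0, 0.0, dz]))
            ids.append(k)
    print(blade_centroid_distances(coords, ids))  # every blade about 20 Å out
```

For a symmetric propeller all distances are equal; large spread between blades would itself signal a distorted model.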
This model was based strictly on the WD repeats of the Hat protein. The 453–amino acid N-terminal domain would probably not influence the structure of the WD-containing domain, because the two parts of Hat lie on different sides of a membrane. In contrast, the 159–amino acid variable region between the seventh and eighth WD repeats might contribute to the stability of such a β propeller.
Attempts to validate these models
Another approach to establishing the possible structure of Hat consisted of applying the SWISS MODEL program. This program identifies homologies between the primary sequence of the analyzed protein and those of proteins of known structure, and uses alignments of the homologous regions to build a structural model based on the crystallographic data of the selected known proteins. Two very similar models emerged, MSD1 and MSD2 (not shown), corresponding to residues I546 to D835 and Q544 to D835, respectively, that is, the region covering the first to seventh WD repeats. They were based on the crystal structures of the β subunits of three G proteins (PDB #1got, 1gg2, and 1tbg). The validity of the 7-mer models obtained for Hat was tested by comparing their atomic coordinates with those of control structures (Table 1). The root-mean-square deviation (rmsd) values (Neer et al. 1996; see Materials and Methods) indicated that the three control structures were very similar (rmsd between 0.58 and 0.70 Å). The two SWISS MODEL structures were closest to each other (rmsd = 0.46 Å) but were also quite close to the 7-mer model obtained previously (rmsd = 1.35 Å and 1.41 Å, respectively). These values tend to validate all three models, because they are lower than 1.5 Å, considered the limit of allowed variability.
No 4-mer structure, and of course no 11-mer structure, was proposed by this analysis, because no crystal structure of such WD-repeat proteins exists in the Brookhaven Protein Data Bank.
Analysis of the various mutants highlighted the functional complexity of Hat, with different regions playing distinct roles. The localization of the protein in the thylakoids rules out a direct interaction with the high-affinity Ci transport system, which is located in the plasma membrane. The importance of the whole WD-containing moiety of the protein in the control of Ci metabolism has been clearly established. The last four WD repeats, possibly in association with the insert linking the 7-mer and 4-mer regions, are essential for growth under an LC regime (mutants SNH and KBBA). Within the C-terminal 4-mer region, the subregion encompassing the last two WD repeats is specifically required for the activity of the high-affinity transport system, which is itself nonessential for growth under LC conditions (mutant KBC). All available criteria indicate that a seven-blade β-propeller structure encompassing the first seven WD repeats of Hat is very probable. Functional and structural data thus argue in favor of an organization of the WD moiety of Hat into two subdomains of seven and four WD repeats, respectively. As already mentioned, the C-terminal 4-mer region might interact with another, still unknown, protein or peptide. This interaction might be essential in modulating the stability of the 4-mer structure and thus the accessibility of this subdomain, or at least of the region encompassing the last two WD repeats, which is specifically involved in Ci transport activity.
Materials and methods
Cyanobacterial strains and growth conditions
Wild-type Synechocystis PCC6803 was obtained from the Pasteur Culture Collection. The wild type and the mutants were cultured at 34°C under continuous illumination in modified Allen's medium, as described in Bédu et al. (1995). Two Ci regimes were used, designated high (12 mM sodium bicarbonate) and low (LC; CO2 supplied by air). Growth was followed by turbidimetry (1 OD750 unit = 10⁷ cells/mL). For growth of mutant KBBA, 100 μg/mL kanamycin was added.
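Turbidity readings can be converted into cell densities and growth rates with simple arithmetic. The Python sketch below takes the conversion factor from the growth conditions above (1 OD750 unit read as 10⁷ cells/mL) and assumes exponential growth between two readings; the readings in the example are hypothetical.

```python
import math

# Conversion factor from the growth-conditions text (1 OD750 unit ~ 1e7 cells/mL).
CELLS_PER_ML_PER_OD750 = 1e7


def cells_per_ml(od750: float) -> float:
    """Approximate cell density from an OD750 reading."""
    return od750 * CELLS_PER_ML_PER_OD750


def specific_growth_rate(od_start: float, od_end: float, hours: float) -> float:
    """Exponential growth rate (per hour) between two OD750 readings."""
    return math.log(od_end / od_start) / hours


def doubling_time(od_start: float, od_end: float, hours: float) -> float:
    """Doubling time (hours) implied by the same two readings."""
    return math.log(2) / specific_growth_rate(od_start, od_end, hours)


if __name__ == "__main__":
    # Hypothetical readings: OD750 rising from 0.1 to 0.4 in 24 h.
    print(cells_per_ml(0.4))
    print(doubling_time(0.1, 0.4, 24.0))  # about 12 h
```

Comparing such rates between the wild type and a mutant under the same Ci regime is one way to quantify a reduced growth rate like that reported for KBC.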
Standard molecular biology techniques were performed according to Sambrook et al. (1989) or to the suppliers' protocols. Homology comparisons were performed with BLAST.
Construction of mutant KBBA
A 2305-bp BamHI-EcoRI DNA fragment encompassing the 5′-terminal 1786 bp of hat was cloned into plasmid pUC19. A kanamycin resistance cassette from pUC4K (Boehringer Mannheim) was inserted into the BclI site at position 1361 (codon 454). The resulting plasmid was used to transform Synechocystis, and kanamycin-resistant clones were selected. Their homogeneity was verified by DNA hybridization and by polymerase chain reaction (PCR). One of these clones, KBBA, was used for further study.
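The correspondence between nucleotide position 1361 and codon 454 follows from counting codons from the first nucleotide of the start codon. The Python sketch below checks that arithmetic and scans a sequence for the BclI recognition site (TGATCA). The numbering convention (position 1 is the A of the start codon) is an assumption, and the test sequence is a hypothetical toy.

```python
from __future__ import annotations


def codon_number(nt_position: int) -> int:
    """Codon containing a 1-based nucleotide position, counted from the start codon."""
    return (nt_position - 1) // 3 + 1


# BclI recognition sequence (cuts T^GATCA). It is its own reverse complement,
# so scanning one strand finds every site.
BCLI_SITE = "TGATCA"


def find_sites(seq: str, site: str = BCLI_SITE) -> list[int]:
    """1-based start positions of every occurrence of a recognition site."""
    seq = seq.upper()
    return [i + 1 for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


if __name__ == "__main__":
    print(codon_number(1361))                 # 454
    print(find_sites("atgaaatgatcaggctga"))  # [7]
```

Position 1361 falls in the middle of codon 454 (nucleotides 1360–1362), so a cassette inserted there interrupts the reading frame right after the N-terminal hydrophilic domain (residues 1–453).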
Site-directed (D1127E) mutagenesis of the HatR peptide
Site-directed mutagenesis was performed by two-step PCR. In the first step, the 5′ end of the region encoding the HatR peptide was amplified with a primer in which the D1127 codon (GAT) was replaced by a glutamate codon (GAG). The amplified product then served as a primer for amplification of the complete HatR-encoding sequence. Polymerization reactions were run with Expand High Fidelity polymerase (Boehringer Mannheim) according to the manufacturer's instructions. After verification by sequencing, the mutated product was cloned into pGEX4T3 (Pharmacia).
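The codon exchange introduced by the mutagenic primer can be checked in silico by translating the sequence before and after the substitution. The Python sketch below builds the standard genetic code, replaces one codon, and confirms the amino acid change; the helper names and the short test sequence are hypothetical, and codon numbering assumes the sequence starts in frame. If the HatR-coding sequence begins at residue 1067 of Hat, D1127 would be codon 61 of that sequence (1127 − 1067 + 1).

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, and so on.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence ('*' marks stop codons)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def replace_codon(dna: str, codon_no: int, new_codon: str) -> str:
    """Return dna with the 1-based codon codon_no replaced by new_codon."""
    start = 3 * (codon_no - 1)
    return dna[:start] + new_codon.upper() + dna[start + 3:]


if __name__ == "__main__":
    wild = "ATGGCTGCTGCTGATTAA"             # hypothetical: M A A A D stop
    mutant = replace_codon(wild, 5, "GAG")  # GAT (Asp) -> GAG (Glu)
    print(translate(wild), translate(mutant))  # MAAAD* MAAAE*
```

GAT to GAG changes only the third base, so the substitution needs a single-nucleotide mismatch in the mutagenic primer.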
Cell fractionation of Synechocystis PCC6803
Wild-type or mutant Synechocystis PCC6803 cells were collected from 5-L cultures in standard medium at an OD750 of 4. Cell fractionation was performed according to Omata and Murata (1984), with a gradient containing the following volumes of sucrose solutions: 90% (3 mL), 55% (5 mL), 48% (7.5 mL, containing the sample), 30% (6 mL), and 10% (8.5 mL). Contamination of the plasma membrane fraction by thylakoids was estimated from the relative chlorophyll a contents (absorbance at 665 nm).
Production and purification of peptides and immunotechnology
The DNA fragments encoding the N-terminal (residues 228–453) and C-terminal (residues 1067–1191, HatR) regions of Hat were amplified by PCR using primers flanked by appropriate restriction sites. The amplified fragments were cloned into pET15b (Novagen) and pGEX4T3, respectively. The peptides were synthesized in E. coli strain BL21(DE3). The pET15b-produced peptide was purified on a His-affinity column, and the HatR peptide expressed from pGEX4T3 on a glutathione-Sepharose 4B affinity column.
After further purification by SDS-PAGE, the N-terminal and HatR peptides were used to raise polyclonal antisera. Western blot analysis was performed with the ECL system (Amersham; Bédu et al. 1997), with the antisera diluted 1:100. Anti-rabbit antibodies coupled to horseradish peroxidase (Promega) were used for detection.
In vitro phosphorylation of peptide HatR
In vitro phosphorylation of the purified HatR peptide was performed according to Reyrat et al. (1993), using 32P-acetyl-phosphate as the phosphodonor. Electrophoresis was carried out on a 20% acrylamide SDS-PAGE gel. 32P-acetyl-phosphate was synthesized as described by Stadtman (1957).
The strategy used to construct the different structures was as follows. The fourth WD repeat of the Gβ1 protein (Wall et al. 1995) was identified as the most similar in sequence and length to those of Hat. The crystal coordinates of this repeat were sequentially replicated to produce 7-mer, 4-mer, and 11-mer propeller models (using QUANTA version 4.1). The energy of the resulting structures was minimized to remove side chain steric clashes (100 steps for the 7-mer and 4-mer, 1000 for the 11-mer; CHARMm version 23.1; Brooks et al. 1983). The Gβ1 side chains in these models were replaced by the corresponding residues of Hat, and the structures were solvated with an 8-Å water shell. The structures were then reminimized (1000 steps for the 7-mer and 4-mer, 6000 for the 11-mer), first with the backbone of the β strands constrained and subsequently without any constraints (Adopted Basis Newton-Raphson algorithm as implemented in CHARMm). Finally, the 7-mer and 4-mer models were subjected to molecular dynamics simulation (1000 steps of heating from 0 to 500 K with a time step of 0.001 ps, followed by equilibration and cooling).
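The replication step, in which the coordinates of one repeat are copied around the propeller axis, can be illustrated geometrically. The Python/NumPy sketch below assumes the template blade has already been placed at the desired distance from a central axis aligned with z and simply applies n rotations in steps of 360/n degrees. It is not the QUANTA procedure and omits the minimization and side-chain replacement steps described above; in practice the blade distance from the axis would need to differ between 4-, 7-, and 11-blade models for neighboring blades to stay in contact, so here it is left as part of the input.

```python
import numpy as np


def rotation_about_z(theta: float) -> np.ndarray:
    """3x3 rotation matrix for an angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def replicate_blade(blade_xyz, n_blades: int) -> np.ndarray:
    """Stack n copies of one blade rotated in 360/n-degree steps about z.

    Returns an (n * n_atoms, 3) array; copy k occupies rows
    k * n_atoms to (k + 1) * n_atoms - 1.
    """
    blade = np.asarray(blade_xyz, dtype=float)
    copies = [blade @ rotation_about_z(2.0 * np.pi * k / n_blades).T
              for k in range(n_blades)]
    return np.vstack(copies)


if __name__ == "__main__":
    # Hypothetical one-atom "blade" 15 Å from the axis, replicated 7 times.
    ring = replicate_blade([[15.0, 0.0, 0.0]], 7)
    print(np.round(np.linalg.norm(ring, axis=1), 6))  # every copy stays 15 Å from the axis
```

Calling the same function with 4, 7, or 11 copies gives the three starting geometries; all physically meaningful refinement happens in the later energy minimization.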
Model structures for the Hat WD repeats were also constructed directly, using Gβ1 as the template for the 7-mer and hemopexin for the 4-mer, with the SWISS MODEL program (http://expasy.hcuge.ch/swissmod/SWISS-MODEL.html). The rmsd values between the respective models and their initial templates were calculated with the Swiss-PDB Viewer v3.1 program (Glaxo Wellcome Experimental Research).
Table 1. Comparison of the atomic coordinates (rmsd, in Å) of (A) the control structures of three G proteins (PDB #1got, 1gg2, and 1tbg) and (B) the three Hat models, MSD1, MSD2, and the 7-mer
We thank A. Janicki for her excellent technical assistance. This work was supported by grants from the CNRS (UPR 9043), the Université de la Méditerranée (France), and NHGRI Grant T32 HG00041 to the BMERC Laboratory.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.