Keywords:

  • Protein design;
  • backbone design;
  • core sidechain packing;
  • dead-end elimination;
  • ORBIT

Abstract

The solution structures of two computationally designed core variants of the β1 domain of streptococcal protein G (Gβ1) were solved by 1H NMR methods to assess the robustness of amino acid sequence selection by the ORBIT protein design package under changes in protein backbone specification. One variant has mutations at three of 10 core positions and corresponds to minimal perturbations of the native Gβ1 backbone. The other, with mutations at six of 10 positions, was calculated for a backbone in which the separation between Gβ1's α-helix and β-sheet was increased by 15% relative to native Gβ1. Exchange broadening of some resonances and the complete absence of others in spectra of the sixfold mutant bespeak conformational heterogeneity in this protein. The NMR data were sufficiently abundant, however, to generate structures of similar, moderately high quality for both variants. Both proteins adopt backbone structures similar to their target folds. Moreover, the sequence selection algorithm successfully predicted all core χ1 angles in both variants, five of six χ2 angles in the threefold mutant and four of seven χ2 angles in the sixfold mutant. We conclude that ORBIT calculates sequences that fold specifically to a geometry close to the template, even when the template is moderately perturbed relative to a naturally occurring structure. There are apparently limits to the size of acceptable perturbations: In this study, the larger perturbation led to undesired dynamic behavior.

It is now well known that protein backbones undergo small but global rearrangements to accommodate changes in hydrophobic core packing when core amino acid residues are mutated (Baldwin et al. 1993; Lim et al. 1994). Understanding this interplay between sequence and structure is particularly important for protein design. Most computational design methods presented to date presuppose a rigid backbone structure (for review, see Street and Mayo 1999), though several groups have reported efforts to treat both backbone structural variability and side-chain selection (Su and Mayo 1997; Harbury et al. 1998; Desjarlais and Handel 1999). In our approach, the global fold of a protein is decomposed via supersecondary structure parameterization. Variation of supersecondary structure parameter values then provides new fixed-backbone templates for input to a sequence selection algorithm.

In particular, we studied the immunoglobulin binding β1 domain of streptococcal protein G (Gβ1), a 56-residue domain comprising a four-stranded β-sheet and an α-helix. Four parameters were derived that fix the position and orientation of the helix with respect to the sheet: the distance between the helix center and the sheet plane, two angles defining the orientation of the helix axis with respect to the sheet plane, and an angle defining rotation about the helix axis. Each of these parameters was varied incrementally (up to ±1.5 Å for the helix-sheet distance and up to ±10° for the angles) to generate novel backbones. The backbones were then used as templates for core residue sequence selection calculations with the ORBIT (Optimization of Rotamers By Iterative Techniques) protein design programs, which utilize the dead-end elimination theorem to solve the rotamer space combinatorial optimization problem (Desmet et al. 1992; Pierce et al. 2000). The 10 most buried residues in the crystal structure of the wild-type protein (excluding glycines) were included in the calculation: backbone variation and subsequent sequence selection resulted in mutations at three to six of these positions (Su and Mayo 1997).
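
The dead-end elimination step mentioned above can be illustrated with a minimal sketch of the original single-rotamer criterion of Desmet et al. (1992): a rotamer r at position i is eliminated when its best-case energy is still worse than the worst-case energy of some competitor t at the same position. This is not ORBIT code. The function names, the dictionary layout, and the toy energies are assumptions made for illustration only.

```python
def pair_energy(pair_e, i, r, j, s):
    """Return the pair energy of rotamer r at position i with rotamer s at j.

    Each symmetric pair is stored once, under the key (low, high).
    """
    if i < j:
        return pair_e[(i, j)][(r, s)]
    return pair_e[(j, i)][(s, r)]


def dee_prune(self_e, pair_e):
    """Repeatedly apply the simple DEE criterion until nothing more is removed.

    self_e[i][r] is the self energy of rotamer r at position i.
    Returns {position: set of surviving rotamers}.
    """
    alive = {i: set(rotamers) for i, rotamers in self_e.items()}
    changed = True
    while changed:
        changed = False
        for i in alive:
            others = [j for j in alive if j != i]
            for r in list(alive[i]):
                # Best case for r: its most favourable partner at every other position.
                best_r = self_e[i][r] + sum(
                    min(pair_energy(pair_e, i, r, j, s) for s in alive[j])
                    for j in others
                )
                for t in list(alive[i]):
                    if t == r:
                        continue
                    # Worst case for t: its least favourable partner everywhere.
                    worst_t = self_e[i][t] + sum(
                        max(pair_energy(pair_e, i, t, j, s) for s in alive[j])
                        for j in others
                    )
                    if best_r > worst_t:
                        # r can never beat t, so it cannot be in the global minimum.
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


# Toy two-position problem with made-up energies (illustrative only).
self_e = {0: {"a": 0.0, "b": 5.0}, 1: {"x": 0.0, "y": 1.0}}
pair_e = {(0, 1): {("a", "x"): 0.0, ("a", "y"): 1.0,
                   ("b", "x"): 0.5, ("b", "y"): 0.0}}
survivors = dee_prune(self_e, pair_e)
```

In this toy case a single rotamer survives at each position ("a" and "x"). Real design problems rarely collapse this far with the simple criterion alone, so any remaining combinations would still have to be searched.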

Gβ1 variants containing the optimal sequences calculated in this fashion were expressed and purified for analysis. Thermal stabilities were assessed by circular dichroism (CD) spectroscopy; fold specificities were evaluated by a qualitative consideration of chemical shift dispersion in 1D 1H nuclear magnetic resonance (NMR) spectra. It was found that small perturbations of the backbone yielded small changes in core sequence (three of 10 positions) and that the proteins containing those sequences were similar to Gβ1 in thermal stability and chemical shift dispersion. Many of the sequences calculated for more extensively displaced backbones also yielded well-folded proteins, judged by chemical shift dispersion. Several of these latter variants, however, are destabilized relative to the wild-type protein.

Analysis at this level establishes that the sequence selection algorithm is tolerant of small variations in backbone specification: when a nonnative but native-like backbone is used as a template, a sequence is calculated that yields a well-folded, thermostable protein. It is of considerable interest to know, further, how closely the folded protein matches the target structure and, particularly, how accurately the algorithm predicts core side-chain packing under backbone perturbations.

We report here the solution structures, determined by 1H NMR, of two Gβ1 variants: one minimally perturbed (a threefold mutant) and one extensively perturbed (a sixfold mutant). When the native Gβ1 backbone is used as a template, the lowest-energy calculated sequence has three conservative mutations relative to the wild-type sequence: Y3F, L7I, and V39I (Dahiyat and Mayo 1997). These mutations have been rationalized in terms of the details of the calculation (Su and Mayo 1997). Experimentally, the protein containing this sequence (designated Δh0.9[+0.00 Å] in the previous study, referred to hereafter as Δ0) was found to be slightly more stable than wild type, with a melting temperature (Tm) of 91°C (Tm of Gβ1 is 89°C). The Δ0 sequence was also obtained by sequence selection with several different backbones in which the orientation of the helix with respect to the sheet was varied by small amounts. Thus Δ0 represents the optimal sequence for backbones close to the native fold. Displacement of the template helix from the sheet plane by +1.50 Å yields the sixfold mutant, which contains the three core substitutions of Δ0 plus F30L, A34I, and F52W. Among the extensively perturbed variants of the earlier study, this protein (previously designated Δh1.0[+1.50 Å], referred to hereafter as Δ1.5) was the best behaved, with chemical shift dispersion comparable to wild type and a Tm of 73°C.
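
The helix displacement described above amounts to a rigid translation of the helix atoms along the normal to the sheet plane. The following sketch shows that single operation under stated assumptions: the sheet normal is taken as given and pointing toward the helix, and the function name and coordinates are hypothetical. How the sheet plane was actually fitted is not described here.

```python
import math


def displace_helix(helix_coords, sheet_normal, delta):
    """Translate every helix atom by delta (in Å) along the sheet-plane normal.

    helix_coords: list of (x, y, z) tuples.
    sheet_normal: any nonzero vector normal to the sheet plane (assumed input).
    """
    norm = math.sqrt(sum(c * c for c in sheet_normal))
    if norm == 0.0:
        raise ValueError("sheet_normal must be a nonzero vector")
    unit = tuple(c / norm for c in sheet_normal)
    return [
        tuple(x + delta * u for x, u in zip(atom, unit))
        for atom in helix_coords
    ]


# Hypothetical helix atom 10 Å above a sheet lying in the xy plane.
moved = displace_helix([(0.0, 0.0, 10.0)], (0.0, 0.0, 2.0), 1.5)
```

Taken together, the +1.50 Å displacement and the 15% increase quoted in the abstract imply a native helix-sheet separation of roughly 10 Å (1.5 / 0.15). That figure is an inference from the two numbers, not a value stated in the text.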

Results and discussion

Standard sets of 2D 1H NMR data were collected for Δ0 and Δ1.5. Spin systems were assigned for all residues of Δ0. Core residue side chains were completely assigned; other side-chain assignments are >95% complete. Good dispersion of chemical shifts and narrow linewidths in the Δ0 spectra indicate that this protein favors a single conformation under the experimental conditions. The Δ1.5 data, by contrast, contain evidence of conformational dynamics. While resonance assignments for this protein are also ∼95% complete, no spin system was found for E27, and cross peaks to the backbone amide protons of T25, T51, and T53 are broadened and of low intensity. The chemical shifts of the ring protons of W52 are similar to random coil values, and the indole imino proton signal from this residue is absent, suggesting that its side chain is conformationally labile and accessible to solvent. Also, the Hε and Hζ ring protons of F3 could not be assigned definitively.

Families of structures consistent with the data were generated by standard distance geometry/simulated annealing methods (Nilges et al. 1988, 1991). The structures of both molecules are well defined, and their stereochemical quality is good (Table 1). Both proteins have the characteristic protein G fold. The Δ0 sequence adopts a fold quite similar to its template, that is, the native Gβ1 backbone (Fig. 1a). The RMS deviation (RMSD) between atoms in the minimized mean experimental backbone and atoms in the crystallographic backbone is 0.92 Å (excluding two residues at the N terminus, for which few experimental restraints exist). Δ1.5 also closely matches the native Gβ1 structure, with a backbone atomic RMSD of 1.03 Å. With a backbone atomic RMSD of 1.26 Å (Fig. 1b), Δ1.5 is somewhat less similar to its own target backbone.
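
For readers unfamiliar with the RMSD figures quoted above, the following minimal sketch computes a coordinate RMSD between two equally sized atom sets. It assumes the two structures have already been superimposed and that atoms are listed in the same order; the optimal superposition step is deliberately left out, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superimposed atom lists.

    Each argument is a list of (x, y, z) tuples in matching atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two hypothetical atoms, each shifted by exactly 1 Å along z.
example = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
               [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])
```

Excluding poorly restrained residues, as was done for the two N-terminal residues above, corresponds to dropping their atoms from both lists before the call.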

Prediction by ORBIT of core side-chain packing was found to be excellent (Fig. 2a,b). All of the nontrivial core residue χ1 angles were predicted correctly: the largest deviations between target and experimental structures were 22° (F30) in Δ0 and 35° (L5) in Δ1.5. The χ2 angle prediction was somewhat less robust: five of six nontrivial χ2 angles were correctly predicted in Δ0, four of seven in Δ1.5. Closer examination of the Δ1.5 core reveals that the residues for which χ2 is mispredicted (F3, L5, L30) interact with side chains that are dynamically disordered (E27 and W52, as described above). Misprediction of χ2 in these residues might be a further indication of conformational heterogeneity in this portion of the protein.
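
The comparison of predicted and experimental side-chain dihedral angles can be sketched as follows. The code computes a dihedral from four atom positions (for χ1 of most residues these are N, Cα, Cβ, and Cγ) and measures the deviation between two angles with wraparound at ±180°. The 40° tolerance is an assumption chosen for illustration; the cutoff used to score a prediction as correct is not stated in the text. All function names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane of the first three points
    n2 = _cross(b2, b3)  # normal to the plane of the last three points
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def angle_deviation(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def is_correct(predicted, observed, tolerance_deg=40.0):
    """Score a prediction; the default tolerance is an assumed value."""
    return angle_deviation(predicted, observed) <= tolerance_deg


# Hypothetical coordinates giving a dihedral of +90 degrees.
chi = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
```

The wraparound matters for angles near trans: 170° and -170° differ by 20°, not 340°.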

A previous study found that Gβ1 variants with multiple core mutations form stable, well-folded proteins (Gronenborn et al. 1996). We have extended this result herein, showing that a native-like fold is retained with changes at as many as six of 10 core positions. The Δ0 and Δ1.5 structures demonstrate, furthermore, that the sequences generated by ORBIT from perturbed backbone templates lead to correctly folded proteins and that ORBIT predicts core side-chain conformations in such proteins reasonably well. Similar success in predicting fold specificity and core packing has been demonstrated for the ROC algorithm in a study of a designed core variant of ubiquitin (Johnson et al. 1999). In that study, a detailed analysis of backbone and core side-chain dynamics showed small but significant differences between wild-type and variant proteins. Our sixfold mutant Δ1.5, the sequence obtained from the largest backbone perturbation we attempted, also shows unintended dynamic behavior. Much of this behavior may be caused by two aspects of the F52W mutation. First, the experimental Δ1.5 backbone more closely resembles the wild-type than the calculated backbone, so the core is overpacked. The bulk of the W52 side chain must be compensated in ways (such as local structural fluctuations) other than global displacement of the helix from the sheet plane. Second, burial of the W52 imino proton in the hydrophobic core without a hydrogen-bonding partner may also contribute to the conformational exchange.

These results suggest several avenues for improvement of the design protocol. The method used to generate the Δ1.5 template neglected the loops connecting helix and sheet. Experimentally, we found that the Δ1.5 sequence does not achieve the helix–sheet separation specified in the Δ1.5 template; explicit consideration of loop length during backbone specification might enable us to achieve better agreement between target and experimental structures. In addition, further terms in the ORBIT scoring function, such as a penalty for burial of uncompensated polar hydrogens (implemented subsequent to this study), may lead to more favorable sequence selection and, hence, improved fold specificity.

Materials and methods

Designed proteins were expressed and purified as previously described (Su and Mayo 1997). For NMR experiments, 5–15 mg of lyophilized protein was dissolved in 700 μL buffer (50 mM sodium phosphate in either 90% H2O/10% D2O at pH 6.0 or 99.9% D2O, pD 6.0), yielding 1–3 mM protein concentration. NMR experiments were performed on a Varian UnityPlus 600-MHz spectrometer equipped with a Nalorac Z-axis gradient probe. DQF-COSY, TOCSY, and NOESY spectra were acquired at 25°C for the structure determinations. Additional data sets were acquired at 35°C to facilitate resonance assignments. TOCSY spectra were acquired with mixing times of 25 and 80 msec, NOESY spectra with mixing times of 75, 100, and 150 msec. The spectral width in all experiments was 7500 Hz. The TOCSY and NOESY spectra were recorded with 256 (t1) × 1024 (t2) complex points, the DQF-COSY spectra with 512 (t1) × 2048 (t2) complex points. Amide hydrogen exchange rates were measured by following the time course of the disappearance of amide-α proton cross peaks in magnitude-mode COSY spectra (256 (t1) × 2048 (t2) points) for protiated, lyophilized protein resuspended in 99.9% D2O. E.COSY spectra were also acquired, with 625 (t1) × 2048 (t2) complex points. All spectra were processed with VNMR (Varian).

Resonance assignment was performed using ANSIG (Kraulis 1989) for the Δ0 data and NMRCOMPASS (MSI) for the Δ1.5 data. Cross peaks in the 75 msec mixing time NOESY spectra were assigned for use as distance restraints. Poorer dispersion in the Δ1.5 spectra than in the Δ0 spectra necessitated additional steps in assigning NOESY cross peaks, as follows. A table of putative NOESY cross-peak assignments was generated automatically in NMRCOMPASS. Proton pairs separated by >10 Å in the Δ1.5 template were discarded as possible assignments, yielding a partially assigned restraint set (Nilges et al. 1997). The subset of unambiguously assigned restraints taken from this set was used to calculate an initial ensemble of structures. The minimized mean of this ensemble was then used to calculate a new set of interproton distances, which were again used to filter the NOESY cross-peak assignments, this time with a 5-Å distance cutoff. After the second cycle of distance filtering, remaining ambiguous restraints were discarded. This approach resulted in a comparable number of distance restraints for the two proteins (Table 1). The χ1 restraints were obtained from coupling constant measurements in E.COSY spectra combined with patterns of intraresidue NOEs (Wagner et al. 1987). These angular restraints were found to improve the quality and precision of the ensemble of Δ1.5 structures but not that of the Δ0 structures. Hence, χ1 restraints were not used in refinement of the Δ0 ensemble. Handling of experimental restraints was otherwise as previously described (Malakauskas and Mayo 1998).
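
The two-cycle distance filtering of putative NOESY assignments described above can be sketched as follows. This is an illustration of the logic only, not the NMRCOMPASS workflow: the data layout, proton labels, coordinates, and function names are all hypothetical, and the structure calculation between the two cycles is represented simply by a second set of coordinates.

```python
import math


def filter_by_distance(candidates, coords, cutoff):
    """Keep only candidate proton pairs no farther apart than cutoff (Å).

    candidates: {peak_id: [(proton_a, proton_b), ...]}
    coords: {proton: (x, y, z)} in the reference structure.
    """
    return {
        peak_id: [
            (a, b) for a, b in pairs
            if math.dist(coords[a], coords[b]) <= cutoff
        ]
        for peak_id, pairs in candidates.items()
    }


def unambiguous(filtered):
    """Return the peaks left with exactly one possible assignment."""
    return {
        peak_id: pairs[0]
        for peak_id, pairs in filtered.items()
        if len(pairs) == 1
    }


# Hypothetical proton positions in the design template (cycle 1) and in the
# minimized mean of the initial ensemble (cycle 2).
template = {"HA5": (0.0, 0.0, 0.0), "HB5": (0.0, 1.0, 0.0),
            "HN6": (3.0, 0.0, 0.0), "HN40": (8.0, 0.0, 0.0),
            "HN50": (20.0, 0.0, 0.0)}
mean_structure = {"HA5": (0.0, 0.0, 0.0), "HB5": (0.0, 1.0, 0.0),
                  "HN6": (2.5, 0.0, 0.0), "HN40": (7.0, 0.0, 0.0),
                  "HN50": (18.0, 0.0, 0.0)}
candidates = {"peak1": [("HA5", "HN6"), ("HA5", "HN50")],
              "peak2": [("HB5", "HN6"), ("HB5", "HN40")]}

cycle1 = filter_by_distance(candidates, template, 10.0)
initial_restraints = unambiguous(cycle1)      # input to the first ensemble
cycle2 = filter_by_distance(cycle1, mean_structure, 5.0)
final_restraints = unambiguous(cycle2)        # still-ambiguous peaks are dropped
```

In this toy data set, peak1 is resolved in the first cycle and peak2 only after the tighter 5 Å filter, which mirrors how the second cycle recovers restraints that the template alone could not disambiguate.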

Standard hybrid distance geometry/simulated annealing protocols were used to find structures consistent with experimental restraints (Nilges et al. 1988, 1991). Distance geometry structures (100) were generated, regularized, and refined, resulting in ensembles of structures (68 for Δ0, 81 for Δ1.5) with no restraint violations >0.3 Å, RMSDs from idealized bond lengths <0.01 Å, and RMSDs from idealized bond angles <1°. Statistics for the 40 lowest-energy structures of each of these ensembles are compiled in Table 1.
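
The acceptance criteria and the selection of the 40 lowest-energy structures can be expressed as a simple filter followed by a sort. The sketch below assumes each refined structure is summarized by a dictionary with hypothetical keys; it does not reproduce the refinement itself.

```python
def accepted(structures, max_violation=0.3, max_bond_rmsd=0.01, max_angle_rmsd=1.0):
    """Keep structures that meet the acceptance criteria quoted in the text.

    Each structure is a dict with the hypothetical keys 'max_violation' (Å),
    'bond_rmsd' (Å), 'angle_rmsd' (degrees), and 'energy'.
    """
    return [
        s for s in structures
        if s["max_violation"] <= max_violation
        and s["bond_rmsd"] < max_bond_rmsd
        and s["angle_rmsd"] < max_angle_rmsd
    ]


def lowest_energy(structures, n=40):
    """Return the n lowest-energy structures, lowest first."""
    return sorted(structures, key=lambda s: s["energy"])[:n]


# Three made-up structures; the second fails on a restraint violation.
pool = [
    {"max_violation": 0.21, "bond_rmsd": 0.003, "angle_rmsd": 0.55, "energy": -120.0},
    {"max_violation": 0.45, "bond_rmsd": 0.003, "angle_rmsd": 0.56, "energy": -150.0},
    {"max_violation": 0.10, "bond_rmsd": 0.004, "angle_rmsd": 0.60, "energy": -130.0},
]
selected = lowest_energy(accepted(pool), n=2)
```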

Electronic supplemental material

1H resonance assignments are provided for both proteins. Table S1 contains chemical shifts for Δ0. Table S2 contains chemical shifts for Δ1.5.

Table 1. Experimental restraints and structure statistics

                                                  Δ0                 Δ1.5
NOE distance restraints
  Intraresidue                                    208                317
  Sequential                                      145                146
  Medium range (2≤|i-j|≤4)                        67                 73
  Long range (|i-j|≥5)                            176                161
Hydrogen bond restraints                          28                 36
χ1 restraints                                     0                  10
RMSDs from data
  Distance restraints (Å)                         0.028 ± 0.001      0.029 ± 0.003
  χ1 restraints (°)                               n/a                0.57 ± 0.50
RMSDs from ideal geometry
  Bonds (Å)                                       0.0031 ± 0.0001    0.0033 ± 0.0001
  Angles (°)                                      0.55 ± 0.01        0.58 ± 0.01
  Impropers (°)                                   0.41 ± 0.01        0.42 ± 0.01
Ensemble atomic RMSDs (Å) (a)
  Backbone                                        0.23               0.23
  Heavy atoms                                     0.74               0.60
Ensemble Ramachandran statistics (b)
  Residues in most favored regions (%)            77.7               80.4
  Residues in additionally allowed regions (%)    20.7               19.3
  Residues in generously allowed regions (%)      1.4                0.2
  Residues in disallowed regions (%)              0.1                0.1

(a) Ensemble RMSDs were calculated for residues 2–56 of both proteins.
(b) Ramachandran analysis was performed with PROCHECK-NMR (Laskowski et al. 1996).

Fig. 1. Stereoviews of experimental versus target structures of Gβ1 variants. (a) Superposition of the minimized mean experimental structure of Δ0 (green) and the crystal structure of Gβ1 (red), accession code 1pga (Gallagher et al. 1994). (b) Superposition of the minimized mean experimental (yellow) and calculated (blue) structures of Δ1.5. Incomplete N-terminal methionine processing results in mixtures of 56 and 57 amino acid proteins, with the 57-mer predominating for more stable variants. The structures presented are the 57-mer of Δ0 and the 56-mer of Δ1.5 (sequence numbering for the 56-mer is used throughout the text). Figures were generated using MOLMOL (Koradi et al. 1996).

Fig. 2. Sidechain packing in Gβ1 variants. (a) Core residue heavy atoms of the minimized mean experimental (green) and calculated (red) structures of Δ0. (b) Core residue heavy atoms of the minimized mean experimental (yellow) and calculated (blue) structures of Δ1.5. χ1 and χ2 angles in the ensemble of NMR structures were found in all cases to be well represented by the values in the minimized mean structures. Residue numbers are located near each residue's Cα atom.

Acknowledgements

We thank Monica Breckow for assistance with molecular biology protocols. This work was supported by the Howard Hughes Medical Institute (S.L.M.). C.A.S. is partially supported by an NSF graduate research fellowship. Coordinates and NMR restraints have been deposited in the Protein Data Bank. Accession numbers for the coordinates are 1fd6 and 1fcl for Δ0 and Δ1.5, respectively.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

Supporting Information

Filename: Ross_ProSci_10_2.pdf
Size: 42K
Description: Supplementary Materials (ESM) for Ross et al. (2000) Designed protein G core variants fold to native-like structures: Sequence selection by ORBIT tolerates variation in backbone specification.
