Methods for rapid and reliable design and structure prediction of linker loops would facilitate a variety of protein engineering applications. Circular permutation, in which the existing termini of a protein are linked by the polypeptide chain and new termini are created, is one such application that has been employed for decreasing proteolytic susceptibility and other functional purposes. The length and sequence of the linker can impact the expression level, solubility, structure and function of the permuted variants. Hence it is desirable to achieve atomic-level accuracy in linker design. Here, we describe the use of RosettaRemodel for design and structure prediction of circular permutation linkers on a model protein. A crystal structure of one of the permuted variants confirmed the accuracy of the computational prediction, where the all-atom rmsd of the linker region was 0.89 Å between the model and the crystal structure. This result suggests that RosettaRemodel may be generally useful for the design and structure prediction of protein loop regions for circular permutations or other structure-function manipulations.
Computational protein design is an increasingly efficient and powerful tool to manipulate protein structure and function.1–7 However, most computational design has focused on optimizing amino acid sequences on static backbone structures. One of the essential challenges of more aggressive protein remodeling is the de novo design and structure prediction of individual protein segments within a rigid protein. RosettaRemodel is a generalized method for protein design and structure prediction in which backbone conformational freedom and sequence variation can be restricted to particular protein segments.8 Here we have studied a case of circular permutation as one example of a common structural manipulation requiring design of a single protein segment.9 The starting molecule to be circularly permuted was an epitope-scaffold onto which the 4E10 HIV neutralization epitope had been transplanted, as previously described by Correia et al.10 The epitope-scaffold, which had high affinity towards the 4E10 antibody (Fig. 1), was used as a model system to assess the linker design and structure prediction capabilities of RosettaRemodel.
Using RosettaRemodel, we computationally modeled linker loops of different lengths to evaluate a variety of solutions for joining the original termini. Most of the experimentally tested variants yielded soluble proteins, and a crystal structure solved for one of the permuted designs agreed very closely with the low-energy models generated by Rosetta.
Computational prediction of the linker used in the circular permutation
To accomplish the circular permutation and relocate the termini to a region distal from the transplanted epitope, a loop modeling protocol was used to build several linkers to join the original termini, and new termini were created in a loop located on the opposite side of the protein relative to the epitope (Fig. 1). Because of the spatial proximity of the original termini (Cα-Cα distance = 8.7 Å), we started by modeling linkers with lengths ranging from five to seven residues; the linker sequences were composed of different combinations of alanine and glycine (Table I). RosettaRemodel was used to perform the structure prediction calculations. The first stage was carried out at low resolution in “centroid mode”, with side-chains represented by spheres located at each side-chain center-of-mass, and the new backbone conformations were built based on a fragment insertion protocol together with cyclic coordinate descent (CCD) to maintain proper chain connectivity.11, 12 The second stage of structure prediction was carried out with all-atom detail, and both backbone and side-chains were refined and minimized for accurate energy evaluation. For each of the different designed linkers, 2500 models were generated in which sampling of conformational degrees of freedom was restricted to the linker region. The 2500 models were clustered according to the Cα root mean square deviation (rmsd) in the loop region, and the three largest clusters included 303, 36, and 15 models, respectively [Fig. 2(A)]. The lowest energy model in the largest cluster was a logical selection as the top-ranked RosettaRemodel prediction for the loop conformation. The CPU (Intel 2GHz quad-core) time needed for RosettaRemodel to generate 2500 models was ∼ 2500 min.
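The selection rule described above (take the most populated cluster, then the lowest-energy model within it) can be sketched in a few lines of Python. This is only an illustration of the rule: the tuple layout, the function name `pick_top_model`, and the toy energies are assumptions made for the example and do not correspond to any Rosetta output format or to the data in this study.

```python
from collections import defaultdict


def pick_top_model(models):
    """Return (name, energy) of the lowest-energy model in the largest cluster.

    `models` is a list of (name, cluster_id, energy) tuples. This layout is
    a hypothetical convenience, not a Rosetta file format.
    """
    clusters = defaultdict(list)
    for name, cluster_id, energy in models:
        clusters[cluster_id].append((energy, name))
    # The largest cluster is the one with the most members.
    largest = max(clusters.values(), key=len)
    # Within that cluster, the minimum tuple has the lowest energy.
    energy, name = min(largest)
    return name, energy


# Toy data, invented for illustration only.
toy_models = [
    ("m1", 0, -250.1),
    ("m2", 0, -252.7),
    ("m3", 1, -255.0),  # lowest energy overall, but in a small cluster
    ("m4", 0, -249.3),
]
top_name, top_energy = pick_top_model(toy_models)
```

Note that in the toy data the globally lowest-energy model is not selected, because it belongs to a sparsely populated cluster; the rule favors conformations that were sampled repeatedly.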
Table I. Sequence Features and Experimental Characterization of the Designed Permuted Variants
The solution oligomeric state was characterized by SEC/SLS. The melting temperatures were determined by circular dichroism spectroscopy.
The sequences of the native protein and the permuted variants were tested for their similarity to known proteins by performing a Blast13 search against the nonredundant protein sequence database. While multiple full length matches were found for the native protein, none were found for the permuted variants. Instead, searches with the permuted variants recovered only discontinuous matches in which the N-terminus of the permuted variant matched to the C-terminus of the hits and the C-terminus of the permuted variant matched to the N-terminus of the hits (Fig. S1). Hence, circular permutation in this case created novel proteins.
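The discontinuous matches follow directly from how a permuted sequence is assembled, which a short Python sketch makes explicit. The function `circular_permute`, its parameters, and the ten-residue parent sequence are hypothetical and chosen only for illustration; they are not the sequences of the variants in Table I.

```python
def circular_permute(sequence, new_start, linker="", drop=0):
    """Circularly permute a one-letter amino acid sequence.

    new_start: 0-based index of the residue that becomes the new N-terminus.
    linker:    residues inserted between the original C- and N-termini.
    drop:      number of residues removed immediately before `new_start`
               when the chain is cut (hypothetical parameter).
    """
    if not 0 <= new_start < len(sequence):
        raise ValueError("new_start is out of range")
    if not 0 <= drop <= new_start:
        raise ValueError("drop must be between 0 and new_start")
    head = sequence[:new_start - drop]  # original N-terminal segment
    tail = sequence[new_start:]         # original C-terminal segment
    # The original C-terminal segment now comes first, then the linker,
    # then the original N-terminal segment.
    return tail + linker + head


# Invented parent sequence and an illustrative Ala/Gly linker.
parent = "MKTAYIAKQR"
permuted = circular_permute(parent, new_start=6, linker="GAGAG", drop=2)
```

In the permuted string, the N-terminal segment is identical to the C-terminal segment of the parent and vice versa, which is the pattern of discontinuous matches a sequence search would be expected to report.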
Stability and solution behavior
To assess the solution behavior and thermal stabilities, all six permuted constructs were expressed in E. coli and experimentally characterized. Five of the six designs were purifiable and soluble. The solution oligomeric state was assessed by static light scattering (SLS) in-line with size exclusion chromatography (SEC). Four of the five designs formed dimers in solution like the parent molecule, while one design formed a higher order multimer. The thermal stability of the designs was assessed using circular dichroism temperature melt analysis. Three of the designs had Tms ranging from 48°C to 51°C (Table I), whereas the two other variants showed no transition. The permuted variants were prone to aggregation, as many 4E10 scaffolds have been,10 and this prevented quantitative assessment of binding affinities for the 4E10 antibody.
Structural characterization and modeling accuracy
To evaluate the accuracy of the computational modeling, crystal structures of the designs were pursued. Crystallization trials were conducted for all purifiable designs. One design (006) formed diffraction-quality crystals and a structure was determined (Table II). The overall fold of the parent protein was maintained in the permuted variant, with a backbone (N, Cα, C, O) rmsd of 0.4 Å between permuted variant and nonpermuted parent [Fig. 2(B)]. Upon the circular permutation, some of the residues included in the original termini underwent subtle conformational rearrangements [Fig. 2(C)]. The backbone and all-atom rmsd values in the designed loop region between the crystal structure of 006 and the lowest energy model in the largest cluster were 0.5 Å and 0.89 Å, respectively [Fig. 2(D)].
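For reference, the rmsd between two sets of matched atoms is the square root of the mean squared distance between corresponding atoms. The Python sketch below computes it under the assumption that the model and the crystal structure are already in a common coordinate frame and that atoms are listed in the same order; the superposition procedure used for the values above is not specified here, and the coordinates in the example are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) coordinates.

    No superposition is performed; both structures are assumed to be in
    the same frame already (an assumption of this sketch).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Invented coordinates: every atom is displaced by 1 Å along z.
model_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal_atoms = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
loop_rmsd = rmsd(model_atoms, crystal_atoms)
```

Restricting the input lists to backbone atoms (N, Cα, C, O) or to all atoms of the loop gives the backbone and all-atom variants of the measure, respectively.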
Table II. Crystallographic Statistics
Statistics for the highest resolution shell are shown in parentheses.
Circular permutation has been used for multiple purposes that span the optimization of solution behavior14 and function.15–17 Here we report a fast and accurate computational method that allows for the modeling of linkers to join the pre-existing termini, enabling the generation of the circular permuted variants in a controlled and rational fashion. The computational model and the solved crystal structure were in close agreement in terms of backbone and side-chain conformations. The computational model was selected based on cluster size and Rosetta full-atom energy, so the accuracy of the model supports the validity of both the conformational sampling and the energy function implemented in Rosetta.
Several computationally designed loops have been previously reported. Hu et al.18 accomplished the design of a 10 residue loop for which the conformation was predicted with subangstrom accuracy. In that work, several iterations of sequence-design and structural optimization were utilized to obtain the final sequence and structure. Correia et al.19 designed a 16 residue helix-loop segment that contributed to a protein core, also with subangstrom accuracy. That work followed a similar methodology, but unlike RosettaRemodel the conformational sampling and sequence design stages were not automated within a self-contained protocol. Here, for the design of a shorter five residue linker, the RosettaRemodel protocol achieved similarly accurate structure prediction with less sampling (2500 models). Hence, RosettaRemodel holds promise for more complex protein engineering tasks.
The RosettaRemodel protocol implemented in the software package Rosetta20 was used to sample low-energy loop conformations with different predefined sequences (Table I). In the starting structure used in the computational protocol (PDB accession code 1xiz), new termini were imposed by removing two residues (K16 and E17, numbered as in 1xiz.pdb) and the original termini were joined by computationally modeled linkers. In the computational simulations, 2500 models were generated for each linker and only the residues in the linker were allowed to sample different conformations, while the coordinates of the remainder of the structure were kept fixed. The conformational space for the newly designed linkers was sampled by fragment insertion in conjunction with a cyclic coordinate descent step to guarantee proper polypeptide chain connectivity.11, 12 The fragments used were collected from available crystal structures21 and selected according to the secondary structure defined for the newly designed linkers.19 Initially, conformational sampling was carried out with a low-resolution description of the side-chains, which were represented as centroids, and in the final stage the sampled conformations were refined and scored using a full-atom description of the protein.8 The generated models were clustered according to the rmsd of the designed loop using a hierarchical clustering algorithm as implemented in Rosetta, with a cluster radius of 0.1 Å. The lowest full-atom energy21 structure from the largest cluster was selected for structural comparison with the solved crystal structure.
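To illustrate how a cluster radius partitions a set of models, the Python sketch below applies a simple greedy, neighbor-count clustering to a precomputed matrix of pairwise loop rmsd values. This greedy scheme is a stand-in chosen for brevity; it is not the hierarchical algorithm implemented in Rosetta, and the function name and the toy distance matrix are invented for the example.

```python
def cluster_by_radius(dist, radius):
    """Greedy clustering on a symmetric pairwise-distance matrix.

    Repeatedly picks the unassigned model with the most unassigned
    neighbours within `radius` as a cluster centre, then removes that
    centre and its neighbours. Returns clusters as sorted lists of model
    indices, in non-increasing order of size.
    """
    unassigned = set(range(len(dist)))
    clusters = []
    while unassigned:
        # Iterate in sorted order so ties are broken by the lowest index.
        centre = max(
            sorted(unassigned),
            key=lambda i: sum(1 for j in unassigned if dist[i][j] <= radius),
        )
        members = sorted(j for j in unassigned if dist[centre][j] <= radius)
        clusters.append(members)
        unassigned -= set(members)
    return clusters


# Invented pairwise rmsd values (Å) for four models.
toy_dist = [
    [0.00, 0.05, 0.08, 2.00],
    [0.05, 0.00, 0.06, 2.10],
    [0.08, 0.06, 0.00, 1.90],
    [2.00, 2.10, 1.90, 0.00],
]
toy_clusters = cluster_by_radius(toy_dist, radius=0.1)
```

With these toy values, the first three models fall within the radius of one another and form one cluster, while the fourth model is left as a singleton.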
The following command line was used to run the RosettaRemodel protocol as implemented in the software package Rosetta version 2.0: rosetta.intel -pose1 -remodel -s input.pdb -blueprint input.blueprint -try 50 -save_top 50 -num_frag_moves 10 -use_non_monotone_line_search -paths paths.txt. The blueprint file is composed of lines with residue-specific instructions. The following examples illustrate the lines necessary for the designs in this paper: “1 V .”: residue 1, with the native residue valine, will be untouched; “137 G L PIKAA A”: residue 137, with the native residue glycine, where fragments with loop conformation (L) will be inserted and the sequence change allowed is to alanine (PIKAA A). The use of this simplified syntax enables the manipulation of sequence and structure using RosettaRemodel. The computational models of the variants described in Table I are available in the electronic Supporting Information.
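Because the blueprint has one line per residue, it is convenient to generate it with a script. The Python sketch below emits only the two line forms quoted above (untouched residues and loop residues with a PIKAA sequence restriction); it does not cover any other blueprint syntax, and the function name, the dictionary argument, and the three-residue toy sequence are assumptions made for illustration.

```python
def blueprint_lines(sequence, remodel):
    """Build blueprint lines for a one-letter amino acid sequence.

    sequence: native sequence, one letter per residue.
    remodel:  dict mapping 1-based residue position -> amino acid allowed
              at that position (hypothetical input layout).
    Positions absent from `remodel` are left untouched (".").
    """
    lines = []
    for position, native in enumerate(sequence, start=1):
        if position in remodel:
            # Loop-fragment insertion (L) with a restricted sequence choice.
            lines.append(f"{position} {native} L PIKAA {remodel[position]}")
        else:
            # Residue kept as in the input structure.
            lines.append(f"{position} {native} .")
    return lines


# Toy three-residue example, invented for illustration.
toy_blueprint = blueprint_lines("VGA", {2: "A"})
blueprint_text = "\n".join(toy_blueprint)
```

Writing `blueprint_text` to a file would give an input in the same line format as the examples quoted in the text.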
Expression and purification
Protein expression, purification, and thermal stability measurements were carried out as described in Correia et al.10, 19
Crystallization and crystallography
Crystals of T298 (12 mg/mL) were grown by vapor diffusion (well solution: 10% w/w PEG 4000, 20% v/v isopropanol) and cryo-protected with ethylene glycol. Diffraction data to 1.95 Å were collected at −170°C on a Saturn CCD detector with HF optic (Rigaku) and processed with d*TREK.22 Initial structure factor phases were determined by molecular replacement, using the program Phaser23 as implemented in the CCP4i graphical user interface,24 and a search model consisting of the partially refined model of a related epitope-scaffold with the epitope and several key residues removed. Successive rounds of modeling and positional and individual B factor refinement were carried out with the programs Coot25 and Refmac5.26 Structure validation was carried out with Procheck,27 the MolProbity server,28 and the RCSB ADIT validation server. The structure has been deposited in the RCSB PDB29 with PDB ID 3T43. Data collection and structure refinement statistics are shown in Table II.
The authors thank Colin Corrent for assistance with the structure deposition.