Prediction of Protein Structure Using Surface Accessibility Data

Abstract
An approach to the de novo structure prediction of proteins is described that relies on surface accessibility data from NMR paramagnetic relaxation enhancements by a soluble paramagnetic compound (sPRE). This method exploits the distance-to-surface information encoded in the sPRE data in the chemical shift-based CS-Rosetta de novo structure prediction framework to generate reliable structural models. For several proteins, it is demonstrated that surface accessibility data is an excellent measure of the correct protein fold in the early stages of the computational folding algorithm and significantly improves accuracy and convergence of the standard Rosetta structure prediction approach.

use the published chemical shift data (the corresponding BMRB codes are listed in supplementary table 1). Implementing an sPRE scoring function in the Rosetta framework requires back-calculating sPRE data for a given protein model and comparing the back-calculated data with the experimental data. The sPRE module is implemented as a WholeStructureEnergy and communicates with Rosetta through the common scoring interface of the framework. The module integrates seamlessly into existing Rosetta protocols and can be activated in any Rosetta application by assigning a non-zero weight to the identifier "spre" in the score function weight set. Once the sPRE module is activated, an input file containing the sPRE data in TALOS file format (NMRPipe table format) must be supplied. For details regarding code availability, setup instructions and a tutorial, please refer to http://mbbc.medunigraz.at/en/research/forschungseinheiten-und-gruppen/research-group-prof-madl/.
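The exact column layout that the sPRE module expects is documented in the tutorial linked above and is not reproduced here. As a hedged illustration of the general NMRPipe table convention (REMARK and DATA lines, a VARS line naming the columns, a FORMAT line, then whitespace-separated rows), the following Python sketch reads such a file into dictionaries. The column names RESID, RESNAME, ATOMNAME and SPRE in the example are hypothetical.

```python
from typing import Dict, List


def read_nmrpipe_table(text: str) -> List[Dict[str, str]]:
    """Parse an NMRPipe-style table: a VARS header followed by data rows."""
    columns: List[str] = []
    rows: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and header lines that carry no row data.
        if not line or line.startswith(("REMARK", "DATA", "FORMAT", "#")):
            continue
        if line.startswith("VARS"):
            columns = line.split()[1:]
            continue
        if not columns:
            raise ValueError("data row encountered before the VARS header")
        values = line.split()
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(values)}: {line!r}")
        rows.append(dict(zip(columns, values)))
    return rows


# Hypothetical sPRE input; the column names are illustrative only.
EXAMPLE = """REMARK hypothetical sPRE input
DATA SEQUENCE MQIFVKTLTG
VARS RESID RESNAME ATOMNAME SPRE
FORMAT %4d %1s %4s %8.3f
   2 Q   H     12.500
   3 I   HA     3.100
"""
table = read_nmrpipe_table(EXAMPLE)
spre = {(int(r["RESID"]), r["ATOMNAME"]): float(r["SPRE"]) for r in table}
```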

Code availability
In this work, we tested the sPRE module in conjunction with the AbinitioRelax and score_jd2 protocols of Rosetta 3.2 for structure calculation and rescoring structural ensembles, respectively. The sPRE module is entirely included in the Rosetta framework, requires no additional software or online service and will be made available in the upcoming releases of the Rosetta framework.

Input data
Although the module was mainly tested with proton sPRE data as input, it supports input data for all carbon and proton atoms. The input data is adjusted as follows. For methyl groups, the sPRE values of all methyl protons and of the methyl carbon atom (if sPRE data is available) are averaged and assigned to the carbon atom. Similar re-mapping is done for tyrosine and phenylalanine side chains: the protons and carbon atoms at the meta positions are averaged with the para carbon atom and mapped onto the para carbon atom. For data sets that were not assigned stereospecifically (non-stereospecific assignment is assumed by default), the sPRE values of prochiral protons are averaged with the corresponding carbon atom and projected onto that carbon atom (e.g. for serine, the data for both Hβ protons and for the Cβ atom are averaged and assigned to Cβ). Missing data for individual carbons or protons does not affect the averaging. As the paramagnetic relaxation enhancement scales with γ², where γ is the gyromagnetic ratio of the observed nucleus, all sPRE values are normalized by γ² to allow proper averaging of sPRE data from different types of nuclei. To score centroid models, further mapping is required, since centroid models contain only 7 atoms per residue (H, N, Cα, C, O, Cβ and a side-chain pseudo atom, CEN). Consequently, the data for Hα is averaged with the data for Cα (if available) and assigned to Cα. In a similar manner, the data for Hβ is assigned to Cβ, and all other side-chain data is merged and assigned to the CEN atom.
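A minimal Python sketch of the averaging rules above, assuming that "normalized by γ²" means dividing each value by γ² of the observed nucleus and that the element can be read from the first letter of the atom name. The helper names and the example values are hypothetical; the gyromagnetic ratios are rounded literature values for ¹H and ¹³C.

```python
from statistics import mean
from typing import Dict, Iterable, Optional

# Gyromagnetic ratios in units of 10^6 rad s^-1 T^-1 (rounded literature values).
GAMMA = {"H": 267.522, "C": 67.2828}


def normalized(atom: str, value: float) -> float:
    """Divide an sPRE value by gamma^2 of the atom's nucleus (element = first letter)."""
    return value / GAMMA[atom[0]] ** 2


def merge(members: Iterable[str], data: Dict[str, float]) -> Optional[float]:
    """Average the gamma^2-normalized sPRE values of the member atoms that have data.

    Atoms without data are skipped, so missing values do not bias the average.
    Returns None if none of the members has data.
    """
    values = [normalized(a, data[a]) for a in members if a in data]
    return mean(values) if values else None


# Hypothetical serine data, not stereospecifically assigned: HB2, HB3 and CB -> CB.
serine = {"HB2": 10.0, "HB3": 12.0, "CB": 0.7}
cb_value = merge(["HB2", "HB3", "CB"], serine)

# Centroid mapping of the alpha position: HA (and CA, if measured) -> CA.
ca_value = merge(["HA", "CA"], {"HA": 4.0})
```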

Back-calculation of sPRE data
sPRE data for a given structural model is back-calculated by an optimized grid-based approach (compare figure 1b) similar to that described in previous studies.[1] In a first step, a uniform grid with a typical spacing of 0.5-2 Å between grid points (default 2 Å) is created around the structural model. By default, every atom of the protein is at least 10 Å away from the grid boundaries. Next, the atom positions are discretized onto this grid by replacing the x, y and z coordinates of each atom with the coordinates of the closest grid point. In a third step, grid positions that fall within the van der Waals region of the protein are marked as occupied. Atom radii are obtained from the Rosetta database; for centroid pseudo atoms, the radius is approximated by the distance between the centroid atom and the Cα atom of the same residue. All atom radii are increased by the radius of the paramagnetic agent (default 3.5 Å [1b, c]), which effectively marks all grid positions that are not accessible to the paramagnetic agent. Finally, the sPRE value of every atom is approximated by the sum over all grid positions within an integration radius (default 10 Å) that have not been marked as occupied in the previous step:

$$\mathrm{sPRE}^{\mathrm{model}}_{i} = \sum_{\substack{j=1 \\ d_{i,j} \le r_{\mathrm{int}}}}^{N} \varepsilon_{j}, \qquad \varepsilon_{j} = \begin{cases} 0 & \text{if the } j\text{-th grid point is marked as occupied} \\ 1 & \text{otherwise} \end{cases} \tag{1}$$

where $i$ is the index of the protein atom, $j$ is the index of the grid point, $N$ is the number of grid points, $\mathrm{sPRE}^{\mathrm{model}}_{i}$ is the approximated sPRE value of the $i$-th atom of the given protein structure, $d_{i,j}$ is the discretized distance on the grid between the $i$-th atom and the $j$-th grid position, and $r_{\mathrm{int}}$ is the integration radius (default 10 Å). Since the protein atoms and the grid positions are discretized on the same grid, $d_{i,j}$ is computed beforehand and stored in a lookup table.
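The four steps of the back-calculation can be sketched in Python with NumPy as below. This is an illustrative re-implementation under stated assumptions, not the Rosetta code: it treats the sum in equation (1) as a count of unoccupied grid points within r_int, uses the discretized atom positions for both occupancy and integration, and represents the distance lookup table as a precomputed spherical stencil of integer grid offsets. Function and parameter names are hypothetical; the defaults mirror the values given in the text.

```python
import numpy as np


def ball_offsets(radius: float, spacing: float) -> np.ndarray:
    """Integer grid offsets, shape (k, 3), whose distance from the origin is <= radius."""
    m = int(np.floor(radius / spacing))
    ax = np.arange(-m, m + 1)
    off = np.stack(np.meshgrid(ax, ax, ax, indexing="ij"), axis=-1).reshape(-1, 3)
    return off[(off ** 2).sum(axis=1) * spacing ** 2 <= radius ** 2]


def back_calculate_spre(coords, radii, spacing=2.0, probe_radius=3.5,
                        r_int=10.0, padding=10.0) -> np.ndarray:
    """Count free (unoccupied) grid points within r_int of every atom."""
    coords = np.asarray(coords, dtype=float)
    radii = np.asarray(radii, dtype=float)

    # Step 1: grid enclosing the model with `padding` Angstrom on every side.
    origin = coords.min(axis=0) - padding
    dims = np.ceil((coords.max(axis=0) + padding - origin) / spacing).astype(int) + 1

    # Step 2: discretize every atom onto its nearest grid point.
    centers = np.rint((coords - origin) / spacing).astype(int)

    def inside(points: np.ndarray) -> np.ndarray:
        """Drop grid indices that fall outside the box."""
        return points[np.all((points >= 0) & (points < dims), axis=1)]

    # Step 3: mark points within (vdW radius + probe radius) of an atom as occupied.
    occupied = np.zeros(tuple(dims), dtype=bool)
    for center, r in zip(centers, radii + probe_radius):
        occupied[tuple(inside(center + ball_offsets(r, spacing)).T)] = True

    # Step 4: the precomputed stencil plays the role of the distance lookup
    # table; sum epsilon_j (True = free grid point) over each atom's stencil.
    stencil = ball_offsets(r_int, spacing)
    free = ~occupied
    return np.array([free[tuple(inside(c + stencil).T)].sum() for c in centers],
                    dtype=float)


# Toy "protein": a 5 x 5 x 5 cube of atoms spaced 3 Angstrom apart (hypothetical).
axis = np.arange(5) * 3.0
cube = np.array([[x, y, z] for x in axis for y in axis for z in axis])
spre = back_calculate_spre(cube, np.full(len(cube), 1.7))
```

In this toy cube, a corner atom (index 0) sees many free grid points, whereas the central atom (index 62) is buried, so its value is much lower, which is the core-to-surface gradient the score relies on.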

Scoring a structural model
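After back-calculating the sPRE data for every atom of the protein, the calculated values $\mathrm{sPRE}^{\mathrm{model}}$ are compared with the experimental data $\mathrm{sPRE}^{\mathrm{exp}}$. By default, the sPRE module uses the robust Spearman correlation coefficient to compute a scalar score. To calculate the Spearman correlation coefficient, both data sets ($\mathrm{sPRE}^{\mathrm{model}}$ and $\mathrm{sPRE}^{\mathrm{exp}}$) are ranked independently, generating two new data sets $r^{\mathrm{model}}, r^{\mathrm{exp}} \in [1, n]$, where $n$ is the number of data points. The raw score is then obtained as the Pearson correlation coefficient of the ranks:

$$\widetilde{\mathrm{score}}_{\mathrm{spearman}} = \frac{\sum_{k=1}^{n}\left(r^{\mathrm{model}}_{k}-\bar{r}^{\mathrm{model}}\right)\left(r^{\mathrm{exp}}_{k}-\bar{r}^{\mathrm{exp}}\right)}{\sqrt{\sum_{k=1}^{n}\left(r^{\mathrm{model}}_{k}-\bar{r}^{\mathrm{model}}\right)^{2}}\,\sqrt{\sum_{k=1}^{n}\left(r^{\mathrm{exp}}_{k}-\bar{r}^{\mathrm{exp}}\right)^{2}}} \tag{2}$$

For the proteins of our test set (see supplementary table 6), the raw score of most final full-atom structures ranges from 0 to 1. The Spearman correlation was chosen as the default from a set of several alternative methods. In this study, several other scores were tested, and all of them were computed according to

$$\mathrm{score} = \alpha \cdot \widetilde{\mathrm{score}} - \beta \tag{3}$$

where $\widetilde{\mathrm{score}}$ denotes the raw score, score is the scaled score, and $\alpha$ and $\beta$ are constants. The score was then normalized with an appropriate power of the average reference sPRE. For every type of score, the values of $\alpha$ and $\beta$ as well as the computation of $\widetilde{\mathrm{score}}$ are listed in supplementary table 7.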
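The Spearman raw score and the affine scaling of equation (3) can be sketched in plain Python as follows. Giving tied values the average of their ranks is the usual convention but an assumption here. The values of α and β from supplementary table 7 are not reproduced, so the α = -1, β = 0 in the test is purely illustrative (a negative α would make better agreement lower in energy, as Rosetta minimizes scores). Function names and data are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in order[pos:end + 1]:
            ranks[k] = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        pos = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Classical Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def spearman(model: Sequence[float], exp: Sequence[float]) -> float:
    """Raw score: Pearson correlation of the independently ranked data sets."""
    return pearson(average_ranks(model), average_ranks(exp))


def scaled_score(raw: float, alpha: float, beta: float) -> float:
    """Equation (3): score = alpha * raw - beta."""
    return alpha * raw - beta


model = [120.0, 35.0, 80.0, 10.0, 64.0]  # hypothetical back-calculated values
exp = [14.2, 3.9, 6.8, 1.1, 7.5]         # hypothetical measured sPREs
rho = spearman(model, exp)
```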
The scores based on the Pearson correlation ($\mathrm{score}_{\mathrm{pearson}}$), the Spearman correlation ($\mathrm{score}_{\mathrm{spearman}}$) and the quadrant count ratio ($\mathrm{score}_{\mathrm{quadrant}}$) are derived from the corresponding correlation coefficients and are consequently mathematically bounded. All other scores in supplementary table 7 are unbounded, which makes their normalization more challenging. Here, we used a test set of fully relaxed, full-atom protein structures for normalization. As structure models in the initial phase of CS-Rosetta are entirely different from the optimized final models, the chosen constants $\alpha$ and $\beta$ can give rise to large score values for these initial models. Furthermore, the absolute values of these scores depend on the size of the protein as well as on the total number of constraints, and they can be dominated by outliers. Therefore, we only considered $\mathrm{score}_{\mathrm{pearson}}$, $\mathrm{score}_{\mathrm{spearman}}$ and $\mathrm{score}_{\mathrm{quadrant}}$ for optimizing the sPRE module, since those scores are based on correlation coefficients and resulted in a stable and robust folding algorithm. Moreover, the correlation-based scores can be used in scenarios involving different proteins, different sets of sPRE data or a mix of centroid and full-atom models.
In addition to the fixed $\alpha$ and $\beta$ values, the score is scaled and shifted before it is returned to the Rosetta framework. This scaling step can be adjusted by the user and is performed according to

$$\mathrm{sPRE}_{\mathrm{score}} = \mathrm{scaling} \cdot \mathrm{score} + \mathrm{offset} \tag{4}$$

where score is the score as calculated above, scaling is given by the Rosetta command line option -score:spre:scaling (default 67), and offset is given by the option -score:spre:offset (default 0).
Note that the sPRE score (like any other scoring term used in Rosetta) is also scaled according to the Rosetta weight set. The Rosetta weight for the sPRE score was set to 1.0 throughout this study. Furthermore, the Monte Carlo algorithm of Abinitio relies only on differences between scores, so a constant offset does not affect sampling. The offset was therefore implemented only for the sake of completeness and set to 0.
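The following sketch shows equation (4) together with the Rosetta weight, and why a constant offset cannot influence sampling: a generic Metropolis acceptance probability depends only on the energy difference between the proposed and the current model. The helper names, the temperature kT = 1 and the score values are hypothetical; Rosetta's own Monte Carlo object is not reproduced here.

```python
import math


def spre_score(score: float, scaling: float = 67.0, offset: float = 0.0) -> float:
    """Equation (4): value returned to Rosetta before the weight-set factor."""
    return scaling * score + offset


def weighted_term(score: float, weight: float = 1.0, **kwargs: float) -> float:
    """Contribution of the sPRE term to the total energy (weight 1.0 in the study)."""
    return weight * spre_score(score, **kwargs)


def acceptance_probability(e_old: float, e_new: float, kT: float = 1.0) -> float:
    """Generic Metropolis criterion: depends only on e_new - e_old."""
    delta = e_new - e_old
    return 1.0 if delta <= 0 else math.exp(-delta / kT)


# Hypothetical scaled scores of the current and a proposed (slightly worse) model.
old, new = -0.62, -0.58
p_no_offset = acceptance_probability(weighted_term(old), weighted_term(new))
p_offset = acceptance_probability(weighted_term(old, offset=25.0),
                                  weighted_term(new, offset=25.0))
```

Both probabilities agree up to floating-point rounding, because the offset cancels in the energy difference.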

Verification of the sPRE back-calculation
The sPRE module uses a discretized, low-resolution back-calculation of the sPRE data. To visualize the error of this approximation, we compared the sPRE data back-calculated by the Rosetta sPRE module with data obtained from a classical grid-based back-calculation (Supplementary figure 7). As expected, the low resolution of the grid weakens the correlation. However, even with a 2 Å grid, solvent-exposed residues are still predicted to have a high sPRE, indicating that such low-resolution back-calculated data can still guide the Rosetta sampling algorithm towards the native structure. Furthermore, assuming a globular protein, the accuracy of the approximated sPRE back-calculation increases with protein size: for larger proteins, the effect of missing some high-resolution structural features is negligible compared with the large sPRE gradient between the core and the surface of the protein.

Optimizing the sPRE module
The sPRE scoring function can be adjusted directly through several parameters, most notably the grid resolution, the cut-off radius (integration radius) and the method used to compare the measured and the back-calculated sPRE data. An example of how these parameters affect scoring performance is shown in supplementary figure 8. When the sPRE score is used in the Abinitio protocol of CS-Rosetta, two additional parameters become crucial: the global weight and a stage-specific weighting of the sPRE score. We optimized these parameters using a set of proteins (Supplementary table 6); the recommended settings are listed in supplementary table 8.
The optimal values for the grid resolution and the cut-off radius can vary depending on the size of the protein, the quality of the sPRE data and the available computational resources. Since both parameters affect the accuracy of the back-calculation as well as its computational cost, the optimal value is a trade-off between computational time and accuracy (improving the resolution by reducing the distance between grid points, or enlarging the integration radius, leads to a cubic increase in computational cost; for details, see the section Computational costs). As a rule of thumb, an integration cut-off radius of at least 10 Å is required, since smaller values lead to a significantly reduced correlation between the sPRE score and the Cα-RMSD to the native structure (Supplementary figure 8a). Increasing the integration radius beyond 10 Å might be beneficial for large proteins; however, given the current size limitations of CS-Rosetta, an integration radius of 10 Å was sufficient throughout this work. Regarding the grid resolution, a spacing of 2 Å is sufficient in most cases and allows fast computation of the sPRE score. Spacings above 2 Å lead to a significantly increased error and a broadening of the scoring correlation (Supplementary figure 8b). Finer grids with a spacing of 1 Å or 0.5 Å improve the scoring performance when the score is used to distinguish between similar conformations that differ in high-resolution features (for example, the tilting of the four helices in the C-terminal domain of Phl p 5a, see figure 2d). When the sPRE score mainly guides folding in the early stages, an increase in resolution is typically not beneficial but requires more computational resources. In summary, a grid spacing of 2 Å and an integration radius of 10 Å is a good compromise between performance and accuracy for most cases.
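The cubic growth in cost can be checked by counting the grid offsets that fall inside the integration sphere, which is the number of grid points summed per atom. This Python snippet is a back-of-the-envelope illustration, not a runtime measurement; measured timings are those reported in supplementary tables 12 and 13.

```python
import math


def stencil_size(r_int: float, spacing: float) -> int:
    """Grid points within r_int of an atom, i.e. the terms summed per atom."""
    m = int(math.floor(r_int / spacing))
    r2 = (r_int / spacing) ** 2
    return sum(1
               for i in range(-m, m + 1)
               for j in range(-m, m + 1)
               for k in range(-m, m + 1)
               if i * i + j * j + k * k <= r2)


sizes = {s: stencil_size(10.0, s) for s in (2.0, 1.0, 0.5)}
for s, n in sizes.items():
    # Continuum estimate (4/3) * pi * (r_int / spacing)^3 for comparison.
    estimate = 4 / 3 * math.pi * (10.0 / s) ** 3
    print(f"spacing {s} A: {n} grid points (sphere estimate {estimate:.0f})")
```

Halving the grid spacing (or doubling the integration radius) multiplies the count by roughly 2³ = 8; halving the spacing also increases the total number of grid points in the box by the same factor.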
To include high-resolution information for near-native structures, a grid spacing of 0.5 to 1 Å is preferable. To find the optimal comparison method as well as the optimal global and stage-specific weights, we used the following approach. We first chose a set of small to medium-sized proteins (Supplementary table 6) and predicted structure models using classical CS-Rosetta and sPRE-CS-Rosetta with different settings of the sPRE score. We then computed the percentage of models close to the experimental NMR structure (Cα-RMSD below 4 Å for 1Q02, 2JTV, 2JMB and 2CKX, and below 1.5 Å for 2OSQ and 2K52) and used it as a measure of convergence for every parameter value. Using this procedure to optimize the global weighting, we found an optimal global scaling factor of 67 for our test set (Supplementary table 9). Although this default represents a good compromise for many scenarios, the optimal scaling factor varies between proteins. Over-emphasizing the sPRE score can lead to physically incorrect structures, whereas a weakly weighted sPRE score fails to drive the sampling significantly. The scaling of the sPRE score can therefore be adjusted by the user, either by changing the score weight set within the Rosetta framework or by changing the scaling command line option of equation (4).
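The convergence measure used here, the percentage of models below a Cα-RMSD cutoff, can be sketched with NumPy as follows. The superposition uses the standard Kabsch algorithm; the text does not state which superposition routine was used, so this is an assumption, and the coordinates in the example are random placeholders rather than real Cα traces.

```python
import numpy as np


def ca_rmsd(model, native) -> float:
    """Cα-RMSD after optimal superposition (Kabsch algorithm)."""
    p = np.asarray(model, dtype=float)
    q = np.asarray(native, dtype=float)
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # Correct for a possible reflection so that the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def percent_close(models, native, cutoff: float) -> float:
    """Percentage of models whose Cα-RMSD to `native` is below `cutoff` (Angstrom)."""
    rmsds = [ca_rmsd(m, native) for m in models]
    return 100.0 * sum(r < cutoff for r in rmsds) / len(rmsds)


rng = np.random.default_rng(0)
native = rng.normal(scale=8.0, size=(60, 3))          # placeholder Cα trace
angle = np.radians(30.0)
rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle), np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
moved = native @ rz.T + np.array([5.0, -3.0, 2.0])    # same structure, new frame
distorted = native + rng.normal(scale=5.0, size=native.shape)
converged = percent_close([moved, distorted], native, cutoff=4.0)
```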
We then used the same strategy to evaluate how different weightings in the individual Abinitio stages affect the sampling. In our test set, every stage of the Abinitio protocol benefited from the sPRE score (Supplementary table 10). Furthermore, depending on the protein, the improvement of the sampling can be traced back to different stages of the protocol. Our tests therefore suggest including the sPRE score in all stages of Abinitio with a constant weight throughout the protocol. Among the five stages of Abinitio, stage I uses the simplest scoring function, based solely on a van der Waals term. We nevertheless included the sPRE score in this early stage, since it depends on the global fold and favors compact structures; it is therefore suited to collapsing the initial extended chain, which is the main purpose of stage I. Finally, we used the same procedure to test different algorithms for comparing the experimental and the back-calculated data. Among several common choices, such as RMSD, correlation coefficients and Chi values (Supplementary table 7), correlation coefficients performed best and were the most robust. Chi values and several variants of RMSD performed well only in a few test cases. The classical Pearson correlation coefficient performed considerably better, but was outperformed by the Spearman correlation coefficient, which gave the best convergence in most cases (Supplementary table 11). Consequently, we chose the Spearman correlation as the default method to compare experimental and back-calculated sPRE data. The Spearman correlation coefficient is obtained by ranking both data sets (measured and back-calculated sPRE data) independently and then calculating a classical Pearson correlation coefficient of the ranks.
As the ranks are bounded by the number of data points, the Spearman correlation is robust and less sensitive to outliers, making it well suited to amide proton sPRE data that might contain additional relaxation contributions from chemical exchange with water. Also note that, because the input data is ranked, the sPRE module could potentially incorporate solvent accessibility data from other sources, such as bioinformatics predictions or other experimental methods like mass spectrometry.
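The robustness argument can be illustrated with toy numbers (not data from this study): a single buried amide proton with an inflated measured value flips the sign of the Pearson correlation, whereas the Spearman correlation stays positive. Any strictly increasing transformation of the input, such as a logarithm, leaves the Spearman value unchanged because the ranks do not change, which is why differently scaled accessibility data could in principle be used.

```python
import numpy as np


def ranks(x) -> np.ndarray:
    """Ranks 1..n; tied values receive the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    r = np.empty(len(x))
    r[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        tie = x == v
        r[tie] = r[tie].mean()
    return r


def pearson(x, y) -> float:
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y) -> float:
    return pearson(ranks(x), ranks(y))


# Toy data: back-calculated accessibility of 10 atoms and "measured" sPREs that
# agree perfectly except for one buried amide proton whose value is inflated,
# e.g. by exchange with water.
model = np.arange(1.0, 11.0)
measured = model.copy()
measured[0] = 50.0

r_pearson = pearson(model, measured)
r_spearman = spearman(model, measured)
r_spearman_log = spearman(model, np.log(measured))  # same ranks, same value
```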

Computational costs
The back-calculation of the sPRE data is the main contributor to the overall computational effort. We therefore approximated the sPRE by discretizing the protein atom positions onto the same grid that is used to model the paramagnetic agent (see section Back-calculation of sPRE data). This reduces the required computations to simple grid-based operations that can be accelerated with techniques such as lookup tables. We also aimed to reduce the total memory footprint to improve cache efficiency.
To quantify the computational cost of the sPRE score, we compared the runtime of rescoring an ensemble of ubiquitin structural models with the sPRE score with the runtime of computing the Rosetta centroid scores. As shown in supplementary table 12, the computational cost mainly depends on the grid resolution and the integration radius. For example, a grid resolution of 1 or 2 Å requires an extra computational cost of the same order as the efficient Rosetta centroid scores (about 80% more computational time than calculating the Rosetta centroid score alone). Furthermore, we compared the computational costs of CS-Rosetta and sPRE-CS-Rosetta (Supplementary table 13). With a 2 Å grid spacing, the extra cost of the sPRE module does not change the order of magnitude of the total runtime (the computational cost roughly doubles).
Reducing the grid spacing to 1 Å requires about 5 to 6 times the computational time of a classical CS-Rosetta run. Note that in this comparison the number of computed structures was kept constant. In practice, using the sPRE score can dramatically speed up the complete procedure, as fewer models need to be computed to sample near-native conformations. The additional computational cost of the sPRE module becomes even less important considering that CS-Rosetta can easily be parallelized and that the number of computational cores in modern clusters is increasing rapidly.
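The statement that the sPRE score can shorten the complete procedure despite its per-model overhead follows from a simple expected-cost argument. Assuming, as a simplification, that each computed model is near-native independently with probability $p$ and costs $c$ CPU time, the expected number of models and the expected cost needed to obtain $k$ near-native models are

$$\mathbb{E}[\text{models}] = \frac{k}{p}, \qquad \mathbb{E}[\text{cost}] = \frac{k\,c}{p}, \qquad \frac{\mathrm{cost}_{\mathrm{sPRE}}}{\mathrm{cost}_{\mathrm{CS}}} = \frac{c_{\mathrm{sPRE}}}{c_{\mathrm{CS}}} \cdot \frac{p_{\mathrm{CS}}}{p_{\mathrm{sPRE}}}$$

With the roughly twofold per-model cost of a 2 Å grid ($c_{\mathrm{sPRE}}/c_{\mathrm{CS}} \approx 2$), sPRE-CS-Rosetta needs less total CPU time for the same number of near-native models whenever $p_{\mathrm{sPRE}} > 2\,p_{\mathrm{CS}}$; for a 1 Å grid, the factor becomes about 5 to 6.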

Scoring Benchmark
To evaluate the potential of the sPRE score, we compared the common Rosetta scoring functions with the sPRE score. In particular, we did not limit the benchmark to fully relaxed full-atom structures, but also analyzed how the sPRE score performs for centroid structure models, since these simplified models are used to fold the extended chain in the Abinitio protocol.
In a first step, we chose a set of proteins whose native structure was determined by NMR spectroscopy and for which experimental sPRE data was either measured or already available (Supplementary table 1). For every protein, a test ensemble of structures was generated by running classical CS-Rosetta structure predictions and collecting the centroid models at the end of each stage (stages I, II, III and IV) as well as the final full-atom structures.
To ensure that the ensembles cover a broad RMSD range, from only partially folded proteins to near-native structures, we added distance restraints derived from the native structure. Gradually improved structures were obtained by running several CS-Rosetta runs and narrowing the distance potential in steps of 10, 6, 4, 3 and 2 Å. In total, we generated 15000 models for every protein and every stage (3000 per protein, Abinitio stage and distance potential window size). These ensembles were then scored using the corresponding Rosetta centroid score (score0 for stage I, score1 for stage II, score2 for stage III and score3 for stage IV) and the sPRE score. The chemical shift score was only used for fully relaxed structures, as it is only applicable to full-atom models.
The results of the scoring benchmark clearly suggest that the sPRE score can be used to identify near-native structures (Supplementary figure 1). In particular, for centroid models the Rosetta scoring function in some cases prefers wrongly folded models over near-native structures. In these cases, we observed that the sPRE score outperforms the Rosetta score (see for example supplementary figure 1a and b). For full-atom models, on the other hand, the Rosetta score and in particular the chemical shift score are more reliable, and their performance is similar to that of the sPRE score.
Although the sPRE score mainly depends on global fold properties, with only minor contributions from local high-resolution structural features, in some cases it performs as well as the chemical shift score and outperforms the Rosetta full-atom score even in the low-RMSD range (see for example supplementary figure 1c, g and h). Moreover, considering that in some cases only sPRE data for amide protons was used, it is notable that in our test set the sPRE score never performed worse than the Rosetta centroid score.
In summary, the scoring benchmark revealed the potential of the sPRE score for finding native-like structures and consequently suggests that the score is well suited to improve sampling and thus the overall performance of CS-Rosetta.

Sampling Benchmark
To study the benefit of including solvent accessibility data in the folding algorithm of CS-Rosetta, we built a test set of 49 proteins by randomly selecting protein models from the Protein Data Bank (PDB) [2] with a protein core size of up to 170 residues and for which chemical shift data is available (Supplementary table 2). A full set of synthetic carbon and proton sPRE data was back-calculated using the lowest-energy model of the deposited PDB structure. We then predicted models using classical CS-Rosetta as well as sPRE-CS-Rosetta. For both methods, the obtained structure ensembles were ranked according to the sum of the chemical shift score and the Rosetta full-atom score (score13_env_hb), and the average Cα-RMSD of the best-ranked 0.2% was computed (Figure 2c). To assess the sampling of both methods alone, we also compared the best 1% by Cα-RMSD (Supplementary figure 4b). Proteins for which both methods failed (average Cα-RMSD > 10 Å) were not analyzed. As indicated by the scoring benchmark, the additional solvent accessibility data significantly improved sampling compared with classical CS-Rosetta.
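The ranking-based evaluation can be sketched as follows: models are sorted by the summed score (lower is better) and the Cα-RMSD of the best-ranked fraction is averaged. The function name, the toy values and the rounding of the number of selected models are assumptions of this sketch.

```python
from typing import Sequence


def mean_rmsd_of_top(total_scores: Sequence[float], rmsds: Sequence[float],
                     fraction: float = 0.002) -> float:
    """Average Cα-RMSD of the best-scoring `fraction` of models (lower score = better).

    Rounding the number of selected models to the nearest integer (at least 1)
    is an assumption of this sketch.
    """
    if len(total_scores) != len(rmsds) or not rmsds:
        raise ValueError("need one RMSD per score and at least one model")
    n_top = max(1, round(fraction * len(rmsds)))
    best = sorted(range(len(rmsds)), key=lambda k: total_scores[k])[:n_top]
    return sum(rmsds[k] for k in best) / n_top


# Hypothetical per-model scores; models are ranked by their sum.
cs_score = [12.0, 4.0, 9.0, 3.0]
fa_score = [-180.0, -195.0, -170.0, -199.0]
rmsd = [9.0, 2.0, 5.0, 1.0]
total = [a + b for a, b in zip(cs_score, fa_score)]
top_half = mean_rmsd_of_top(total, rmsd, fraction=0.5)
```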
To evaluate whether the observed benefit persists with experimental NMR data, we repeated the sampling benchmark using a set of proteins for which experimental sPRE data is available (Supplementary table 1). We again predicted models using both methods and computed the average Cα-RMSD of the best models, ranked by the sum of the chemical shift score, the Rosetta full-atom score (score13_env_hb) and the sPRE score (Figure 2b) as well as ranked by Cα-RMSD (Supplementary figure 4a). Although both methods failed to predict reasonable folds for MBP and p16, the experimental sPRE data significantly improved sampling for Pex19, ubiquitin and both domains of Phl p 5a. For protein A, both methods resulted in high-resolution models.
Since the previous sampling benchmarks clearly showed that sPRE data improves the convergence and accuracy of CS-Rosetta, we used 4 proteins (2LEJ, 1LS4, 1P6T and 1Z8S) to quantify the robustness and applicability of sPRE-CS-Rosetta with respect to typical challenges in protein NMR spectroscopy. For this benchmark, we first back-calculated a full sPRE data set as described before. We then generated different sets of sPRE data by simulating incomplete assignments (40%, 70% and 100% assigned), different noise levels (30%, 60%, 100%, 200% and 400%) and different atom subsets (HN only; HN and ILV methyl protons; HN, Hα and Hβ). For a comparison of the simulated noise with experimental sPRE data, see supplementary figure 9. Next, structure ensembles were predicted with sPRE-CS-Rosetta for every sPRE data set, and the percentage of models with a Cα-RMSD of 5 Å or less to the native structure was computed for every ensemble as well as for the reference CS-Rosetta ensemble (Supplementary tables 3a-d). This sampling benchmark revealed the robustness of the sPRE score and suggests its applicability to sparse and noisy experimental sPRE data.
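A hedged sketch of how such degraded data sets might be generated from a synthetic sPRE data set. The text does not define the reference for the percentage noise levels, so the Gaussian noise model with standard deviation noise_level × base_sigma is an assumption, as are the helper names and the example data.

```python
import random
from typing import Dict, Iterable, Tuple

Key = Tuple[int, str]  # (residue number, atom name)


def degrade(data: Dict[Key, float], assigned_fraction: float, noise_level: float,
            base_sigma: float, seed: int = 0) -> Dict[Key, float]:
    """Drop unassigned atoms at random and add Gaussian noise to the rest.

    Assumption of this sketch: a noise level of e.g. 2.0 (200%) means a standard
    deviation of 2.0 * base_sigma, where base_sigma is a reference noise magnitude.
    """
    rng = random.Random(seed)
    keys = sorted(data)  # deterministic order before sampling
    kept = rng.sample(keys, round(assigned_fraction * len(keys)))
    return {k: data[k] + rng.gauss(0.0, noise_level * base_sigma) for k in kept}


def atom_subset(data: Dict[Key, float], names: Iterable[str]) -> Dict[Key, float]:
    """Keep only the given atom names, e.g. {"H"} for an HN-only data set."""
    wanted = set(names)
    return {k: v for k, v in data.items() if k[1] in wanted}


# Hypothetical synthetic data set: amide (H) and alpha (HA) protons of 10 residues.
full = {(res, atom): float(res) for res in range(1, 11) for atom in ("H", "HA")}
sparse = degrade(full, assigned_fraction=0.4, noise_level=0.0, base_sigma=1.0)
noisy = degrade(full, assigned_fraction=1.0, noise_level=2.0, base_sigma=1.0)
hn_only = atom_subset(full, {"H"})
```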

Sampling Benchmark using NOE and RDC data
To demonstrate the orthogonality of the sPRE score to other experimental NMR data, the structure of ubiquitin was predicted using CS-Rosetta and sPRE-CS-Rosetta in the absence and presence of additional NMR restraints such as RDCs and NOEs (see supplementary tables 4a-b and 5 as well as supplementary figure 5).
To this end, experimental NOEs and RDCs for ubiquitin (one set of HN-N RDCs recorded in a single medium) were obtained from the literature (PDB entry 1D3Z). Next, random subsets of either NOE or RDC data were generated with varying total numbers of restraints (see supplementary tables 4a and 5).
Ambiguous NOE restraints were counted as a single restraint, and the AmbiguousRestraint groups of the Rosetta framework were used to account for the ambiguity. For every RDC subset, the CS-Rosetta toolbox (http://csrosetta.chemistry.ucsc.edu/) was used to prepare the RDC data for use in the Rosetta framework. For every NOE or RDC subset, CS-Rosetta and sPRE-CS-Rosetta runs were used to obtain ensembles of ubiquitin with 5000 models each. For every subset size, 2 to 4 different random subsets were generated and used as input to account for random effects of the selection process (see supplementary tables 4a and 5). To compare the performance of the different input sets, the percentage of models with a Cα-RMSD of 1.0 Å or less was computed for every ensemble. The results show that the sPRE module improves the sampling of CS-Rosetta in all cases, even when large sets of NOE restraints are used. In addition, the percentage of wrong models (Cα-RMSD above 4.0 Å) was analyzed using the same data set (see supplementary table 4b).
The results show that adding an NOE scoring function comprising several thousand NOEs not only generates more high-resolution structures but also increases the percentage of models far from the native structure. This can be explained by the fact that an NOE score containing a large number of restraints can become rather complex and therefore harder to sample efficiently. The sPRE score, on the other hand, depends on the global fold and is therefore less prone to change rapidly upon minor conformational changes. With this smoother energy landscape, the sPRE score can in particular drive the sampling from far-off models towards near-native models, as seen in the reduced number of models with a Cα-RMSD of 4 Å or more.

Comparison with RasRec
To compare the performance of CS-Rosetta and sPRE-CS-Rosetta with the iterative Rosetta protocol RasRec,[3] the CS-Rosetta toolbox (http://csrosetta.chemistry.ucsc.edu/) was used to set up RasRec-Rosetta runs. The corresponding amino acid sequences and the chemical shift data listed in supplementary table 1 were used as input for the RasRec runs. The pool size of the RasRec protocol was increased to 1000, while all other settings were left at their defaults. The obtained full-atom models were rescored using the same procedure as for the ensembles obtained with CS-Rosetta and sPRE-CS-Rosetta. The results are compared in supplementary figure 6.

Protein expression and purification
For the expression of protein A and p16, a pET-M11 vector was modified to express protein A as a solubility tag for p16. The vector encodes an N-terminal hexa-histidine sequence followed by protein A, a TEV (tobacco etch virus) protease cleavage site and the E. coli codon-optimized DNA sequence of human p16. After cleavage by TEV protease, the protein A domain includes 16 N-terminal residues (MKHHHHHHPMKQHDEA) and an unstructured C-terminal region of 15 residues including the remaining cleaved TEV site (MDAGSGSGSENLYFQ). The cleaved p16 protein contains two additional N-terminal residues (GA) followed by its 156 amino acids (canonical sequence, isoform 1). The expression vector was transformed into E. coli BL21 (DE3), and cells were grown at 37 °C using 50 µg/ml kanamycin for selection. After inoculation of 150 ml M9 minimal medium containing uniformly ¹³C-labeled glucose (3 g per liter) and uniformly ¹⁵N-labeled ammonium chloride (1 g per liter), the culture was grown overnight with vigorous shaking. The next morning, the cell suspension was diluted with 850 ml of the same medium and grown to an OD of 0.8, and protein synthesis was induced by adding IPTG (isopropyl β-D-1-thiogalactopyranoside) to a final concentration of 0.5 mM. The culture was then incubated overnight at 19 °C and harvested the next day. The cell pellets were resuspended in 30 ml purification buffer (8 M urea, 20 mM Tris, pH 8.0, 20 mM imidazole) and frozen at -20 °C. For purification, the cell pellet was thawed at room temperature, sonicated and applied to a Ni-NTA agarose (Qiagen) gravity column following the manufacturer's instructions. The gravity column with the bound protein was then washed with 50 ml urea buffer.
Afterwards, the buffer was exchanged to HEPES buffer (110 mM potassium acetate, 20 mM HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid), pH 8.0, 2 mM β-mercaptoethanol (BME), 5% (v/v) glycerol and 20 mM imidazole) by washing the column with 20 ml of this buffer. The protein was then eluted with HEPES buffer containing 250 mM imidazole, concentrated to 5 ml in a centrifugal filter unit (Amicon Ultra-15, Millipore, 3 kDa molecular weight cut-off) and applied to size exclusion chromatography. After loading on a HiLoad 16/600 Superdex 75 pg column (GE Healthcare Life Sciences; 50 mM sodium phosphate, 500 mM NaCl, 2 mM BME, pH 6.0), the fractions containing the target protein were pooled. After addition of 400 µl of a 0.1 mg/ml solution of hexa-histidine-tagged TEV protease, the sample was dialyzed overnight at 4 °C against HEPES buffer using a 2 kDa MWCO ZelluTrans V series membrane (Carl Roth). The next day, the solution was applied again to a Ni-NTA agarose column to separate the cleaved p16, while the TEV protease, traces of uncleaved protein and protein A remained bound to the column. The p16 flow-through fraction was buffer-exchanged into HEPES buffer (4 mM HEPES, pH 7.5, 5 mM DTT) using a 5 ml HiTrap desalting column (GE Healthcare Life Sciences); the final concentration for NMR measurements was 150 µM (including 10% D2O). The protein A fraction was eluted with HEPES buffer containing 250 mM imidazole and concentrated to 5 ml, and a second size exclusion step was performed as described above, which yielded the pure protein. Protein A was concentrated to 500 µl and buffer-exchanged into NMR buffer (20 mM potassium phosphate, pH 6.5, 50 mM NaCl) using the desalting column.

NMR spectroscopy

Recording of sPRE data
To obtain sPRE data by NMR spectroscopy, we used a saturation-based approach as described previously.[1b] Briefly, the R1 relaxation rates are determined by a saturation-recovery scheme followed by a read-out experiment such as a ¹H,¹⁵N HSQC, a ¹H,¹³C HSQC or a 3D CBCA(CO)NH experiment. For proton saturation, a 7.5 ms ¹H trim pulse followed by a gradient was applied. The z-magnetization then builds up during the recovery delay, which ranges from several milliseconds to several seconds. The different recovery delays were recorded in an interleaved manner, with short and long delays ordered alternately. For every R1 measurement, at least 8 delay times were recorded, and at least one delay time was recorded in duplicate for error estimation. The R1 measurements were repeated for increasing concentrations of the relaxation-enhancing agent Omniscan, and the sPRE was obtained as the average change of the proton R1 rate per unit concentration of the paramagnetic agent. After every addition of Omniscan, the recovery delays were shortened such that all NMR signals were still sufficiently recovered at the longest delay. The interscan delay was set to 50 ms, as the saturation-recovery scheme does not rely on equilibrium z-magnetization at the start of each scan. All NMR samples contained 10% ²H₂O. Spectra were processed using NMRPipe [5] and analyzed with the NMRView [6] and CcpNmr Analysis [7] software packages.
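The two fitting steps, R1 from a saturation-recovery series and then the sPRE from the concentration dependence of R1, can be sketched in Python with NumPy. Assumptions of this sketch: a two-parameter recovery model I(t) = I0(1 - exp(-R1 t)) that ignores incomplete saturation, and a linear least-squares slope as "the average change of R1 per concentration". The synthetic numbers are placeholders; the concentrations match the protein A titration described below.

```python
import numpy as np


def fit_r1(delays, intensities):
    """Fit I(t) = I0 * (1 - exp(-R1 * t)) to a saturation-recovery series.

    For a trial R1 the model is linear in I0, so I0 follows from linear least
    squares; R1 is found by a coarse log-spaced scan plus golden-section
    refinement. Returns (R1, I0).
    """
    t = np.asarray(delays, dtype=float)
    y = np.asarray(intensities, dtype=float)

    def fit_at(r1: float):
        f = 1.0 - np.exp(-r1 * t)
        i0 = float(f @ y / (f @ f))
        return float(((y - i0 * f) ** 2).sum()), i0

    grid = np.logspace(-2, 3, 400)  # trial R1 values in s^-1
    k = int(np.argmin([fit_at(r)[0] for r in grid]))
    a, b = grid[max(k - 1, 0)], grid[min(k + 1, len(grid) - 1)]
    g = (np.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):  # golden-section search inside the bracket [a, b]
        c, d = b - g * (b - a), a + g * (b - a)
        if fit_at(c)[0] < fit_at(d)[0]:
            b = d
        else:
            a = c
    r1 = (a + b) / 2.0
    return r1, fit_at(r1)[1]


def spre_from_titration(concentrations_mM, r1_rates) -> float:
    """Slope of R1 versus agent concentration (s^-1 mM^-1) from a linear fit."""
    slope, _intercept = np.polyfit(concentrations_mM, r1_rates, 1)
    return float(slope)


# Noise-free synthetic example (placeholder numbers).
delays = np.array([0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2])  # s
r1_fit, i0_fit = fit_r1(delays, 1000.0 * (1.0 - np.exp(-1.5 * delays)))

conc = np.array([0.0, 0.5, 1.0, 2.0, 5.0, 10.0])  # mM Omniscan
spre = spre_from_titration(conc, 0.8 + 0.12 * conc)
```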

Required measurement time
To record a full set of sPRE data for amide (HN) and aliphatic protons, about 2 to 5 days of measurement time are required when using ¹H,¹⁵N and ¹H,¹³C HSQC-based pseudo-3D relaxation experiments. Acquiring relaxation rates at 4 to 6 different concentrations of the paramagnetic agent is sufficient for most proteins.
For example, using a 400 µM sample of p16 (16.6 kDa) and a 750 MHz magnet equipped with a TXI probe head (Bruker), one set of relaxation rates was measured in 7 to 8 hours (3.5 hours for a pseudo-3D ¹H,¹⁵N HSQC with 8 scans, 100 complex points and 12 exponentially spaced delay points, and 3 hours for a pseudo-3D ¹H,¹³C HSQC with 4 scans, 175 complex points and 12 exponentially spaced delay points). Using more scans for overnight experiments, relaxation rates for 6 different concentrations of the paramagnetic agent can be acquired in 3 days. For a 300 µM sample of Phl p 5a (24.1 kDa), the complete sPRE data set based on pseudo-3D ¹H,¹³C HSQC experiments required 22 hours of measurement time. These data were acquired on a Bruker Avance III 700 MHz NMR spectrometer using 4 scans, 128 complex points and 12 exponentially spaced delay points at 5 different concentrations of the paramagnetic agent.

Measurement of sPRE data used in this study
Details of the assignment and acquisition of sPRE data for MBP and ubiquitin were published previously [1b], and sPRE data of Pex19 were obtained according to the same protocol. The assignment of protein A was achieved by transferring the published chemical shifts of BMRB entry 4023 [8] and confirming the resonance positions with HNCO, HNCACO and HNCACB experiments. sPRE data of a uniformly ¹³C,¹⁵N-labeled 1 mM sample of the Z domain of protein A were recorded on a 600 MHz magnet (Oxford Instruments) equipped with an AV III console and a cryogenic TCI probe head (Bruker). R1 rates of Hα and Hβ were measured using CBCA(CO)NH read-out spectra at 25 °C in the presence of 0, 0.5, 1, 2, 5 and 10 mM Omniscan (GE Healthcare, Vienna, Austria). For the assignment of p16, previously reported chemical shifts of p16 [9] were obtained from the BMRB [4] (accession code 4086), and the assignment was confirmed by recording backbone HNCA as well as side-chain (H)CCH- and H(C)CH-TOCSY spectra of uniformly 13

Analysis of NMR data
Analysis of sPRE data for MBP and ubiquitin was described previously [1b], and sPRE data for Pex19 were analyzed accordingly. For p16, Phl p 5a and protein A, the sPRE data were analyzed as follows. Peak intensities were extracted using the nmrglue [10] Python package and fitted to a mono-exponential build-up curve using the SciPy Python package and equation (5),

$$I(\tau) = B - A\,e^{-R_1 \tau} \qquad (5)$$

where $I$ is the peak intensity of the saturation-recovery experiment, $\tau$ is the recovery delay, $A$ is the amplitude of the z-magnetization build-up, $B$ is the plateau of the curve and $R_1$ is the longitudinal relaxation rate. To estimate the error of the fitted rates $R_1$, the experimental error was first estimated from the duplicate recovery delays. For every $R_1$ experiment, one absolute error $\sigma_{\mathrm{exp}}$ for all peaks was obtained by equation (6),

$$\sigma_{\mathrm{exp}} = \sqrt{\frac{1}{2N}\sum_{i=1}^{N} \Delta_i^{2}} \qquad (6)$$

where $N$ is the number of peaks in the spectrum, $i$ is the index of the peak, and $\Delta_i$ is the difference of the duplicates for the $i$-th peak. The error of the rates $R_1$ was then obtained using a Monte Carlo-type resampling strategy. A new data set was created by randomly drawing (with replacement) $3 \cdot N_\tau$ delays from the pool of $N_\tau$ unique recovery delays. Noise drawn from a normal distribution with a standard deviation of $\sigma_{\mathrm{exp}}$ was then added to the peak intensities of each of the $3 \cdot N_\tau$ data points. For every peak and saturation-recovery experiment, 1000 such data sets containing $3 \cdot N_\tau$ randomly altered data points were created and fitted to the saturation-recovery model of equation (5). The standard deviation $\Delta R_1$ of all 1000 fitted values of $R_1$ was then used as the error of $R_1$. The sPRE is then obtained by a weighted linear regression using equation (7),

$$R_1(c) = R_{1,0} + \mathrm{sPRE} \cdot c \qquad (7)$$

where $c$ is the concentration of Omniscan, $R_1(c)$ is the fitted rate in the presence of Omniscan at concentration $c$, $R_{1,0}$ is $R_1$ in the absence of Omniscan, and sPRE is the slope, i.e. the desired sPRE value. For the weighted linear regression, the previously determined errors $\Delta R_1$ of $R_1$ were used, and the error of the concentration was neglected.
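The analysis chain above (build-up fit, pooled duplicate error, Monte Carlo error of R1 and weighted regression against the Omniscan concentration) can be sketched with NumPy and SciPy as follows. This is a minimal sketch, not the authors' code. It assumes the build-up model I(τ) = B - A·exp(-R1·τ), a pooled error σ_exp = sqrt(ΣΔ²/(2N)) from duplicate differences, resampling of 3·N_τ delays with replacement and a straight-line fit R1(c) = R1,0 + sPRE·c in which both intercept and slope are free. Averaging duplicate intensities before resampling is a further assumption, and all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def buildup(tau, A, B, R1):
    """Assumed mono-exponential build-up: plateau B, amplitude A, rate R1."""
    return B - A * np.exp(-R1 * tau)


def fit_r1(tau, intensity, p0=None):
    """Least-squares fit of the build-up curve; returns the array (A, B, R1)."""
    tau = np.asarray(tau, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    if p0 is None:
        top = intensity.max()
        p0 = (top, top, 1.0 / np.median(tau))
    popt, _ = curve_fit(buildup, tau, intensity, p0=p0, maxfev=10000)
    return popt


def sigma_exp(duplicate_diffs):
    """One pooled intensity error per spectrum from duplicate differences."""
    d = np.asarray(duplicate_diffs, dtype=float)
    return np.sqrt(np.sum(d ** 2) / (2 * d.size))


def r1_error_mc(tau, intensity, sigma, n_sets=1000, rng=None):
    """Monte Carlo error of R1: draw 3*N_tau delays, add Gaussian noise, refit, take std."""
    rng = np.random.default_rng() if rng is None else rng
    tau = np.asarray(tau, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Average duplicates so that every unique delay has one intensity (assumption).
    unique_tau = np.unique(tau)
    mean_int = np.array([intensity[tau == t].mean() for t in unique_tau])
    p_full = fit_r1(tau, intensity)  # start values for the resampled fits
    r1_values = []
    for _ in range(n_sets):
        idx = rng.integers(0, unique_tau.size, size=3 * unique_tau.size)
        noisy = mean_int[idx] + rng.normal(0.0, sigma, size=idx.size)
        try:
            r1_values.append(fit_r1(unique_tau[idx], noisy, p0=p_full)[2])
        except RuntimeError:  # skip rare resamples where the fit does not converge
            continue
    return float(np.std(r1_values))


def fit_spre(conc, r1, r1_err):
    """Weighted linear fit of R1(c) = R1_0 + sPRE * c; returns (sPRE, error of sPRE, R1_0)."""
    def line(c, r1_0, spre):
        return r1_0 + spre * c

    popt, pcov = curve_fit(line, np.asarray(conc, dtype=float), np.asarray(r1, dtype=float),
                           sigma=np.asarray(r1_err, dtype=float), absolute_sigma=True)
    return popt[1], float(np.sqrt(pcov[1, 1])), popt[0]
```

For each peak, one would call fit_r1 and r1_error_mc once per Omniscan concentration and pass the collected rates and errors to fit_spre. Using curve_fit with absolute_sigma=True treats the Monte Carlo errors as absolute standard deviations, so the returned covariance also gives an error estimate for the sPRE slope.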
Ensembles of different proteins (a-i) were generated and rescored using different scoring functions. Plots show 2D histograms of the score and the Cα-RMSD to the native structure, with red corresponding to a high sampling density and dark blue to single structures. The ensembles contain centroid and full-atom models representing different stages of Rosetta's AbinitioRelax protocol (see column headers). Centroid models for Stages 1-4 were rescored using the corresponding Rosetta centroid scores score0 to score3 (orange axis) and the sPRE score (blue axis). Full-atom models were rescored using the Rosetta score score13_env_hb (orange axis), the chemical shift score (black axis) and the sPRE score (blue axis). Experimental sPRE data were used as listed in supplementary table 1.

Plots show 2D histograms of the Rosetta score (score13_env_hb, in arbitrary units) and the Cα-RMSD to the native structure, with red corresponding to a high sampling density and dark blue to single structures. 5000 models are shown for CS-Rosetta and sPRE-CS-Rosetta, and 1000 models for RasRec-Rosetta. Experimental data were used as input for sPRE-CS-Rosetta.

Supplementary Tables
Supplementary