Protein-design methodology has been used to create new protein structures,1, 2 redesign protein-protein interactions,3, 4 and produce new enzymes.5, 6 Although there is still more to improve in protein-design algorithms,7 the field has considerable promise for generating new molecules for use in a wide variety of areas of current interest. By providing a reference point for which all of the inputs are known, protein design can also help to cast into sharper focus the critical evolved properties of native biomolecules. For example, studies of the kinetics of folding of designed proteins have helped distinguish the features of folding that are simple consequences of having a unique folded state from those optimized by natural selection.8
To gain insight into the properties of native structures and interfaces that result from more than simple selection for very low-energy native states, we used computational-design methodology to generate designed structures, protein-protein interfaces, and protein-ligand interfaces. As shown in Figure 1(a), designed protein-protein complexes have computed binding energies in the same range as those of native ones. Though the errors in computed energies are likely to be at least several kcal/mol, the computations do capture overall trends reasonably well; for example, computed binding energies correlate with experimentally measured binding affinities for protein-protein complexes [Fig. 1(b)]. Thus, although the error in the computed energy of any one design may well be large enough that the designed protein does not fold or the designed interface does not form, the overall distribution of interactions in the native and designed complexes is likely to be similar.
With the initial aim of estimating the sidechain-entropy loss upon protein-complex formation, we developed a simple measure of the Boltzmann weight of sidechain conformations in native and designed interfaces. For protein interfaces, the method first separates the complex and then, for each residue that makes an appreciable contribution to binding (see Methods section), iterates over all of its rotameric states as defined in the Dunbrack library of backbone-dependent rotamers,10 excluding rotamers that are predicted to clash with protein mainchain or Cβ atoms. For each rotamer placement, all residues within a 6 Å shell are repacked and minimized. The energy E of each such state is then evaluated using the Rosetta all-atom energy function,11 which is dominated by van der Waals, hydrogen bonding, and solvation terms. The probability of rotamer i, p_i, is then computed assuming a Boltzmann distribution:
\[ p_i = \frac{\exp(-E_i / k_B T)}{\sum_s \exp(-E_s / k_B T)} \tag{1} \]
where s runs over the rotameric states, k_B is the Boltzmann constant, T is the absolute temperature, and E_s is the energy of state s.
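The Boltzmann weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `rotamer_probabilities`, the temperature of 298 K, and the choice of kcal/mol as the energy unit are all assumptions made for the example, and the energies are taken as given rather than computed with Rosetta.

```python
import math

# Assumptions for illustration only: energies in kcal/mol and T = 298 K.
# The source does not state the temperature that was used.
K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)
DEFAULT_KT = K_B * 298.0


def rotamer_probabilities(energies, kT=DEFAULT_KT):
    """Return the Boltzmann probability of each rotameric state.

    `energies` holds one energy per rotameric state s (hypothetical input;
    in the study these come from repacking and minimizing around each rotamer).
    """
    if not energies:
        raise ValueError("at least one rotameric state is required")
    if kT <= 0:
        raise ValueError("kT must be positive")
    # Subtract the minimum energy before exponentiating. The shift cancels
    # in the ratio, so the probabilities are unchanged, but it avoids overflow.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    partition = sum(weights)
    return [w / partition for w in weights]
```

Calling `rotamer_probabilities(energies)[i]` gives the probability of rotamer i for whatever list of state energies is supplied. Shifting by the lowest energy is a numerical-stability choice only and does not alter the result.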
We estimated the probabilities in the unbound state of aromatic sidechain conformations observed in bound complexes and designed binders according to Eq. (1) [Fig. 2(a,b)]. Both designs and native complexes have low-probability sidechains at the interface that contribute significantly to the binding energy. However, very high-probability conformations (p > 50%) are almost exclusively found in natives, whereas designs are over-represented in the very low-probability bin (p < 5%). The trend extends to monomeric structures: we observe higher probabilities for aromatic residues in native conformations than in designs [Fig. 2(c)]. Thus, although the overall computed energies of the designed structures and complexes are similar to those of native structures, aromatic sidechains in the former are considerably less restricted in conformation than those in the latter. We found no correlation between the computed stability of individual residues in the cores of designed and native monomers and their conformational probability (not shown). The lack of correlation demonstrates that the conformational restrictions we observe in native proteins compared to designs are not a simple consequence of natural proteins having more stable cores than designs.
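The comparison above rests on sorting sidechain probabilities into a very low bin (p < 5%) and a very high bin (p > 50%). A minimal sketch of that bookkeeping follows. The two cutoffs come from the text; the function names, the "intermediate" label for everything in between, and the use of fractions rather than counts are assumptions made for illustration.

```python
from collections import Counter

LOW_CUTOFF = 0.05   # "very low-probability" bin from the text (p < 5%)
HIGH_CUTOFF = 0.50  # "very high-probability" bin from the text (p > 50%)


def probability_bin(p):
    """Classify one sidechain probability as 'low', 'intermediate', or 'high'."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p < LOW_CUTOFF:
        return "low"
    if p > HIGH_CUTOFF:
        return "high"
    return "intermediate"  # hypothetical label for the remaining range


def bin_fractions(probabilities):
    """Return the fraction of sidechains that fall in each bin."""
    if not probabilities:
        raise ValueError("no probabilities supplied")
    counts = Counter(probability_bin(p) for p in probabilities)
    total = len(probabilities)
    return {name: counts[name] / total
            for name in ("low", "intermediate", "high")}
```

Running `bin_fractions` separately on the probabilities from a native set and from a designed set would give the two distributions being compared; no values from the study are reproduced here.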
What is the origin of these differences in sidechain-conformational restriction? The answer may lie in the fact that most current design methods, including those used to generate the structures and complexes in this study, focus on optimizing the energy of the desired target structure or complex. Such “positive design” strategies suffer from the flaw that even though the energy of the target structure is optimized to be extremely low, the target conformation or complex might be high in energy relative to alternative conformations, and hence could be poorly populated.12 Negative-design methods have been developed that contain explicit bias for sequence substitutions that favor a target structure versus a set of alternative structures,13, 14 but these alternative structures have to be explicitly modeled, and thus for computational tractability the set must not be too large.
A likely explanation for the differences in sidechain flexibility between the designed and native structures and complexes is that evolutionary pressures on native structures involve aspects of negative design. Functioning cell circuitry, for example, requires not only that binding proteins bind their targets with high affinity, but just as importantly, that they not bind other cellular proteins, which will almost always be in vast excess over the former.15 Likewise, enzymes and ligand-binding proteins must bind the small molecules relevant to their function but not other small molecules present in the cell. From a free-energy standpoint, the preordering of sidechains in unbound complexes is beneficial as it reduces the loss of entropic degrees of freedom upon binding. The interactions in monomeric proteins must stabilize the folded state, but not the much larger number of non-native states.
How can evolutionary pressure accomplish this high selectivity for native structures and interactions? To gain insight into this issue, we investigated the structural contexts of the low-probability sidechains in designed structures and complexes and the high-probability sidechains in native conformations (Supporting Information Figures). The computed energies of these low- and high-probability sidechains are all very favorable. We found that designed low-probability aromatic sidechains tend to be exposed in the unbound state and make few if any interactions. In contrast, native high-probability sidechains interact with several residues within the host monomer and are often conformationally restricted by contacts with backbone or Cβ atoms. Native proteins appear to have dense interaction networks between nearby sidechain and backbone atoms that favor native conformations over non-native conformations and are not found in isoenergetic designed conformations. Evolution appears to have used negative design to ensure that bulky aromatic residues are constrained to very specific conformations within the cores of monomers and at interfaces.
Our results highlight an important aspect of evolved native structures, namely the conformational restriction of large sidechains to disfavor non-native interactions, and have immediate relevance to protein design. The reference set of designed structures is necessary in this case to show that conformational restriction is not a simple consequence of selection for a very low-energy structure, which would otherwise have been a quite reasonable assumption. Further comparison of the properties of native and designed complexes will undoubtedly continue to reveal the complex and intricate features engrained in native structures and complexes by evolution. On the design side, the sidechain conformational-probability metric provides a computationally tractable alternative to explicit negative design, which, as noted above, becomes prohibitive when the number of alternative structures is large. The sidechains in designs can be assessed using the metric, and overly flexible aromatic sidechains can be restricted by incorporating rigidifying interactions. The metric can also be used to guide the development of design strategies that produce more conformationally restricted sidechains by construction.