There is a fundamental conflict between two different views of how proteins fold. Kinetic experiments and theoretical calculations are often interpreted in terms of different population fractions folding through different intermediates in independent unrelated pathways (IUP model). However, detailed structural information indicates that all of the protein population folds through a sequence of intermediates predetermined by the foldon substructure of the target protein and a sequential stabilization principle. These contrary views can be resolved by a predetermined pathway–optional error (PPOE) hypothesis. The hypothesis is that any pathway intermediate can incorporate a chance misfolding error that blocks folding and must be reversed for productive folding to continue. Different fractions of the protein population will then block at different steps, populate different intermediates, and fold at different rates, giving the appearance of multiple unrelated pathways. A test of the hypothesis matches the two models against extensive kinetic folding results for hen lysozyme which have been widely cited in support of independent parallel pathways. The PPOE model succeeds with fewer fitting constants. The fitted PPOE reaction scheme leads to known folding behavior, whereas the IUP properties are contradicted by experiment. The appearance of a conflict with multipath theoretical models seems to be due to their different focus, namely on multitrack microscopic behavior versus cooperative macroscopic behavior. The integration of three well-documented principles in the PPOE model (cooperative foldons, sequential stabilization, optional errors) provides a unifying explanation for how proteins fold and why they fold in that way.
Abbreviations: HEWL, hen egg white lysozyme; Cyt c, cytochrome c; HX, hydrogen exchange; GdmCl, guanidinium chloride; IUP, independent unrelated pathways model; PPOE, predetermined pathway–optional error model; U, I, and N, unfolded, intermediate, and native states; Ix, kinetically blocked intermediate.
Detailed structural information obtained from hydrogen exchange (HX) studies of cytochrome c (Cyt c) and related studies with other proteins support a classical view of protein folding in which all of the molecules in a refolding population fold through essentially the same intermediate structures (Chamberlain and Marqusee 2000; Englander 2000; Maity et al. 2005). The pathway constructs the native protein by the stepwise addition of the cooperative folding units of the native structure, called foldons. The order of pathway steps is guided by a sequential stabilization process in which prior native-like structure templates the formation of subsequent complementary structure. Thus the folding pathway is determined by the same cooperative interactions that determine the target native structure. The pathway may be linear or may branch at any step as directed by the logic of the sequential stabilization process (Krishna et al. 2006).
By contrast, other experiments reveal a wide range of kinetic protein folding behavior that has been interpreted quite differently, in terms of independent unrelated pathways (IUP model). Proteins can fold in a simple two-state way with only the unfolded (U) and native (N) states significantly populated, or in a multistate way with one or more intermediates (I) transiently populated, or heterogeneously with some population fraction folding in a two-state way and others in a three-state way. Common conclusions are that specific intermediates need not exist (fast two-state folding), that populated intermediates are detrimental (slower multistate fractions), and that different population fractions fold through independent parallel pathways (heterogeneous folding). These varied behaviors have been described mainly by spectroscopic methods which can track kinetic folding in real time but provide almost no structural detail. Nevertheless, these views draw credibility from theoretical calculations that often portray folding in terms of different population fractions traversing different parts of the energy landscape, generally interpreted as multiple independent pathways (Baldwin 1995; Bryngelson et al. 1995; Wolynes et al. 1995; Dill and Chan 1997; Brooks 1998; Plotkin and Onuchic 2002a,b).
The choice between the predetermined pathway view and the multiple independent pathway view is fundamental to the protein folding problem. These grossly discrepant interpretations of accepted experimental results can be resolved by modifying the predetermined pathway model to include probabilistic misfolding errors that block the forward progress of normally occurring intermediates (Sosnick et al. 1994; Krishna et al. 2004). We term this the predetermined pathway–optional error (PPOE) model. Chance misfolding errors can act to corrupt different naturally occurring intermediates and insert optional error-repair barriers at different points in a pathway. When the error probability is zero at all pathway steps, folding appears to be a two-state process. When it is unity at one particular step, three-state folding occurs. Any other values or combinations will produce mixed behavior in which different population fractions display different naturally occurring but partially corrupted intermediates, or none at all, and fold at different rates. This heterogeneous behavior, when detected by the usual spectroscopic observations of kinetic phases, will appear to represent multiple alternative pathways.
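The way error probabilities map onto apparent kinetic behavior can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the text, that errors at different pathway steps are independent and that each molecule is classified by the first step at which it blocks. The function name and the example probabilities are hypothetical and are not fitted values.

```python
def blocking_fractions(error_probs):
    """Split a folding population by the first pathway step that blocks.

    error_probs: per-step probabilities of incorporating a misfolding error
    (hypothetical inputs, one entry per pathway step).
    Returns (unblocked_fraction, blocked_at_step), where blocked_at_step[i]
    is the fraction whose first error occurs at step i.
    """
    blocked_at_step = []
    surviving = 1.0  # fraction that has reached this step without any error
    for p in error_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each error probability must lie in [0, 1]")
        blocked_at_step.append(surviving * p)
        surviving *= 1.0 - p
    return surviving, blocked_at_step


# Zero error probability everywhere: the whole population is unblocked,
# which would appear as two-state folding.
two_state = blocking_fractions([0.0, 0.0, 0.0])

# Unit error probability at one step: the whole population blocks there,
# which would appear as three-state folding.
three_state = blocking_fractions([0.0, 1.0, 0.0])

# Intermediate values: mixed (heterogeneous) behavior, with an unblocked
# fast fraction and fractions blocked at different intermediates.
mixed = blocking_fractions([0.10, 0.50, 0.20])
```

In this picture the heterogeneity arises from a single pathway. The different fractions share the same sequence of intermediates and differ only in where, if anywhere, they are blocked.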
The optional error hypothesis is grounded in the observation that folding errors are ubiquitous. Well-known misfolding errors include prolyl and non-prolyl peptide bond misisomerization, transient aggregation, nonnative hydrophobic clusters, disulfide shuffling, heme misligation, and perhaps nonnative domain docking modes. These errors are optional, not intrinsic to the folding process, and they can often be inserted or removed by the manipulation of folding conditions. In vitro, misfolding-dependent aggregation obstructs laboratory and industrial protein expression (Chi et al. 2003). In vivo, folding errors appear to result in the loss of 30% or more of synthesized polypeptides (Yewdell 2005), and they account for a large fraction of human pathologies. In response, biology finds it cost-effective to elaborate multiple helper proteins, error repair systems, and turnover machineries.
As one test of the PPOE model, we re-examine results for the highly studied hen egg white lysozyme (HEWL) protein. Dobson and Radford and their coworkers (Dobson et al. 1994) found that HEWL folds in a three-state way so that a transient intermediate accumulates. The HEWL intermediate that has been most thoroughly studied contains native-like structural elements but also some seriously misfolded structure. In comprehensive experiments and analysis, Kiefhaber and coworkers found and quantified additional HEWL folding phases and intermediates, and they matched possible folding models against their extensive kinetic data (Kiefhaber 1995; Kiefhaber et al. 1997; Wildegger and Kiefhaber 1997; Bieri et al. 1999; Bieri and Kiefhaber 2001). All of these results have been interpreted in terms of multiple independent parallel pathways, as has been done also for many other proteins.
We find that the PPOE model quantitatively accounts for the measured folding and unfolding of the lysozyme intermediates and native state and their dependence on denaturant with fewer fitting parameters than the multiple pathway models considered before. More importantly, the fitted PPOE model exhibits folding properties that are common to proteins very broadly, whereas the fitted IUP model exhibits properties that are contradicted by experiment. The PPOE formalism seems able to explain the varied folding behavior of proteins quite generally.
Hen egg white lysozyme (HEWL) is a 129-residue protein stabilized by four disulfide bonds. It has two domains, rich in α and β structure, respectively. When HEWL is unfolded in concentrated denaturant and then diluted into folding conditions, it rapidly collapses (<1 msec) to form a compact C state (Chaffotte et al. 1992; Itzhaki et al. 1994; Segel et al. 1999), evidently dependent on its disulfide cross-links since the disulfide-reduced form (Roux et al. 1997) and proteins more generally (Plaxco et al. 1999; Jacob et al. 2004) do not collapse in this way. The chain collapse may or may not reflect a distinctly structured intermediate; hydrogen exchange protection seems to occur only at residues closely adjacent to the pre-existing disulfide bridges (Gladwin and Evans 1996; see also Goldberg and Guillou 1994; Roux et al. 1997).
The major folding sequence then proceeds from the C state. Spectroscopic experiments show that an intermediate, called I, forms and accumulates (∼30 msec), and then folds more slowly (∼400 msec) to a nearly native N′ state (Dobson et al. 1994; Kulakarni et al. 2006). Finally, the native substrate binding site forms in an optically silent N′-to-N step (Kulakarni et al. 2006). Like other workers before, we focus on the major folding behavior that carries the protein from the early C state to the near-final N′ state, which is pertinent to the folding problem most generally.
The hypothesis pursued here is that proteins fold through predetermined native-like intermediates that transiently accumulate only when their forward progress is blocked by an optional misfolding error (Sosnick et al. 1994; Englander et al. 1998; Krishna et al. 2004). The following discussion formulates an “error” as a step that removes the protein from the productive folding pathway and must be reversed for forward folding to resume. As expected from this hypothesis, the populated HEWL intermediate has much native-like structure, seen by HX pulse labeling (Radford et al. 1992; Miranker et al. 1993), but also some obvious misfolding (Fig. 1). The intermediate has 1.5 times the CD signal at 225 nm (CD225) of the N state (Dobson et al. 1994), lower and blue-shifted fluorescence compared to both U and N (Segel et al. 1999), some proline misisomerization that affects late-stage kinetics (Kiefhaber 1995), and a nonnative tryptophan cluster (Rothwarf and Scheraga 1996; Klein-Seetharaman et al. 2002). The blocking error is clearly optional. The deletion of one disulfide bond removes the nonnative CD and fluorescence (Eyles et al. 1994); the slow phase can be relieved, at least in part, by destabilizing the nonnative tryptophan cluster (Rothwarf and Scheraga 1996), and a significant fraction of the refolding population (∼15%) folds in a fast two-state manner with no blocking at all (Kiefhaber 1995; Wildegger and Kiefhaber 1997).
These results are as expected from the PPOE model, but they do not provide a quantitative test of the hypothesis. We therefore turn to an analysis of the extensive and coherent data sets for HEWL folding and unfolding collected by the Kiefhaber group (Kiefhaber et al. 1997; Wildegger and Kiefhaber 1997; Bieri et al. 1999; Bieri and Kiefhaber 2001). First we compare the measured folding and unfolding data to the multipath IUP and PPOE reaction schemes (Fig. 2). As for any kinetic process, the measured macroscopic rates can represent complex mixtures of the microscopic rate constants that connect the various states in ways that depend critically on the reaction scheme considered. The usual kinetic fitting exercise is intended to match alternative reaction schemes against the data, rule out those that do not fit, and find one that does, ideally with the fewest fitting parameters.
The Triangular model: Two independent folding pathways
Kiefhaber and coworkers (Kiefhaber 1995; Wildegger and Kiefhaber 1997) showed that the kinetics of fluorescence-detected HEWL folding includes a small faster phase (∼15%) that carries C to N′ without the population of an intermediate state, in agreement with the fast phase seen by Radford et al. (1992) in HX pulse-labeling experiments. In an extensive analysis, Kiefhaber and coworkers (Wildegger and Kiefhaber 1997) used stopped-flow fluorescence experiments to measure folding and unfolding rates of HEWL as a function of denaturant (chevron curves in Fig. 3A). They also measured the time-dependent populations of I and N′ at one GdmCl concentration (0.6 M GdmCl) (Fig. 3B) by the interrupted refolding method (start folding, delay, unfold at higher denaturant, measure the rate and amplitude of each unfolding phase) (see Schmid 1983 for experimental details).
Wildegger and Kiefhaber (1997) then considered possible folding models that could explain all of these results. Several considerations led to a Triangular model with folding through two independent parallel pathways, C to N′ and C to I to N′, written as Scheme 1 (Fig. 2) in terms of U, I, and N for generality. The larger population fraction (85%) folds in a three-state manner, U to I to N. In an independent parallel pathway, the faster fraction folds directly to N in a two-state manner (U to N).
The three-species Triangular model predicts the presence of two kinetic phases, shown in the form of chevron plots in Figure 3A. The excellent kinetic fitting to the chevron data found by Wildegger and Kiefhaber is indicated by the red curves (data and fit from Fig. 3 of Wildegger and Kiefhaber 1997). This set of fitted rate constants matches the time-dependent population data less well (Fig. 3B; data from Fig. 1 of Wildegger and Kiefhaber 1997). We refit the Triangular model to the data globally. The modified rate constants (Fig. 3C, m-values in parentheses) yield a good fit to both data sets (black in Fig. 3A,B). The global reduced χ² falls from 42 to 1.6. Four of the six rate constants are well constrained by the implicitly measured mass flow at critical pathway points.
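The statement that a three-species scheme yields two kinetic phases, each a mixture of the microscopic rate constants, can be made concrete with a short sketch. It assumes first-order kinetics for a fully connected U/I/N triangle and obtains the two observable relaxation rates from the characteristic polynomial of the rate matrix. The function name and the numerical values are illustrative assumptions, not the fitted constants of Figure 3C.

```python
import math


def triangular_relaxation_rates(k_ui, k_iu, k_in, k_ni, k_un, k_nu):
    """Return (fast, slow) macroscopic relaxation rates for a U/I/N triangle.

    Arguments are microscopic first-order rate constants (1/s), named
    k_<from><to>. Hypothetical helper written for illustration only.
    """
    # Rate matrix K for dP/dt = K P, species order (U, I, N).
    K = [
        [-(k_ui + k_un), k_iu, k_nu],
        [k_ui, -(k_iu + k_in), k_ni],
        [k_un, k_in, -(k_nu + k_ni)],
    ]
    s = -(K[0][0] + K[1][1] + K[2][2])  # sum of all six rate constants
    minors = sum(
        K[i][i] * K[j][j] - K[i][j] * K[j][i]
        for i, j in ((0, 1), (0, 2), (1, 2))
    )
    # Columns of K sum to zero, so det K = 0 and one eigenvalue is zero
    # (equilibrium). The two relaxation rates solve r^2 - s*r + minors = 0.
    disc = s * s - 4.0 * minors
    if disc < 0.0:
        if disc < -1e-9 * s * s:
            raise ValueError("complex rates: constants violate detailed balance")
        disc = 0.0  # absorb rounding error at a repeated root
    root = math.sqrt(disc)
    return (s + root) / 2.0, (s - root) / 2.0


# Illustrative values only: a sequential limit (U -> I -> N, no back steps)
# returns the two microscopic rates unchanged.
fast, slow = triangular_relaxation_rates(10.0, 0.0, 1.0, 0.0, 0.0, 0.0)

# With all six steps active, each observed rate mixes several constants.
fast_mixed, slow_mixed = triangular_relaxation_rates(
    30.0, 1.0, 2.5, 0.01, 5.0, 0.001
)
```

The same construction extends to larger schemes: a scheme with n species has n − 1 nonzero relaxation rates, which is why adding a species adds a kinetic phase.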
The PPOE model: A predetermined pathway with optional errors
The PPOE hypothesis leads to a different kind of model. In the PPOE model, proteins fold through a predetermined sequence of native-like intermediates that can be corrupted and transiently blocked by chance misfolding errors. Given the obvious misfolding of the populated lysozyme intermediate (Fig. 1), other workers have previously considered that its forward folding might be rate-limited by slow error repair (Dobson et al. 1994; Eyles et al. 1994; Rothwarf and Scheraga 1996; Matagne et al. 1997, 1998; Klein-Seetharaman et al. 2002).
A fairly general PPOE model is shown as Scheme 4 (Fig. 2). The minimal PPOE pathway diagrammed as the T model in Scheme 5 (Fig. 2) can serve to illustrate the issues. Here the pathway is represented by a single normally occurring on-pathway intermediate, I, and a corrupted Ix form that has in addition some misfolding. The probability for error formation is kIIx/(kIIx + kIN + kIU), where kIIx, kIN, and kIU are the rate constants for the three competing steps that lead away from I (I to Ix, I to N, and I to U). If the error-repair step required to return Ix to the productive pathway is slow, Ix will accumulate and kinetic folding will appear to be three-state.
The T model provides an excellent fit to all of the HEWL data (solid lines in Fig. 3D,E) with the same number of fitting rate constants required for the Triangular model (six) and an equivalent reduced χ². The three rate constants that lead to I (boldface in Fig. 3F) and their m-values (in parentheses in Fig. 3F) are fully constrained because they are dominated by well-measured folding and unfolding rates (Fig. 4B). The others are underdetermined and are free to vary over a wide range without degrading the goodness of fit. Only their ratios are constrained. They control the partitioning of I into three competing reactions.
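The partitioning of I among its three competing reactions, and the reason only the ratios of the underdetermined rate constants matter for it, can be sketched as follows. The function name and the numerical rate constants are hypothetical and are not the fitted values of Figure 3F.

```python
def partition_from_I(k_i_ix, k_i_n, k_i_u):
    """Fractions of the on-pathway intermediate I leaving by each route.

    k_i_ix, k_i_n, k_i_u: first-order rate constants (1/s) for the I-to-Ix,
    I-to-N, and I-to-U steps. The "Ix" entry is the error probability.
    """
    total = k_i_ix + k_i_n + k_i_u
    if total <= 0.0:
        raise ValueError("at least one rate constant must be positive")
    return {"Ix": k_i_ix / total, "N": k_i_n / total, "U": k_i_u / total}


# Illustrative values chosen so that 85% of I is diverted into Ix.
base = partition_from_I(85.0, 14.0, 1.0)

# Scaling all three constants by a common factor leaves the partitioning
# unchanged: only the ratios of the rate constants enter.
scaled = partition_from_I(85.0 * 50, 14.0 * 50, 1.0 * 50)
```

Because `base` and `scaled` give the same fractions, data that report only on the partitioning cannot fix the absolute values of these constants.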
The T model has four species including the hidden on-pathway I state, and therefore predicts the presence of three kinetic phases. The additional third phase is described by the underdetermined rate constants, as shown by the dotted lines in Figure 3D obtained from multiple independent fittings. It is very poorly determined by the kinetic data because the on-pathway uncorrupted intermediate goes forward rapidly, barely accumulates, and is hardly detectable. The kinetic parameters for the hidden on-pathway intermediate could be specified by additional unfolding experiments (see below).
The rates predicted by both models perfectly match the measured data. Both produce the same total flux away from U, the same flux into their respective populated intermediate, the same flux away from N, and also the same S value (relative fluorescence signal; see Materials and Methods) for the intermediate. Figure 4 shows how the measured macroscopic relaxation rates (data points) compare with the microscopic rate constants (dashed lines) calculated for the two different models.
Rejection criteria and the Triangular model
The two-pathway Triangular model seemed to be required by three qualitative constraints (Wildegger and Kiefhaber 1997).
(1) The major constraint that seemed to require independent parallel pathways relates to the absence of a lag phase for reaching N. (The upsweep in N formation in Fig. 3B is due to the logarithmic time scale and not to a kinetic lag.) If an intermediate accumulates and then folds to N in a sequential pathway (U to I to N), N will form with a kinetic lag that keys to the time scale for populating I. The two-pathway Triangular model accounts for the absence of a lag in N formation by the fast formation of N in a parallel U to N pathway with sufficient flux. Kinetic coupling due to the fact that both pathways deplete the pool of U ensures that both of these steps occur on the same time scale.
However, the absence of a lag can also be satisfied by a single pathway that contains an optional off-pathway misfolding-repair step (T model), as described below.
(2) A second constraint rejected the possibility that the populated intermediate might be misfolded and off-pathway. The off-pathway model conventionally written as Ix to U to N predicts a reverse denaturant effect in the folding arm of the slow-phase chevron at very low denaturant. Here the observed folding rate should initially increase with added denaturant because the rate for folding to N becomes dominated by an unfolding step, Ix to U. As a result of multipathway kinetic coupling, the same Ix-to-U step also determines the measured unfolding arm of the faster phase. Therefore, the expected reverse denaturant effect in the slow-phase folding arm can be predicted by back extrapolation from the fast-phase unfolding arm. The extrapolation shown as a red dashed line in Figure 4, A and B, crosses rather than merging with the measured data, indicating that the reverse denaturant effect expected for the off-pathway model is absent. This test rules out the conventional model for an off-pathway misfolded intermediate. The test was made more convincing by repeating the kinetic experiments in added Na2SO4, which stabilizes the intermediate and elongates the rollover region (Fig. 4C).
However, this test does not reject the type of off-pathway misfolding that enters the PPOE model, as described below.
(3) The possibility that the multiple kinetic phases might originate because of some pre-existing dichotomy of the U state (U and Ux) was tested in a double jump experiment (unfold, delay time, jump to refolding conditions). The results ruled out the usual U-to-Ux option because of proline isomerism, which occurs more slowly than the 30-msec experimental deadtime.
These rejection criteria led, by elimination of all apparent alternatives, to the requirement for independent parallel pathways. The simplest version, with two independent pathways and one intermediate that goes forward slowly and accumulates, is the Triangular model.
Rejection criteria and the PPOE model
Analysis shows that the T reaction scheme satisfies the same criteria, as follows.
(1) N is formed with no kinetic lag. In the T model, the formation of N through the major blocked route alone (U through Ix to N) would show a lag in N formation on the time scale of the U-to-Ix step. The lag is filled in by the unblocked faster U-to-I-to-N route. As before, kinetic coupling inherent in the reaction scheme ensures that N formation occurs on the unblocked route at the same rate as the U-to-Ix step.
(2) Although the T model contains an off-pathway partially misfolded intermediate, it is not rejected by the off-pathway test applied before. The issue can be understood from Figure 4. In the T reaction scheme, the slow Ix-to-I rate plays the role of the previous Ix-to-U rate in the conventional off-pathway scheme. However, unlike the previous Ix-to-U rate (red line in Fig. 4), Ix to I is not coincident with the measured fast-phase unfolding arm, as shown by the kIxI line in Figure 4, B and C. Back extrapolation of the Ix-to-I line to low denaturant does not cross the slow-phase folding data in the region measured; therefore the off-pathway misfolding is not rejected by the data measured. It would ultimately rate-limit the slow-phase folding rate and produce a reverse denaturant effect but only in an unreachable “negative [GdmCl]” region.
(3) The third rejection test rules out a slowly produced (>30 msec) U-to-Ux form. The T model does not require a Ux form to fit the present lysozyme data, although some population fraction of HEWL does contain pre-existing Ux forms (nonnative Trp cluster, misisomerized prolines) that cause slow folding phases (Eyles et al. 1994; Kiefhaber 1995; Rothwarf and Scheraga 1996; Klein-Seetharaman et al. 2002). The more general PPOE model in Scheme 4 (in Fig. 2) includes this possibility.
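Criterion (1), the absence of a lag in N formation, can be explored numerically. The sketch below integrates first-order kinetics for a strictly sequential scheme, the Triangular scheme, and the T scheme, and compares the amount of N formed at an early time. All rate constants are hypothetical, chosen only to mimic the approximate time scales quoted in the text (about 30 msec, about 400 msec, and a fast fraction near 15%). Back reactions toward U are omitted as a simplifying assumption, and the fixed-step Runge-Kutta integrator is an illustrative choice.

```python
def simulate(rates, start, t_end, dt):
    """Integrate dP/dt = K P with fixed-step fourth-order Runge-Kutta.

    rates: dict mapping (source, target) -> rate constant in 1/s.
    start: species holding all of the population at t = 0.
    Returns a dict mapping species -> population fraction at t_end.
    """
    species = sorted({name for pair in rates for name in pair})
    index = {name: i for i, name in enumerate(species)}

    def deriv(p):
        d = [0.0] * len(p)
        for (src, dst), k in rates.items():
            flow = k * p[index[src]]
            d[index[src]] -= flow
            d[index[dst]] += flow
        return d

    p = [1.0 if name == start else 0.0 for name in species]
    for _ in range(round(t_end / dt)):
        k1 = deriv(p)
        k2 = deriv([x + 0.5 * dt * a for x, a in zip(p, k1)])
        k3 = deriv([x + 0.5 * dt * a for x, a in zip(p, k2)])
        k4 = deriv([x + dt * a for x, a in zip(p, k3)])
        p = [x + dt / 6.0 * (a + 2 * b + 2 * c + e)
             for x, a, b, c, e in zip(p, k1, k2, k3, k4)]
    return dict(zip(species, p))


# Hypothetical rate constants (1/s), not fitted values.
SEQUENTIAL = {("U", "I"): 30.0, ("I", "N"): 2.5}
TRIANGULAR = {("U", "I"): 30.0, ("I", "N"): 2.5, ("U", "N"): 5.0}
T_MODEL = {("U", "I"): 35.0, ("I", "N"): 1000.0,
           ("I", "Ix"): 6000.0, ("Ix", "I"): 17.5}

T_EARLY = 0.005  # 5 msec, well before the slow phase
DT = 1e-5        # small enough for the fastest step in T_MODEL

n_sequential = simulate(SEQUENTIAL, "U", T_EARLY, DT)["N"]
n_triangular = simulate(TRIANGULAR, "U", T_EARLY, DT)["N"]
n_t_model = simulate(T_MODEL, "U", T_EARLY, DT)["N"]
```

With these assumed values, the sequential scheme forms far less N at early times than either of the other two, while the Triangular and T schemes form comparable amounts. In the T scheme the early N comes from the unblocked U-to-I-to-N route, so a single pathway with an optional blocking step avoids a lag on the time scale of intermediate accumulation without a separate U-to-N pathway.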
A hidden intermediate detected by unfolding
Kiefhaber et al. (1997) used an interrupted folding experiment to populate the slowly folding lysozyme intermediate and measured its unfolding by a jump to high denaturant. The unfolding of the populated intermediate could not be fit with a single exponential. The results define an additional faster kinetic phase (filled circles in Fig. 5).
In principle, the three-species Triangular model can only produce two relaxation phases, and unfolding of the populated intermediate can only exhibit single exponential two-state I-to-U kinetics. To explain the results, another species is required. Kiefhaber et al. (1997) proposed an on-pathway intermediate, called Nu (for nucleating), placed before the populated I in the folding direction. Figure 5, A–C, includes these new data (from Figs. 3 and 4A of Kiefhaber et al. 1997) and globally fits all of the data to the modified Triangular + Nu model (Scheme 2 in Fig. 2).
The unmodified T model naturally predicts the non-two-state unfolding of the populated intermediate and the additional fast kinetic folding phase. Some fraction of Ix unfolds directly (Ix through I to U), whereas another fraction recycles back from I to Ix and reaches U more slowly. When the unfolding data are included, kinetic fitting of the T model retains the previously determined rate constants and fixes the previously underdetermined ones (reduced χ² = 0.7) (Fig. 5D–F).
The modified Triangular model approaches the T model. Both have three populated species and a kinetically hidden intermediate. However, the Triangular model still requires in addition the straight-through U-to-N pathway in order to satisfy the first rejection test (no lag in N formation). The extended Triangular model then requires eight rate constants to explain the available lysozyme data compared to six for the more minimal T model.
Multipathway model for two populated intermediates
Bieri and coworkers (Bieri et al. 1999; Bieri and Kiefhaber 2001) repeated these experiments at 10°C (Fig. 6) and 20°C (data not shown) in the presence of added NaCl (0.85 M NaCl at pH 5.2, called the high salt condition), and also at higher pH (pH 9.2, 20°C) (Fig. 7). Each of these conditions made it possible to detect an additional folding phase, which requires the population of another intermediate. Several kinetic models were tested, in this case with four populated species (U, N, I1, I2) (Bieri et al. 1999). The rejection criteria appeared to require an additional independent pathway and ruled out several possible models. The inability to fit all of the data ruled out some other models. The successful scheme with three independent pathways is represented as a Diamond model (Scheme 3 in Fig. 2).
As before, the published fitting factors (from Fig. 7 of Bieri et al. 1999 and Fig. 3 of Bieri and Kiefhaber 2001) matched one of the data sets less well (red curves in Figs. 6 and 7), in this case the chevron data. The agreement could be improved by a global fitting (black). As before, only some of the kinetic constants shown for the Diamond model in Figures 6 and 7 are well constrained by the data (shown in boldface).
PPOE model with two populated intermediates
The T model contains four species (U, N, I, Ix) and therefore predicts three kinetic phases, with the third phase coming from the minimally populated on-pathway I, as noted before. However, the simple T model does not produce a good global fit to the results in Figures 6 and 7. As for the multipathway model, another well-populated intermediate is necessary. An extended T model adds the required intermediate in either a Plus or a Double T configuration, shown as Schemes 6 and 7 (Fig. 2), which turn out to be kinetically equivalent.
In the Plus reaction scheme, the on-pathway I appears to generate two different error-dependent Ix forms. This PPOE model survives the rejection tests, and it provides excellent global fits to the data, as shown in Figures 6 and 7. As before, only the rate constants (and their m-values) leading to the on-pathway I are fixed by the data available. Only the ratios of the other four rate constants are fixed. They determine the partitioning of I into the four competing pathways and the time-dependent populations of the individual species.
As before, the fitted Diamond and Plus models both produce the same S-values (relative fluorescence signals) for the intermediates and hence the same total flux away from U, the same flux into their respective blocked intermediates, and the same flux away from N. As before, the Plus model fits all of the data with fewer fitting rate constants than the Diamond model (8 vs. 12) and an equivalent reduced χ² in each case.
We also tested Scheme 7 (Fig. 2), called the Double T model, with two on-pathway intermediates and their misfolded analogs. This scheme survives the rejection tests and globally fits the chevron and population data sets at all conditions. It formally has two more kinetic fitting constants than the Plus model, but it is kinetically equivalent and reduces to the Plus model because I1 and I2 interconvert rapidly and effectively play the role of a single I.
Early work in protein folding was guided by the conjecture that proteins could find their predestined native state within the vast space of possible folds only by traveling through some predetermined pathway (Levinthal 1968, 1969). Kinetic experiments detected pathway intermediates that were interpreted in these terms (Kim and Baldwin 1982, 1990). In agreement, a range of structural experiments characterized partially folded forms, molten globules, and folded protein fragments believed to mimic specific kinetic folding intermediates (Oas and Kim 1988; Staley and Kim 1990; Wu et al. 1993; Ptitsyn 1995; Chamberlain et al. 1999; Feng et al. 2005; Maity et al. 2005). Structural information from hydrogen exchange and related methods found and characterized partially native-like intermediates and showed that they construct classical folding pathways (Roder et al. 1988; Bai et al. 1995; Chamberlain et al. 1996; Fuentes and Wand 1998a,b; Xu et al. 1998; Milne et al. 1999; Chamberlain and Marqusee 2000; Chu et al. 2002; Hoang et al. 2002; Silverman and Harbury 2002; Yan et al. 2002, 2004; Feng et al. 2003, 2005; Krishna et al. 2003a,b, 2004, 2006; Maity et al. 2004, 2005; Cecconi et al. 2005; Weinkam et al. 2005). The direct NMR structure determination of two engineered variants of apocytochrome b562 (Feng et al. 2005) showed them to be closely equivalent to the on-pathway kinetic intermediates previously defined by native state HX methods under two-state folding conditions (Fuentes and Wand 1998a,b; Chu et al. 2002). These observations provide the experimental basis for a classical predetermined pathway.
However, developments in the theoretical area fostered a different view, namely, that proteins can fold through any number of unrelated paths, directed only by a random energetically downhill search for the native state (Zwanzig et al. 1992; Bryngelson et al. 1995; Wolynes et al. 1995; Brooks 1998; Plotkin and Onuchic 2002a,b). Partly on this basis, the varied folding behavior observed for many proteins by spectroscopic methods has more recently been widely interpreted in terms of independent parallel pathways through unrelated intermediate states (Baldwin 1995; Dobson et al. 1998; Bilsel and Matthews 2000; Wallace and Matthews 2002) (even though experimental results find only a small number of alternative parallel pathways rather than the many implied by theoretical simulations).
The independent pathways and predetermined pathway views project fundamentally different representations of how proteins actually fold. These sharply different interpretations concerning the most fundamental modalities of protein folding pathways can be rationalized by the optional error hypothesis, suggested by the well-known tendency of proteins to misfold. Chance misfolding within a single pathway will mimic multipathway behavior.
Can the predetermined pathway–optional error model provide a quantitative match to real protein behavior? Hen egg lysozyme has been most extensively characterized and analyzed in previous kinetic folding studies. The results have been interpreted in terms of multiple independent pathways. Analysis shows that the single-pathway PPOE model can fit the HEWL data as well as the multiple independent pathway model, and it does so with fewer fitting parameters.
Both models satisfy the rejection criteria developed by the Kiefhaber group. Both quantitatively fit the heterogeneous folding of lysozyme under disparate conditions with different sets of measured relaxation rates, and therefore seem able to fit any other accurate self-consistent data set for the two-state, multistate, or heterogeneous folding of any protein. In the general case, additionally measured kinetic phases could be accommodated by either model, undoubtedly with equally good fit. It is difficult to conceive of any fundamentally different kind of model that can meet these requirements.
Comparison with experimental information
Can one go past a data-fitting exercise and choose between the multipath IUP and the PPOE models on more substantial grounds? The derived parameters endow the two models with distinctively different folding characteristics. Their validity may be judged by comparing them with known folding information.
In the IUP model, population fractions that fold in a two-state manner are explained by a direct U-to-N pathway. Intermediates do not accumulate, apparently because they are either absent or unstable. In the PPOE model, pathway intermediates are obligatory (except for proteins that are smaller than two foldon units). Two-state folding occurs because intermediates inherently fold forward much faster than the rate-limiting initial nucleation step and do not significantly populate.
Experiment favors the PPOE view. Intermediates that fold forward rapidly and are kinetically silent, even though they are on-pathway and stable (relative to U), have been demonstrated for several proteins by native state HX (Bai et al. 1995; Chamberlain et al. 1996; Fuentes and Wand 1998a,b; Xu et al. 1998; Milne et al. 1999; Chamberlain and Marqusee 2000; Chu et al. 2002; Hoang et al. 2002; Yan et al. 2002, 2004; Feng et al. 2003, 2005; Krishna et al. 2003b, 2006; Maity et al. 2004, 2005), and by a related experiment using sulfhydryl reactivity (Silverman and Harbury 2002), and also by NMR relaxation dispersion (Korzhnev et al. 2004). Kinetic considerations support sequential on-pathway intermediates for many other proteins (Sánchez and Kiefhaber 2003a,b).
The possibility that intermediates are absent in the two-state folding of sizeable proteins like lysozyme (129 residues) seems unlikely given that the foldon size in the best-characterized case, Cyt c, is 15–20 residues. Even the small fyn SH3 protein, with only 59 residues, exhibits a stable but kinetically silent on-pathway intermediate (Korzhnev et al. 2004). Two-state folding kinetics is seen especially for small proteins, where one expects the probability of folding errors to be minimal. As protein size increases, the probability of error formation increases and one sees an increasing non-two-state fraction, as expected.
According to the PPOE parameters, uncorrupted on-pathway intermediates are inherently fast. This makes the first pathway step inherently rate-limiting and folding inherently two-state, unlike the multipathway model. This property is consistent with the success of the contact order formalism (Plaxco et al. 1998, 2000) and with nucleation models in general (Wetlaufer 1973; Fersht 1995; Sosnick et al. 1996). The contact order formulation implies that an initial whole molecule conformational search for a collapsed chain conformation with sufficient native-like topology is the inherently slow step (Sosnick et al. 1996; Plaxco et al. 1998). The PPOE model provides a physical rationale. The small-scale templated conformational search necessary to build onto pre-existing native-like structure should be naturally faster than the initial whole-molecule untemplated search.
According to the PPOE parameters, non-two-state folding occurs because of off-pathway errors. Fast two-state folding seems biologically favorable insofar as it avoids the lengthy occupation of partially folded states that would promote in vivo proteolysis and aggregation. This is desirable both for initial folding and subsequently since native proteins repeatedly unfold and refold even under native conditions (Bai et al. 1995). Intermediates that are inherently slow as in the multipathway model would exacerbate the problem. The problem is relieved by an initial rate-limiting barrier that requires the polypeptide to find a near-native topology, which maximizes the probability that subsequent steps go forward rapidly without blocking. One evolutionary strategy is suggested by the prevalence of the N-terminal to C-terminal docking motif. Half of all single-domain proteins in the Protein Data Bank and close to 100% of all known kinetically two-state proteins have this motif, and it tends to form as an initial step, which seems favorable for minimizing subsequent misfolding errors (Krishna and Englander 2005).
In the fitted IUP model, three-state folding occurs because stable on-pathway intermediates naturally fold forward more slowly than the initial untemplated whole-molecule search. In contradiction, stable on-pathway intermediates that fold forward rapidly and are kinetically silent have been demonstrated for several proteins, as just listed.
In the PPOE model, the native protein is built in foldon-sized steps. Three-state folding occurs when a native-like intermediate additionally contains some slowly repaired misfolding error. In agreement, folding intermediates for many proteins are seen to be partial replicas of the native protein, whether they are kinetically populated or not, and the kinetically blocked forms are also seen to contain some significant misfolding (Kiefhaber et al. 1992; Dobson et al. 1994; Elöve et al. 1994; Muñoz et al. 1994; Sosnick et al. 1994, 1996; Weissman and Kim 1995; Silow and Oliveberg 1997; Bai 1999; Bilsel et al. 1999; Bhuyan and Udgaonkar 2001; Capaldi et al. 2002; Wallace and Matthews 2002; Krishna et al. 2003a, 2004; Bollen et al. 2004; Rojsajjakul et al. 2004; Religa et al. 2005; Wintrode et al. 2005; Nishimura et al. 2006).
Partially folded intermediates are likely to minimize energy by exploiting adventitious nonnative interactions (Feng et al. 2003). Reversal of these interactions may or may not be rate-limiting. One needs to distinguish naturally occurring but innocuous nonnative interactions from slowly repaired blocking errors (Feng et al. 2005). Within the terms of the PPOE model, misfolding errors interfere with the sequential stabilization of a succeeding foldon unit.
In the IUP model, folding is heterogeneous because different molecules naturally fold in different tracks through different transition states and intermediates in different parts of the folding landscape. The PPOE model, based on cooperative native-like units and their interactions, pictures that all of the molecules naturally fold by way of the same barriers and native-like intermediates. The probabilistic introduction of misfolding errors can produce a heterogeneous mix of two-state and three-state population fractions.
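The way probabilistic errors partition a single pathway into apparent population fractions can be sketched with simple bookkeeping. The following is a minimal illustration only, not the fitted kinetic scheme: the per-step error probabilities are hypothetical inputs, and each molecule is counted at the first step where it blocks.

```python
def blocked_fractions(error_probs):
    """Split a folding population by the first step at which it blocks.

    error_probs[i] is a hypothetical probability that a molecule reaching
    pathway step i incorporates a blocking misfolding error there.
    Returns (fractions blocked at each step, fraction folding error-free).
    """
    remaining = 1.0  # fraction of the population that is still error-free
    blocked = []
    for p in error_probs:
        blocked.append(remaining * p)
        remaining *= 1.0 - p
    return blocked, remaining


# Illustrative input: errors occur only at the second step, with a
# probability chosen to mirror the 85% blocked fraction cited for lysozyme.
blocked, two_state = blocked_fractions([0.0, 0.85, 0.0])
print(blocked, two_state)
```

With a single error-prone step, the population divides into one three-state fraction and one two-state fraction even though every molecule follows the same sequence of intermediates.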
Experimental observations favor the predetermined pathway view. In Cyt c, an early intermediate with the same native-like structure and stability was characterized under two-state conditions (native state HX) (Bai et al. 1995; Milne et al. 1999) and in three-state folding (HX pulse labeling) (Roder et al. 1988; Krishna et al. 2003a). The entire protein population folds through the same intermediate (Sosnick et al. 1996; Krishna et al. 2003a) whether it is blocked by an added misfolding error or not. An analogous demonstration was obtained for ribonuclease H (Raschke and Marqusee 1997; Chamberlain and Marqusee 2000). In many proteins, it appears that much or all of the population folds to and blocks at the same intermediate step (85% in lysozyme). The multiple apparent tracks seen in the folding of α-Trp synthase appear to result from the chance insertion of different barriers due to proline misisomerization (Wu and Matthews 2002, 2003).
Similarly, the PPOE model implies that protein molecules in a refolding population should tend to use the same initial nucleation strategy, encoded in the same initial U-to-I1 step, whether folding is two-state or three-state or heterogeneous. In many two-state folding proteins, φ and ψ analyses show that all of the molecules fold through the same initial rate-limiting transition state. The initial barrier has been measured to be identical for two-state and three-state folding in the case of Cyt c (same rate, temperature dependence, and denaturant dependence) (Sosnick et al. 1996). The analysis of Kuwajima and coworkers (Kamagata et al. 2004) suggests that the first barrier in three-state folding proteins obeys a contact order relationship similar to that for the initial barrier in two-state folding proteins.
In the PPOE model, the three-state discrimination occurs after the initial step when some molecules happen to encounter an error-repair barrier. In Cyt c, the insertion of a misligation barrier can be turned on or off simply by adjusting the pH in the unfolded condition (Elöve et al. 1994; Sosnick et al. 1994, 1996). For the proline-dependent misfolding of many proteins, barrier insertion can be turned on or off simply by adjusting the time spent in the unfolded condition (Nall 1994). Earlier folding kinetics are unaffected.
Known folding behavior for many proteins parallels the PPOE model and not the multipathway model.
Comparison with theoretical models
A comparison of the PPOE model with theoretical energy landscape models suggests some similarities and some differences. The major concern for present purposes is the apparent conflict between the independent unrelated pathways view and the foldon-determined view. The conflict is more apparent than real.
Energy landscape theory generally pictures that proteins can fold through many unrelated paths (Bryngelson et al. 1995; Wolynes et al. 1995; Dill and Chan 1997; Brooks 1998; Plotkin and Onuchic 2002a,b). Trajectories are often pictured to disperse at the initiation of folding because of the different positions in unfolded space occupied by the different molecules. Copious branching occurs in subsequent pathway steps because the protein grows by adding one amino acid at a time in no predictable order. Different molecules move downhill through different regions of the funnel-shaped landscape on which every point corresponds to a different three-dimensional structure.
Analogous to funnel landscapes, the logic of the sequential stabilization mechanism allows the folding pathway to branch when a given intermediate is able to template the addition of more than one subsequent foldon. However, the degree of branching is severely limited by the fact that alternative pathway steps must use the same few foldon units and foldon–foldon interactions. For example, Cyt c is experimentally observed to fold in five discrete foldon steps (Bai et al. 1995; Xu et al. 1998; Milne et al. 1999; Hoang et al. 2002; Krishna et al. 2003a,b, 2004, 2006; Maity et al. 2004, 2005). It follows a linear pathway for its first three on-pathway steps, then branches and regroups to N in the final two steps, all as dictated by its native structure and the sequential stabilization principle (Krishna et al. 2006). Branching would be vastly more complex if folding proceeded in 104 unordered steps, the number of its amino acids. Other proteins are also seen to fold one foldon unit at a time (Chamberlain et al. 1996; Fuentes and Wand 1998a,b; Chamberlain and Marqusee 2000; Chu et al. 2002; Silverman and Harbury 2002; Yan et al. 2002, 2004; Feng et al. 2003, 2005).
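The scale of the difference can be illustrated with a rough count. As an assumption made here purely for illustration, the number of orderings in which n units could be added with no constraint at all is taken to be n!. This is an upper bound, since sequential stabilization restricts the foldon orderings far below it.

```python
import math

# Upper bound on addition orders if units were placed in any order.
foldon_orderings = math.factorial(5)     # five foldon steps in Cyt c
residue_orderings = math.factorial(104)  # 104 amino acids in Cyt c

print(foldon_orderings)             # 120
print(len(str(residue_orderings)))  # number of decimal digits in 104!
```

Even the unconstrained foldon count is small enough to enumerate, whereas the residue-level count is astronomically large.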
In the energy landscape view, intermediates become visible in kinetic folding because some of the molecules chance to fall into one or more of the many misfolding traps that dot the folding landscape. This view agrees with the present hypothesis that errors (traps) are optional and error repair can be rate-limiting.
However, a critical experimental observation is that all of the three-state HEWL fraction, 85% of the total population, occupies the same intermediate state. All of the molecules in the populated intermediate have essentially the same native-like structure, the same misfolding errors, and the same rates for formation and decay (Radford et al. 1992; Miranker et al. 1993; Dobson et al. 1994). Structural information shows a similar result for the populated intermediates of other proteins even when they are early, high in the not-yet-converged funnel (see, e.g., Krishna et al. 2003a, 2004).
These experimental observations are hard to understand if different population fractions descend the landscape through several unrelated paths. It is, however, just what is required by the PPOE model in the limit of complete pathway linearity.
Amino acids versus cooperative foldons
The major difference in the behavior expected from the foldon and funnel views is due to the different levels of structural behavior that are considered. Theoretical simulations generally focus on folding behavior at the microscopic amino acid level. The assembly of a given foldon, for example, a single helix, does occur at the amino acid level with many nucleation and growth alternatives, which is well represented by the multitrack funnel model. However, experiment is able to detect protein assembly only at a more macroscopic, cooperative foldon level where folding pathways are assembled by foldon units of the native protein and their stepwise native-like association. The micro and macro views are each pertinent to their own domain. They appear to conflict when either is inappropriately applied to the domain of the other.
It seems intuitively likely that theoretical simulations that understand proteins as accretions of a small number of cooperative foldon units rather than many independent amino acids would calculate pathways that proceed in macroscopic steps through a sequence of predetermined native-like intermediates in the growing native format. The role of foldons may also bear on related issues that are usually visualized in terms of amino acid level interactions (Go models, landscape smoothness, contact order, φ and ψ analysis) but might be better understood in the context of cooperative native-like foldon structure and interactions.
Discussions of possible folding models have been handicapped by some common misconceptions. (1) Intermediates if stable must accumulate. (2) If intermediates don't affect measured folding kinetics, they are irrelevant. (3) Populated intermediates cause slow folding. The PPOE perspective helps to illuminate these issues.
(1) It is often assumed that on-pathway intermediates must accumulate if they are stable relative to U, and are not present if no accumulation is seen. These views are built into multipathway models, as for lysozyme considered above. In fact, a kinetic intermediate will visibly populate only if it is more stable than all prior states and it is blocked by a barrier that is higher than all prior barriers (trough to peak). Experimentally, equilibrium native state HX has been able to detect folding intermediates under two-state folding conditions. They are stable, on-pathway, obligatory, and kinetically invisible. In the PPOE view, this is because on-pathway intermediates naturally fold forward more rapidly than the initial pathway step.
(2) The fact that on-pathway intermediates do not affect the measured folding rate does not make them unimportant for the folding process. In the PPOE model, normally occurring obligatory intermediates form after the initial inherently rate-limiting step, but they are kinetically silent only when all goes well. A corrupting error can cause an intermediate to accumulate and folding to be slowed. Slowing would not occur if the pathway did not obligately move through that particular intermediate. The initial step is rate-limiting and intermediates are invisible because the subsequent pathway goes forward efficiently. As for any kinetic pathway, the folding pathway is constructed by the obligatory intermediates that define it and would not progress efficiently without them, whether they occur invisibly after a rate-limiting nucleation step or not.
(3) The correlation between intermediate accumulation and slow folding has been widely noted, and a cause-and-effect relationship has often been drawn. Rather, in the PPOE analysis, intermediate accumulation and slowed folding occur together because both are caused by an inserted error-repair barrier. Uncorrupted intermediates promote rather than hinder folding, whether they are stable or not.
In summary, the absence of accumulation does not mean that intermediates are absent, or unstable, or unimportant, and the accumulation of off-pathway forms that are obstructive does not mean that uncorrupted on-pathway intermediates are obstructive. When these common misconceptions are put aside, available information can be seen to be wholly consistent with the PPOE model.
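The relation between accumulation and the relative size of the barriers can be made concrete with the simplest possible case. The sketch below assumes an irreversible U → I → N scheme with hypothetical rate constants (the fitted lysozyme schemes are reversible and more elaborate) and evaluates the standard closed-form expression for the maximum population reached by I.

```python
import math


def peak_intermediate(k1, k2):
    """Peak fractional population of I in the irreversible scheme U -> I -> N.

    k1 is the U -> I rate constant and k2 the I -> N rate constant
    (hypothetical values, same units). The folding starts from 100% U.
    """
    if math.isclose(k1, k2):
        return math.exp(-1.0)  # limiting value when the two rates are equal
    t_peak = math.log(k2 / k1) / (k2 - k1)
    return k1 / (k2 - k1) * (math.exp(-k1 * t_peak) - math.exp(-k2 * t_peak))


# Forward step 100 times faster than the initial step: I stays near 1%.
print(peak_intermediate(1.0, 100.0))
# Forward step slowed 100-fold below the initial step, as an inserted
# error-repair barrier might do: I accumulates to roughly 95%.
print(peak_intermediate(1.0, 0.01))
```

In both cases the intermediate is obligatory. Only the height of the barrier that follows it determines whether it becomes visible.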
Spectroscopy-based and theory-based studies of kinetic protein folding are commonly interpreted in terms of multiple independent unrelated pathways, the IUP model. By contrast, detailed structural information depicts pathways that put predetermined units of the native protein into place in a sequential stabilization process, as in the PPOE model. Both models can quantitatively fit extensive data sets for the heterogeneous folding and unfolding of lysozyme at varied conditions and therefore seem able to fit the analogous behavior of other proteins in general. The specific rejection criteria developed by the Kiefhaber group eliminate any other minimal reaction schemes that are significantly different.
Fitting to the lysozyme data imposes different properties on the two different models, which makes possible a clear test against known information. The properties of the fitted IUP model are contrary to experimental results for many proteins. The properties of the fitted PPOE model are all seen experimentally. Theoretical simulations that exhibit many independent pathways seem to be pertinent to microscopic amino-acid-level behavior, at the level of individual foldon construction, rather than to macroscopic pathway behavior.
The PPOE model is based on clear experimental results that can be seen to derive from three physical principles, as follows.
(1) The units of protein folding are cooperative native-like structural elements (foldons) (Zimm and Bragg 1959; Lifson and Roig 1961; Krishna et al. 2003b).
(2) Prior structure templates the formation of complementary structure (sequential stabilization) (Watson and Crick 1953; Martin et al. 1996; DelMar et al. 2005; Uversky et al. 2005).
(3) Probabilistic misfolding errors, which are prevalent (Chi et al. 2003; Krishna et al. 2004; Yewdell 2005), can cause population fractions to block and the corrupted intermediates to accumulate (optional errors).
The first two principles dictate that proteins fold in macroscopic structural steps with the order of steps set by the way that foldon units are organized in the native protein. The third determines whether pathways appear to be kinetically two-state, three-state, or heterogeneous. The integration of these principles in the PPOE model unifies within a coherent mechanism a broad range of experimental observations concerning the properties of protein folding pathways and identifies their structural bases.
Materials and methods
The HEWL data analyzed here (Kiefhaber et al. 1997; Wildegger and Kiefhaber 1997; Bieri et al. 1999; Bieri and Kiefhaber 2001) were fit to the various kinetic schemes shown in Figure 2. Our analysis assumes, as in the prior analyses of Kiefhaber and coworkers, that unfolding amplitudes measured in the interrupted refolding experiments (fluorescence) represent concentrations of the individual species when corrected for signal strength.
For a scheme involving n species, there will be (n − 1) nonzero macroscopic relaxation phases that are functions of the microscopic rate constants kij. Each kij, connecting species i and j, was assumed to depend on denaturant concentration D according to the usual relationship:
$$RT \ln k_{ij} = RT \ln k_{ij,0.6} + m_{ij}\,(D - 0.6) \qquad (1)$$
where kij,0.6 are the reference values at 0.6 M GdmCl, the denaturant concentration where the interrupted folding experiments were done, and mij is the slope of the dependence on denaturant (D) of the term RT ln kij. The final fit values are in Figures 3, 5, 6, and 7.
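A minimal sketch of this denaturant dependence follows, assuming that RT ln kij is linear in D with slope mij and is referenced to 0.6 M GdmCl as described. The function name, the units of m (kcal/mol/M), and the default temperature are assumptions made for illustration and are not taken from the original analysis.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def rate_at_denaturant(k_ref, m, d, temp_k=298.15, d_ref=0.6):
    """Microscopic rate constant at denaturant concentration d (molar).

    k_ref  : rate constant at the reference concentration d_ref
    m      : slope of RT ln k versus d, in kcal/mol/M (assumed units)
    temp_k : absolute temperature; the default is an assumed value
    """
    rt = R_KCAL * temp_k
    return k_ref * math.exp(m * (d - d_ref) / rt)


# Hypothetical folding-direction step: negative m, so the rate falls with D.
for d in (0.6, 2.0, 4.0):
    print(d, rate_at_denaturant(5.0, -1.0, d))
```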
For fitting the data to a given kinetic scheme, grid search (Bevington and Robinson 1994) and eigen methods were used. Eigenvalues of the rate matrix are the observed macroscopic relaxation rates. The right eigenvectors define the ratios of the corresponding amplitudes in the population kinetics. The exact amplitudes were calculated to match the starting condition (100% U) in kinetic experiments. The procedure used is as follows. Starting from an initial set of trial microscopic rate constants and m-values, the rate matrix was set up and solved for its eigen solutions using canned routines in the linear algebra package CLAPACK (Anderson et al. 1999; http://www.netlib.org/clapack/). The eigenvalues (macroscopic rates) as a function of denaturant concentration were used to fit the chevron data. The eigenvectors with pre-factors determined from the starting condition (100% U) were used to fit the population kinetics.
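The eigen procedure can be illustrated for the smallest nontrivial case. The sketch below is not the original CLAPACK-based program: it treats only a linear, reversible three-state scheme U ⇌ I ⇌ N, for which the two nonzero eigenvalues and their eigenvectors have closed forms, and all rate constants are hypothetical.

```python
import math


def three_state_kinetics(k1, km1, k2, km2):
    """Relaxation rates and populations for U <-> I <-> N.

    k1: U->I, km1: I->U, k2: I->N, km2: N->I (all assumed > 0).
    Returns ((fast_rate, slow_rate), populations), where populations(t)
    gives [U, I, N] at time t for a start from 100% U.
    """
    total = k1 + km1 + k2 + km2
    prod = k1 * k2 + k1 * km2 + km1 * km2
    # Nonzero eigenvalues solve lam^2 - total*lam + prod = 0.
    root = math.sqrt(total * total - 4.0 * prod)
    lam_fast = 0.5 * (total + root)
    lam_slow = prod / lam_fast  # second root, computed without cancellation

    # The zero eigenvalue corresponds to the equilibrium populations.
    weights = (km1 * km2, k1 * km2, k1 * k2)
    norm = sum(weights)
    c_eq = [w / norm for w in weights]

    def eigvec(lam):
        # Right eigenvector for relaxation rate lam; components sum to zero.
        return (km1, k1 - lam, lam - k1 - km1)

    v_fast, v_slow = eigvec(lam_fast), eigvec(lam_slow)

    # Pre-factors fixed by the starting condition c(0) = (1, 0, 0).
    s = (1.0 - c_eq[0]) / km1
    a_fast = (s * (lam_slow - k1) - c_eq[1]) / (lam_slow - lam_fast)
    a_slow = s - a_fast

    def populations(t):
        e_fast = a_fast * math.exp(-lam_fast * t)
        e_slow = a_slow * math.exp(-lam_slow * t)
        return [c_eq[i] + e_fast * v_fast[i] + e_slow * v_slow[i]
                for i in range(3)]

    return (lam_fast, lam_slow), populations


rates, populations = three_state_kinetics(10.0, 1.0, 2.0, 0.01)
print(rates)             # two nonzero relaxation rates for three species
print(populations(0.0))  # starts at [1, 0, 0] within rounding
print(populations(0.5))
```

Repeating the rate calculation at each denaturant concentration would give the calculated chevron, and the population function gives the time courses compared with the interrupted refolding amplitudes. Larger or branched schemes need a general numerical eigen solver, as used in the original work.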
The ordinate values used in Figures 3, 6, and 7 (middle panels) represent concentrations calculated by the Kiefhaber group, based on their fit after normalizing fitted fluorescence to N = 1. The data in Figure 5 show the observed unfolding amplitudes. In order to globally refit the data, we had to recompute proportionality constants for the fluorescence signals of intermediate species (denoted as S in the Figures). S-values relative to the native signal normalized to 1 were determined using the standard linear regression relationship for calculated C(t) vs. measured At as in Equation 2:
At is the measured fluorescence signal (unfolding amplitude) in interrupted folding experiments; C(t) is the fractional molecular concentration calculated from a given set of microscopic rate constants at the folding time t. S is not an independent parameter as evident from Equation 2. It is determined by the micro-rate constants to match the experimental data. S equal to 1 in Figures 3, 6, and 7 would represent exact agreement with the concentrations calculated by Kiefhaber et al.
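One common reading of such a regression relationship is a least-squares line through the origin, which gives S as the ratio of two sums. The sketch below shows that reading only. It is an assumption and may differ in detail from the relationship used in the original analysis, and the numbers are invented for illustration.

```python
def signal_scale(measured, calculated):
    """Least-squares slope through the origin for measured = S * calculated.

    measured   : unfolding amplitudes A_t at a series of folding times
    calculated : fractional concentrations C(t) at the same times
    """
    num = sum(a * c for a, c in zip(measured, calculated))
    den = sum(c * c for c in calculated)
    return num / den


# Invented amplitudes that are exactly 0.8 times the calculated populations.
print(signal_scale([0.2, 0.4, 0.6], [0.25, 0.5, 0.75]))
```

Because S follows directly from the calculated C(t) and the measured amplitudes, it adds no free parameter to the fit.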
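Goodness of the chevron and kinetic fit was judged using the global reduced χ² parameter defined as follows (Bevington and Robinson 1994):
$$\chi_R^2 = \frac{1}{N_{CHEV} + N_{KIN} - n_{VAR}} \left[ \sum_{D} \frac{\left(\ln \lambda_D - \ln \lambda(D)\right)^2}{\sigma_{\lambda}^2} + \sum_{t} \frac{\left(A_t - A(t)\right)^2}{\sigma_A^2} \right] \qquad (3)$$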
NCHEV and NKIN represent the number of chevron and kinetics data points; nVAR is the number of free parameters in a given scheme (for example in the T model, nVAR = 6 rate constants + 6 m-values = 12); λD and λ(D) are the experimentally measured and the calculated macroscopic relaxation rates at the denaturant concentration D; At and A(t), equal to S × C(t), are the measured and calculated fluorescence amplitudes at the folding time t in interrupted folding experiments. Standard deviations σ for ln λD and At were arbitrarily taken to be 2% of the full scale, namely, 0.2 and 0.02, respectively. The fitting procedure was repeated iteratively by varying each of the initial parameters in steps of ±10% to ±0.01% until the best fit (minimum reduced χ²) was obtained. Machine-generated random numbers were used to set the order in which the initial parameters were selected for step increments.
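The goodness-of-fit measure and the stepwise search can be sketched as follows. This is an illustration under stated assumptions rather than the original program: the function names are hypothetical, the intermediate step sizes between ±10% and ±0.01% are assumed, and the search is demonstrated on a toy objective instead of a kinetic scheme.

```python
import math
import random


def reduced_chi_square(rates_obs, rates_calc, amps_obs, amps_calc,
                       n_var, sigma_ln_rate=0.2, sigma_amp=0.02):
    """Global reduced chi-square over chevron rates and kinetic amplitudes."""
    chev = sum(((math.log(o) - math.log(c)) / sigma_ln_rate) ** 2
               for o, c in zip(rates_obs, rates_calc))
    kin = sum(((o - c) / sigma_amp) ** 2
              for o, c in zip(amps_obs, amps_calc))
    dof = len(rates_obs) + len(amps_obs) - n_var
    return (chev + kin) / dof


def step_search(objective, params, steps=(0.1, 0.01, 0.001, 0.0001), seed=0):
    """Vary one parameter at a time by +/- step, in random order.

    Moves are kept only when they lower the objective; the relative step
    shrinks once no parameter change at the current size helps.
    """
    rng = random.Random(seed)
    best = list(params)
    best_val = objective(best)
    for step in steps:
        improved = True
        while improved:
            improved = False
            order = list(range(len(best)))
            rng.shuffle(order)  # random order of parameter selection
            for i in order:
                for factor in (1.0 + step, 1.0 - step):
                    trial = list(best)
                    trial[i] *= factor
                    val = objective(trial)
                    if val < best_val:
                        best, best_val = trial, val
                        improved = True
    return best, best_val


# Toy objective with a known minimum at (3.0, 0.5).
best, value = step_search(lambda p: (p[0] - 3.0) ** 2 + (p[1] - 0.5) ** 2,
                          [1.0, 1.0])
print(best, value)
```

In an actual fit the objective would rebuild the rate matrix from the trial parameters, recompute the relaxation rates and amplitudes, and return the reduced χ² for the combined data.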
All computer programs were written in-house in ANSI C on the Microsoft Visual C++ 6 platform. The correctness of the programs was checked by comparing predicted kinetic curves with those obtained using the KINSIM (Barshop et al. 1983; http://www.biochem.wustl.edu/cflab/message.html) and DYNAFIT (Kuzmič 1996; http://www.biokin.com/dynafit) programs. The parameters constrained by the data, shown in boldface in Figures 3, 5, 6, and 7, were determined by starting from different sets of initial values in multiple runs and noting parameters that did and did not change.
We thank Thomas Kiefhaber for kindly providing the lysozyme folding data. For a critical reading of the manuscript and helpful comments, we thank Robert Baldwin, Yawen Bai, Ken Dill, Neville Kallenbach, Thomas Kiefhaber, Leland Mayne, Leslie Milk, Hillary Nelson, George Rose, John Skinner, Tobin Sosnick, and our laboratory members. This work was supported by NIH research grant GM 031847.