Institute for Biological Instrumentation, Russian Academy of Sciences, 142292 Pushchino, Moscow Region, Russia; Department of Chemistry and Biochemistry, University of California, Santa Cruz, California 95064, USA
Department of Chemistry and Biochemistry, University of California, Santa Cruz, CA 95064, USA; fax: (831) 459-2935.
The experimental material accumulated in the literature on the conformational behavior of intrinsically unstructured (natively unfolded) proteins was analyzed. Results of this analysis showed that these proteins do not possess uniform structural properties, as expected for members of a single thermodynamic entity. Rather, these proteins may be divided into two structurally different groups: intrinsic coils, and premolten globules. Proteins from the first group have hydrodynamic dimensions typical of random coils in poor solvent and do not possess any (or almost any) ordered secondary structure. Proteins from the second group are essentially more compact, exhibiting some amount of residual secondary structure, although they are still less dense than native or molten globule proteins. An important feature of the intrinsically unstructured proteins is that they undergo disorder–order transition during or prior to their biological function. In this respect, the Protein Quartet model, with function arising from four specific conformations (ordered forms, molten globules, premolten globules, and random coils) and transitions between any two of the states, is discussed.
This review introduces an intriguing protein family of natively unfolded proteins, whose existence questions one of the cornerstones in protein biology, chemistry and physics, that is, the structure–function paradigm. This concept claims that a specific function of a protein is determined by its unique and rigid three-dimensional (3D) structure. This idea, formulated more than 100 years ago as a lock-and-key model for explaining the amazing specificity of the enzymatic hydrolysis of glucosides (Fischer 1894), proved to be extremely fruitful. Figuratively speaking, the protein structure–function paradigm may be considered as the big bang, creating the universe of modern protein science. Figure 1 attempts to illustrate the most obvious scientific consequences of this concept.
However, a reappraisal of the protein structure–function paradigm is now warranted based on systematic studies of intrinsically unfolded/disordered proteins (Wright and Dyson 1999; Dunker et al. 2001; Uversky 2002). There are two major reasons for such a reappraisal: the results of amino acid sequence analyses, and the accumulation of experimental evidence for the existence of a rather large number of protein domains, and even entire proteins, that lack ordered structure under physiological conditions. Intriguingly, both sets of evidence were gathered using scientific concepts and approaches originating from the protein structure–function paradigm, namely, proteomics and protein self-organization (see Fig. 1).
Proteomics versus protein structure–function paradigm
The application of neural network predictors of protein disorder, which use primary sequence information, to the Swiss Protein Database has predicted that more than 15,000 proteins may contain disordered regions of at least 40 consecutive amino acid residues, with more than 1050 of them having high scores indicating disorder (Dunker et al. 1998; Romero et al. 1998b). This observation supported the conclusion that “a large portion of gene sequences appear to code not for folded, globular proteins, but for long stretches of amino acids that are likely to be either unfolded in solution or adopt non-globular structures of unknown conformation…. The high proportion of gene sequences in the genomes of all organisms argues for important, as yet unknown functions, since there could be no other reason for their persistence throughout evolution” (Wright and Dyson 1999). Intriguingly, recent predictions on 29 genomes have established that proteins from eucaryotes have more intrinsic disorder than those from bacteria and archaea, with more than 30% of eucaryotic proteins having disordered regions longer than 50 consecutive residues (Dunker et al. 2001).
Protein self-organization and protein structure–function paradigm
There is a rapidly growing set of proteins that have been shown to be disordered, or to have extensive disordered regions, under physiological conditions. The evidence comes from several experimental approaches that are sensitive to the intrinsic disorder of a protein or of its part, all of which were developed in studies of protein self-organization.
It is known that the unique 3D structure of a globular protein is stabilized by noncovalent interactions (conformational forces) of different natures, such as hydrogen bonds, hydrophobic forces, van der Waals interactions, etc. It was established long ago that high concentrations of strong denaturants (such as urea or guanidinium chloride [GdmCl]) lead to the complete disruption of all these interactions and, as a consequence, to the transformation of an initially folded protein molecule into a highly disordered random coil (Anson and Mirsky 1932; Mirsky and Pauling 1936; Neurath et al. 1944; Tanford 1968). In other words, such conditions cause the complete unfolding of proteins. However, sometimes changes in the environment can reduce (or even completely shut down) part of the conformational interactions, while the rest remain unchanged (or are even intensified). In these cases proteins will usually lose their biological activity, that is, they will be denatured. Denaturation is not necessarily accompanied by the complete unfolding of a protein, but rather results in the appearance of new conformations with properties intermediate between those of the native and the completely unfolded states.
It is known that globular proteins may exist in at least four different conformations: native (ordered), molten globule, premolten globule, and unfolded (Uversky and Ptitsyn 1994, 1996a; Ptitsyn 1995; Uversky 1997, 1998). The structural properties of the molten globule are well known, and have been systematized in a number of reviews (e.g., Ptitsyn 1995). It has been established that the protein molecule in this intermediate state has no (or has only a trace of) rigid cooperatively melted tertiary structure, that is, it is denatured. Small-angle X-ray scattering showed that the protein molecule in this intermediate state has a globular structure typical of native globular proteins (Eliezer et al. 1993; Kataoka et al. 1993, 1997; Semisotnov et al. 1996; Uversky et al. 1998). 2D NMR coupled with hydrogen-deuterium exchange showed that the protein molecule in the molten globule state is characterized not only by a native-like secondary structure content, but also by a native-like folding pattern (Baum et al. 1989; Bushnel et al. 1990; Jeng et al. 1990; Chyan et al. 1993; Wu et al. 1993; Eliezer et al. 1998; Bose et al. 1999; Bracken 2001). A considerable increase in the accessibility of a protein molecule to proteases was noted as a specific property of the molten globule state (Merrill et al. 1990; Fontana et al. 1993). It was also shown that transformation into this intermediate state is accompanied by a considerable increase in the affinity of a protein molecule for hydrophobic fluorescent probes (such as 8-anilinonaphthalene-1-sulfonate, ANS), and this behavior should be considered a characteristic property of the molten globule state (Semisotnov et al. 1991; Uversky et al. 1996). Finally, it was established that the average increase in the hydrodynamic radius in the molten globule state compared with the native state is no more than 15%, which corresponds to a volume increase of ∼50%.
The structural peculiarities of a polypeptide chain in the premolten globule state are summarized below. The protein molecule in this state is denatured, that is, it has no rigid tertiary structure. It is characterized by a considerable secondary structure, although much less pronounced than that of the native or the molten globule protein (protein in the premolten globule state has ∼50% native secondary structure, whereas in the molten globule state the corresponding value is close to 100%). The protein molecule in the premolten globule state is considerably less compact than in the molten globule or native states, but it is still more compact than the random coil (its hydrodynamic volume in the molten globule, the premolten globule, and the unfolded states, in comparison to that of the native state, increases 1.5, ∼3, and ∼12 times, respectively). The protein molecule in the premolten globule state can effectively interact with the hydrophobic fluorescent probe ANS, although substantially more weakly than in the molten globule state. This means that at least part of the solvent-accessible hydrophobic clusters of the polypeptide chain is already formed in the premolten globule state (Uversky and Ptitsyn 1994, 1996a; Ptitsyn 1995; Uversky 1997, 1998). It has also been established that in the premolten globule state the protein molecule has no globular structure (Uversky 1997, 1998; Uversky et al. 1998). The last observation indicates that the premolten globule probably represents a “squeezed” and partially ordered form of a coil. Finally, it has been shown that the premolten globule is separated from the molten globule state by an all-or-none transition, which represents an intramolecular analog of the first-order phase transition (Uversky and Ptitsyn 1994, 1996a; Ptitsyn 1995; Uversky 1997, 1998). This means that the molten globule and premolten globule represent different thermodynamic (phase) states.
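The volume ratios quoted above can be translated into radius ratios, which is how hydrodynamic data are usually reported. The following is a minimal sketch, assuming only that the hydrodynamic volume scales as the cube of the hydrodynamic radius (sphere-equivalent particle); the function name `radius_ratio` is introduced here for illustration and is not taken from the cited work.

```python
# Relative hydrodynamic volumes quoted in the text (native state = 1).
# The values for the premolten globule and unfolded states are approximate.
VOLUME_RATIO = {
    "molten globule": 1.5,
    "premolten globule": 3.0,
    "unfolded": 12.0,
}


def radius_ratio(volume_ratio: float) -> float:
    """Return R/R_native for a given V/V_native, assuming V is proportional to R**3."""
    return volume_ratio ** (1.0 / 3.0)


if __name__ == "__main__":
    for state, v in VOLUME_RATIO.items():
        print(f"{state}: V/V_native = {v:4.1f}, R/R_native = {radius_ratio(v):.2f}")
```

Under this assumption a 1.5-fold volume increase corresponds to a radius increase of roughly 15%, in agreement with the figure given for the molten globule.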
As native, molten globule, premolten globule, and unfolded conformations possess defined structural differences along with increasing amounts of disorder, they may be easily discriminated from one another by several physico-chemical methods (Uversky 1999). These techniques are briefly considered below.
X-ray crystallography reveals missing electron density in many protein structures, which may correspond to disordered region(s). The increased flexibility of atoms in such a region leads to noncoherent X-ray scattering, so that these atoms are not observed (Bloomer et al. 1978; Bode et al. 1978; Huber 1979, 1987; Schulz 1979; Alber et al. 1982; Spolar and Record 1994; Lewis et al. 1996; Muchmore et al. 1996; Dunker et al. 1997, 2001; Worbs et al. 2000).
Heteronuclear multidimensional NMR is an extremely powerful technique for protein 3D structure determination in solution and for the characterization of protein dynamics. Recent advances in this technology have allowed the complete assignment of resonances for several unfolded and partially folded proteins, as well as the disordered fragments of folded proteins (Alexandrescu et al. 1994; Logan et al. 1994; Zhang et al. 1994; Cho et al. 1996; Kriwacki et al. 1996; Lisse et al. 1996; Donne et al. 1997; Gillespie and Shortle 1997a, 1997b; Penkett et al. 1997; Daughdrill et al. 1998; Eliezer et al. 1998; Fletcher et al. 1998; Liu et al. 1998; Mogridge et al. 1998; Zhang and Matthews 1998; Bose et al. 1999; Fiebig et al. 1999; Hazzard et al. 1999; Hershey et al. 1999; Love 1999; Bracken 2001; see also Wright and Dyson 1999, and references cited therein).
There are two types of optically active chromophores in proteins: side groups of aromatic amino acid residues, and peptide bonds (Adler et al. 1973; Fasman 1996). Circular dichroism (CD) spectra in the near ultraviolet region (250–350 nm), also called the aromatic region, reflect the symmetry of the environment of aromatic amino acid residues and, consequently, are characteristic of protein tertiary structure. Protein denaturation may be easily detected by the simplification of the near-UV CD spectrum.
A decrease in ordered secondary structure may be detected by several spectroscopic techniques, including far-UV CD (Adler et al. 1973; Provencher and Glöckner 1981; Johnson 1988; Woody 1995; Fasman 1996; Kelly and Price 1997; Uversky et al. 2000a), ORD, FTIR (see Uversky et al. 2000a, and references cited therein), and Raman optical activity (Smyth et al. 2001).
Hydrodynamic parameters obtained from techniques such as gel-filtration, viscometry, SAXS, SANS, sedimentation, and dynamic and static light scattering may help in determining whether a protein is compact or has become unfolded. The unfolding of a protein molecule results in a substantial increase in its hydrodynamic volume. For instance, there is a well-documented 15–20% increase in the hydrodynamic radius of globular proteins upon their transformation into the molten globule state (Uversky 1993, 1994; Ptitsyn 1995), while the hydrodynamic volume of the premolten globule is even larger (Uversky and Ptitsyn 1994, 1996a; Ptitsyn 1995; Uversky 1997, 1998). Moreover, it has been shown that native and unfolded conformations of globular proteins possess very different molecular mass dependencies of their hydrodynamic radii, RS (Tanford 1961, 1968; Uversky 1993). As a result, intrinsically disordered proteins will have an increased hydrodynamic volume relative to native proteins, leading to an increase in their apparent molecular mass (summarized in Uversky et al. 2000a).
Another very important structural parameter is the degree of globularity, which reflects the presence or absence of a tightly packed core in a protein molecule. This information may be extracted from the analysis of SAXS data in the form of a Kratky plot, whose shape is sensitive to the conformational state of the scattering protein molecules (Glatter and Kratky 1982; Feigin and Svergun 1987; Semisotnov et al. 1996; Uversky et al. 1998). It has been shown that a scattering curve in the Kratky coordinates has a characteristic maximum for globular proteins in either their native or molten globule states (i.e., states with globular structure). However, if a protein is completely unfolded or in a premolten globule conformation (i.e., with no globular structure), such a maximum will be absent (Glatter and Kratky 1982; Feigin and Svergun 1987; Semisotnov et al. 1996; Uversky 1997, 1998; Uversky et al. 1998; Tcherkasskaya and Uversky 2001).
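The qualitative difference between the two kinds of Kratky plot can be reproduced with textbook scattering functions. The sketch below is illustrative only: it assumes the Guinier approximation as a stand-in for a compact globule (strictly valid only at low scattering angles) and the Debye function for an ideal Gaussian chain, and the radius of gyration and q range are hypothetical values, not data from the cited studies.

```python
import math


def guinier_intensity(q: float, rg: float) -> float:
    """Guinier approximation I(q)/I(0), used here as a crude model of a compact globule."""
    return math.exp(-((q * rg) ** 2) / 3.0)


def debye_intensity(q: float, rg: float) -> float:
    """Debye form factor I(q)/I(0) of an ideal Gaussian (random-coil) chain."""
    x = (q * rg) ** 2
    if x < 1e-8:  # avoid 0/0 at very small q
        return 1.0
    return 2.0 * (math.exp(-x) + x - 1.0) / x ** 2


def kratky(intensity, rg: float, q_values):
    """Return the Kratky representation, q**2 * I(q), on the given q grid."""
    return [q * q * intensity(q, rg) for q in q_values]


def has_interior_maximum(curve) -> bool:
    """True if the largest value is neither the first nor the last point."""
    peak = curve.index(max(curve))
    return 0 < peak < len(curve) - 1


RG = 20.0                                # hypothetical radius of gyration, angstroms
Q = [0.002 * i for i in range(1, 201)]   # hypothetical q grid, 0.002 to 0.4 1/angstrom

if __name__ == "__main__":
    print("globule-like curve has a maximum:",
          has_interior_maximum(kratky(guinier_intensity, RG, Q)))
    print("coil-like curve has a maximum:",
          has_interior_maximum(kratky(debye_intensity, RG, Q)))
```

In this simplified picture the globule-like curve passes through a maximum, whereas the coil-like curve rises monotonically toward a plateau, which mirrors the criterion described in the text.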
Additional knowledge on the intramolecular mobility and compactness of a protein may be extracted from the analysis of different fluorescence characteristics. This includes FRET, shape and position of the intrinsic fluorescence spectrum, fluorescence anisotropy and lifetime, accessibility of the chromophore groups to external quenchers, and steady-state and time-resolved parameters of the fluorescent dyes. Overall, these techniques add important information to the conformational description of a polypeptide.
Increased proteolytic degradation in vitro of intrinsically disordered proteins indirectly confirms their increased flexibility (Markus 1965; Mikhalyi 1978; Fontana et al. 1993; Hubbard et al. 1994, 1998; Kriwacki et al. 1996; Lisse et al. 1996; Horiuchi et al. 1997; Hubbard 1998; Hershey et al. 1999; Ratnaswamy et al. 1999; Bouivier and Stafford 2000; Iakoucheva et al. 2001; see Dunker et al. 2001, for recent review).
Immunochemical methods may also be applied toward the elucidation of protein disorder. Important to this discussion, the immunoglobulins obtained against a given protein may be specific for different levels of macromolecular structure: the primary structure (Amit et al. 1985; Wilson et al. 1985), the secondary structure (Fujio et al. 1985), or the tertiary structure (Amit et al. 1985; Fujio et al. 1985; Wilson et al. 1985). In the latter case, the antigenic determinants may reside either on neighboring residues in the chain (loops) (Amit et al. 1985; Wilson et al. 1985) or on spatially distant residues (Fujio et al. 1985). Furthermore, it has been shown that antibodies in the immune serum may possess a high affinity for the internal elements of an antigen (Fujio et al. 1985). Thus, antibodies may be successfully used to study the structural changes that a protein-immunogen undergoes upon changes of the experimental conditions. For example, antibodies obtained against the Ca2+-saturated F1-fragment of prothrombin did not interact with the calcium-free apo-form of this protein (Furie and Furie 1979). An analogous effect was also observed in the case of osteocalcin (Delmas et al. 1984).
Finally, intrinsic disorder may be detected by the analysis of protein conformational stability. For example, the presence or absence of a cooperative transition on the calorimetric melting curve for a given protein is a simple and convenient criterion indicating the presence or absence of a rigid tertiary structure (Privalov 1979; Ptitsyn 1995; Uversky 1999). Furthermore, it has been shown that the steepness of urea- or guanidinium chloride-induced unfolding curves depends strongly on whether a given protein has a rigid tertiary structure (i.e., it is native) or is already denatured and exists as a molten globule (Ptitsyn and Uversky 1994; Uversky and Ptitsyn 1996b). To extend this type of analysis, the values of $\Delta\nu_{\mathrm{eff}}$ (which is the difference in the number of denaturant molecules “bound” to one protein molecule in its two states) should be determined. Then this quantity should be compared to the $\Delta\nu_{\mathrm{eff}}^{N \to U}$ and $\Delta\nu_{\mathrm{eff}}^{MG \to U}$ values corresponding to the native to coil and molten globule to coil transitions in globular protein of a given molecular mass, respectively (Uversky and Ptitsyn 1996b).
Application of several techniques mentioned above to a given protein provides the most unambiguous evidence for the presence of partially folded intermediates.
It has been shown that a considerable number of proteins possess some amount of disorder rather than rigid structure. A special term, “natively denatured,” was introduced in 1994 (Schweers et al. 1994) to emphasize the existence of a drastic structural difference between “normal” globular proteins, with rigid tertiary structure, and the “abnormal,” extremely flexible tau protein. Two years later, a new term, “natively unfolded,” originated as a result of conformational analysis of α-synuclein, which under physiological conditions appeared to lack any secondary structure (Weinreb et al. 1996). Two alternative terms, “intrinsically unstructured” (Wright and Dyson 1999) and “intrinsically disordered” (Dunker et al. 2001), have also been suggested to describe these proteins. Because “abnormal” proteins show an extremely wide diversity in their structural properties, the meaning of the above terms should be clarified. Thus, the terms denatured and disordered may be considered synonyms, and indicate any set of nonrigid conformations of polypeptide chains, including the compact partially folded conformations (molten globules and premolten globules) and the random coil. The terms unstructured and unfolded may be considered synonymous, and should only be applied to the subset of disordered proteins characterized by the absence of any (or almost any) ordered structure. For the remainder of this review, only natively unfolded proteins will be considered, excluding “native molten globules”.
Are natively unfolded proteins common?
The number of proteins and protein domains that have been shown in vitro to have little or no ordered structure under physiological conditions is rapidly expanding. For example, over the past 10 years there has been a significant increase in publications describing structural properties of intrinsically unstructured (natively unfolded) proteins, starting from two papers in 1991 and ending with more than 30 in 2000. During the same time other interesting aspects of natively unfolded proteins have also been investigated. For example, 2382, 1960, and 370 papers were published concerning different aspects of amyloid-beta peptide, tau protein, and α-synuclein, respectively.
The current list of different natively unfolded proteins includes more than 100 entries, with information on 91 of them presented in our recent work (Uversky et al. 2000a) and Table 1. Only full-length proteins or domains with chain length greater than 50 amino acid residues have been considered. This list would probably be doubled if shorter polypeptides 30 to 50 residues long were included. Finally, the set of 100 proteins described in the literature as “natively unfolded” has at least 250 homologs, which are also expected to be natively unfolded. Additionally, a large number of proteins and protein domains have been predicted to be disordered based on the analysis of amino acid sequences using neural network predictors (Dunker et al. 1998, 2001; Romero et al. 1998b). All this shows that polypeptides without ordered structure under physiological conditions are common, rather than exceptions.
How is unfoldedness encoded in a protein amino acid sequence?
The existence of at least three different disordered equilibrium conformations, molten globule (MG), premolten globule (PMG), and unfolded (random coil-like, U), has been established for typical globular proteins (Uversky and Ptitsyn 1994, 1996a; Ptitsyn 1995; Uversky 1997, 1998). Apparently, the ability of a protein to adopt different stable conformations is an intrinsic property of a polypeptide chain. Because the correct folding of a protein into its rigid biologically active conformation is thought to be determined by its amino acid sequence (Anfinsen et al. 1961), the absence of rigid structure in natively unfolded proteins may likewise be reflected in specific features of their amino acid sequences.
In an attempt to understand the relationship between sequence and disorder, Dunker and coauthors have developed several neural network predictors (Romero et al. 1997, 1998a, 1998b, 2001a; Dunker et al. 1998, 2001; Garner et al. 1998; Li et al. 1999, 2000). They assumed that if a protein structure has evolved to have a functional disordered state then a propensity for disorder might be predictable from its amino acid sequence and composition. The results of such analysis were impressive. It was established that disordered regions shared at least some common sequence features between many proteins, and that more than 15,000 proteins in the Swiss Protein database were identified as having long regions of sequence that shared these features (Romero et al. 1998b). Interestingly, the Top 20 proteins (i.e., proteins with the highest scores) were shown to have low sequence complexity, as defined by Wootton (1993, 1994; Wootton and Federhen 1996). In other words, sequences of natively unfolded proteins may be essentially degenerate. Figure 2A illustrates this idea, comparing the s-antigen from Plasmodium (which was shown to be at the head of the Top 20) with that of human serum albumin (a rigid globular protein of similar molecular mass) in terms of their amino acid composition scaled according to McCaldon and Argos (1988). Interestingly, it was later established that the distributions of the complexity values for ordered and disordered sequences overlapped (Romero et al. 2001b), suggesting that low sequence complexity did not represent the only characteristic feature of intrinsically disordered proteins. However, some general sequence peculiarities of natively unfolded proteins were recognized long ago. These include the presence of numerous uncompensated charged groups, resulting in a large net charge at neutral pH (Hemmings et al. 1984; Gast et al. 1995; Weinreb et al. 1996) and a low content of hydrophobic amino acid residues (Hemmings et al. 1984; Gast et al. 1995).
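The notion of low sequence complexity can be illustrated with a simple compositional measure. The sketch below uses the Shannon entropy of the residue composition in a sliding window; this is a simplification chosen for illustration and is not the complexity measure of Wootton and Federhen itself. The window length and both test sequences are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def windowed_entropy(sequence: str, window: int = 12):
    """Entropy of every full-length window along the sequence."""
    return [
        shannon_entropy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


def mean_windowed_entropy(sequence: str, window: int = 12) -> float:
    values = windowed_entropy(sequence, window)
    return sum(values) / len(values)


# Hypothetical sequences, for illustration only (not real proteins).
REPETITIVE = "GSGSGSGSGSGSGSGSGSGSGSGS"
DIVERSE = "ACDEFGHIKLMNPQRSTVWYACDE"

if __name__ == "__main__":
    print(f"repetitive: {mean_windowed_entropy(REPETITIVE):.2f} bits")
    print(f"diverse:    {mean_windowed_entropy(DIVERSE):.2f} bits")
```

A repetitive, degenerate sequence gives a low entropy, whereas a sequence that uses many residue types gives a value approaching log2(20). As noted in the text, such a measure alone does not separate ordered from disordered sequences.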
Recently, we have established that the combination of low mean hydrophobicity and relatively high net charge represents an important prerequisite for the absence of compact structure in proteins under physiological conditions, leading to natively unfolded proteins (Uversky et al. 2000a). Figure 2B shows that natively unfolded proteins are specifically localized within a unique region of the charge-hydrophobicity phase space. The solid line in this figure represents the border between intrinsically unstructured and native proteins, satisfying the following relationship:
This equation gives the estimation of the “boundary” mean hydrophobicity value, 〈H〉boundary, below which a polypeptide chain with a given mean net charge 〈R〉 will most probably be unfolded. Thus, sequences of natively unfolded proteins may be characterized by a low sequence complexity and/or high net charge coupled with low mean hydrophobicity.
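The charge-hydrophobicity criterion lends itself to a short calculation. The following sketch rests on several assumptions that should be checked against Uversky et al. (2000a): the boundary is taken in its commonly quoted form, ⟨H⟩boundary = (⟨R⟩ + 1.151)/2.785, because the equation is not reproduced in this copy of the text; hydrophobicity is the Kyte-Doolittle scale rescaled to the range 0 to 1 and averaged over the whole chain without window smoothing; and the net charge counts only Lys, Arg, Asp, and Glu. The two sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(sequence: str) -> float:
    """Mean hydropathy rescaled to 0..1 (assumption: simple average, no smoothing)."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in sequence) / len(sequence)


def mean_net_charge(sequence: str) -> float:
    """Absolute net charge per residue at neutral pH (assumption: His is neutral)."""
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return abs(positive - negative) / len(sequence)


def boundary_hydrophobicity(net_charge: float) -> float:
    """Assumed boundary: <H>boundary = (<R> + 1.151) / 2.785."""
    return (net_charge + 1.151) / 2.785


def predicted_unfolded(sequence: str) -> bool:
    """True if the chain falls below the assumed boundary hydrophobicity."""
    return mean_hydrophobicity(sequence) < boundary_hydrophobicity(mean_net_charge(sequence))


# Hypothetical sequences, for illustration only (not real proteins).
CHARGED = "EEEEDDDDKKSSPPGGEEDD"
HYDROPHOBIC = "LLVVIIAAFFLLVVIIAAGG"

if __name__ == "__main__":
    for name, seq in (("charged", CHARGED), ("hydrophobic", HYDROPHOBIC)):
        print(name, round(mean_hydrophobicity(seq), 3),
              round(mean_net_charge(seq), 3), predicted_unfolded(seq))
```

The highly charged, weakly hydrophobic toy sequence falls on the unfolded side of the assumed boundary and the hydrophobic one on the compact side, which is the behavior the criterion is meant to capture.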
How unfolded are intrinsically unstructured proteins? A classification attempt
It is well known that in the presence of large concentrations of strong denaturants, such as 8 M urea or 6 M GdmCl, normal proteins lose the majority of their specific structure, that is, become essentially unfolded (Anson and Mirsky 1932; Mirsky and Pauling 1936; Neurath et al. 1944; Tanford 1968). One can expect that under these conditions unfolded proteins will obey the theoretical and empirical rules that apply to linear random coils (Tanford 1968). In accordance with Tanford, a polymer molecule is randomly coiled when internal rotation can take place at about every single bond of the molecule with the same freedom with which it would take place in a molecule of low molecular weight containing the same kind of bonds (Tanford 1968). The properties of linear random coils are well understood, as synthetic polymers frequently adopt this conformation (Flory 1953; Tanford 1961). Because the dimensions of random coils depend only on the backbone rotational angles, the dependence of the hydrodynamic dimensions on molecular mass (length of polypeptide chain) represents the most effective diagnostic tool for recognition of linear random coils (Tanford 1961, 1968). Results of early studies on hydrodynamic dimensions of proteins in the presence of 6 M GdmCl were consistent with the conclusion that unfolded proteins could be described as random coils (Tanford 1961, 1968). However, it was later established by heteronuclear NMR that even in high concentrations of strong denaturants, when the native state of globular proteins breaks down, the polypeptide chains contained some amount of residual structure, that is, the polypeptide chain did not reach a random coil conformation (Dill and Shortle 1991; Logan et al. 1994; Zhang et al. 1994; Shortle 1996; Pappu et al. 2000; Baldwin and Zimm 2000). These findings raised several compelling biophysical questions related to the structural characteristics of natively unfolded proteins. How unfolded are these proteins? 
Are they random coils, or do they possess residual structure? If they have residual structure, how then should they be classified? Fortunately, the information accumulated to date on natively unfolded proteins allows us to make an initial structural classification of these intriguing members of the polypeptide kingdom.
As follows from their definition, intrinsically unstructured proteins show complete (or almost complete) loss of any ordered structure under physiological conditions in vitro; that is, they should behave as random coils. Structurally, this may be manifested by (1) larger hydrodynamic dimensions compared to typical native globular proteins with corresponding molecular mass, (2) low content of ordered secondary structure, and (3) high intramolecular flexibility. Such anomalous behavior is usually detected by numerous hydrodynamic techniques (gel-filtration, viscometry, SAXS, SANS, sedimentation, and dynamic and static light scattering), far-UV CD, ORD, FTIR, and NMR spectroscopy (one-dimensional and heteronuclear multidimensional). These techniques may also be used for the identification of residual structure (if any) in an unfolded protein molecule. Once again, simultaneous application of several approaches should permit one to draw a more reliable conclusion.
Flexibility and residual structure by NMR spectroscopy
NMR spectroscopy of natively unfolded proteins has established that they contain varied amounts of residual structure. Examples of proteins that are essentially unfolded under in vitro physiological conditions include: DFF45 N-terminal domain (Zhou et al. 2001), DNA-binding domain of vitamin D receptor (Craig et al. 1997), C-terminal domain of anti-sigma factor FlgM (Daughdrill et al. 1998), p53 regulatory domain of p19Arf tumor suppressor (DiGiammarino et al. 2001), substrate-binding peptide from DNA polymerase I (Mullen et al. 1993), poplar apo-plastocyanin (Bai et al. 2001), N-terminal domain of StAR (Song et al. 2001), bone sialoprotein and osteopontin (Fisher et al. 2001), C-terminal domains of α- and β-tubulins (Jimenez et al. 1999), N-terminal activation domain of heat-shock transcription factors (Cho et al. 1996), 4E-binding proteins I and II (Fletcher et al. 1998), cyclin-dependent kinase inhibitor p21Waf1/Cip1/Sdi1 (Kriwacki et al. 1996), SNase, Δ131Δ fragment (Alexandrescu et al. 1994; Gillespie and Shortle 1997a, 1997b), desiccation-related protein (Lisse et al. 1996), functional domain of eIF4G1 (Hershey et al. 1999), cytoplasmic domain of synaptobrevin (Hazzard et al. 1999), N-terminal domain of prion protein (Donne et al. 1997), C-terminal HMG domain of LEF-1 (Love 1999), N-terminal region of TAFII-230 (residues 11–77) (Liu et al. 1998), antitermination protein N (Mogridge et al. 1998), cytoplasmic domain of Snc1 (Fiebig et al. 1999), prothymosin α (Uversky et al. 1999), nonhistone chromosomal proteins HMG-14 (Cary et al. 1980), HMG-17 (Abercrombie et al. 1978), HMG-T and HMG-H6 (Cary et al. 1981), fibronectin-binding domains (Penkett et al. 1998), DNA-binding domain of GCN4 (Weiss et al. 1990), EMB-1 protein (Eom et al. 1996), NEF protein (Geyer et al. 1999), osteocalcin (Isbell et al. 1993), two-domain fragment of neutral zinc finger factor 1 (Berkovits and Berg 1999), and several other proteins.
On the other hand, heteronuclear multidimensional NMR analysis provided evidence of some ordered structure in several natively unfolded proteins, although the amount and quality of residual structure varied tremendously. In some cases the authors were unable to detect any secondary or tertiary contacts (e.g., Abercrombie et al. 1978; Cary et al. 1980, 1981; Cho et al. 1996; Eom et al. 1996; Lisse et al. 1996; Penkett et al. 1997; Fletcher et al. 1998; Zhang and Matthews 1998; Bose et al. 1999; Uversky et al. 1999; Campbell et al. 2000; Fisher et al. 2001; DiGiammarino et al. 2001). In other cases it was concluded that the proteins contained mostly dynamic structure favoring helical or β-structural conformation (e.g., Mullen et al. 1993; Schmitz et al. 1994; Kriwacki et al. 1996; Gillespie and Shortle 1997a, 1997b; Daughdrill et al. 1998; Bai et al. 2001, and many others). Thus, NMR analysis clearly showed that natively unfolded proteins do not possess uniform structural properties, as expected for members of a single thermodynamic entity.
As previously discussed, the most unambiguous characteristic of the conformational state of a globular protein is its hydrodynamic dimension. In fact, it has been shown that equilibrium conformations of a globular protein (native, molten globule, premolten globule, and unfolded states) may easily be discriminated by the degree of compactness of the polypeptide chain (Uversky 1993, 1994, 1997, 1998; Uversky and Ptitsyn 1994, 1996a; Ptitsyn 1995). The equilibrium conformations were characterized by very different dependencies of their hydrodynamic dimensions on molecular mass (length of polypeptide chain) (Tanford 1961, 1968; Uversky 1993; Tcherkasskaya and Uversky 2001).
To clarify the physical nature of natively unfolded proteins, Figure 3 represents the dependencies of their hydrodynamic dimensions on the length of their polypeptide chains (see Table 1). The same trends determined for globular proteins in their native, molten globule, premolten globule, and urea or GdmCl unfolded states are shown for comparison. The values of the hydrodynamic volumes, Vh, were calculated from the corresponding Stokes radii, RS, as $V_h = \tfrac{4}{3}\pi R_S^3$. Data for the different conformations of globular proteins were taken from Tcherkasskaya and Uversky (2001). For these species, the dependencies of Vh on the length of the polypeptide chains, N, were described by a set of straight lines (using a logarithmic scale):
Here, N, MG, PMG, U(urea), and U(GdmCl) correspond to the native, molten globule, premolten globule, urea, and GdmCl unfolded globular proteins, respectively.
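The conversion from Stokes radius to hydrodynamic volume, and the fitting of a straight line on a log-log scale, can be illustrated with a minimal Python sketch. The function names and the example radii below are hypothetical and are not data from Table 1; the least-squares fit is a generic procedure and is not claimed to be the exact fitting method used for Figure 3.

```python
import math

def hydrodynamic_volume(stokes_radius):
    """Return Vh = (4/3) * pi * RS**3 (cubic units of the input radius)."""
    return 4.0 / 3.0 * math.pi * stokes_radius ** 3

def fit_loglog_line(lengths, volumes):
    """Least-squares fit of log10(Vh) = a + b * log10(N); returns (a, b)."""
    xs = [math.log10(n) for n in lengths]
    ys = [math.log10(v) for v in volumes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical (chain length, Stokes radius in angstroms) pairs, NOT Table 1 data.
example = [(100, 18.0), (200, 23.0), (400, 29.0)]
lengths = [n for n, _ in example]
volumes = [hydrodynamic_volume(rs) for _, rs in example]
intercept, slope = fit_loglog_line(lengths, volumes)
print(f"log10(Vh) = {intercept:.3f} + {slope:.3f} * log10(N)")
```

Each conformational class would be fitted separately in this way, so that the classes can be compared through the slopes and intercepts of their lines.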
The existence of significant differences in the molecular mass dependencies of the Vh measured for urea and GdmCl unfolded proteins should be emphasized. It is well established that the hydrodynamic dimensions of random coils essentially depend on the quality of solvent (Flory 1953; Tanford 1961, 1968). A poor solvent induces the attraction of macromolecular segments, resulting in the squeezing of a chain. On the other hand, in a good solvent repulsive forces occur between segments, leading to the formation of a loose fluctuating coil (Grossberg and Khokhlov 1989). It is assumed that solutions of urea and GdmCl are rather good solvents for polypeptide chains, with GdmCl being closer to the ideal one (Tanford 1961, 1968). This difference in solvent quality could account for the observed divergence in log(Vh) versus log(N) dependencies for the globular proteins unfolded by urea and GdmCl.
Figure 3 shows that natively unfolded proteins are much less compact than native and molten globule proteins of similar molecular mass. Surprisingly, under physiological conditions in vitro (i.e., in aqueous solution, a poor solvent for a polypeptide chain), natively unstructured proteins split into two different subclasses (see also Table 1). Proteins from the first subclass, which consists of 17 representatives, behave as random coils in poor solvent, whereas the 18 proteins of the second subclass are essentially more compact, being close to premolten globules, as follows from their hydrodynamic parameters (cf. equations 3–5):
where NU(coil) and NU(PMG) correspond to the natively unfolded proteins with coil-like and premolten globule-like hydrodynamic characteristics, respectively. This very intriguing observation, which confirms the conclusion on the structural diversity of intrinsically disordered proteins drawn from their NMR analysis, may be further verified by the analysis of far-UV CD spectra.
Residual secondary structure from far-UV CD spectra
Unfolded polypeptide chains are characterized by a very specific shape of the far-UV CD spectrum, with an intense minimum in the vicinity of 200 nm and an ellipticity close to zero in the vicinity of 222 nm (Adler et al. 1973; Provencher and Glöckner 1981; Johnson 1988; Woody 1995; Fasman 1996; Kelly and Price 1997; Uversky 1999). This is a very useful graphical criterion for the selection of natively unfolded proteins (see Table 1). To date, a coil-like shape of the far-UV CD spectrum has been reported for ∼100 proteins (see Table 1), which is almost threefold larger than the number of proteins shown to be unfolded in accordance with their hydrodynamic dimensions (35).
Figure 4 represents a “double wavelength” plot, [θ]222 versus [θ]200, that may be used to sort natively unfolded proteins into two nonoverlapping groups. Fifty-one proteins were characterized by far-UV CD spectra characteristic of almost completely unfolded polypeptide chains, with [θ]200 = −(18,900 ± 2800) deg·cm2·dmol−1 and [θ]222 = −(1700 ± 700) deg·cm2·dmol−1. On the other hand, 44 other protein spectra were consistent with the existence of some residual secondary structure, possessing a shape typical of the premolten globule state of globular proteins (with [θ]200 = −(10,700 ± 1300) deg·cm2·dmol−1 and [θ]222 = −(3900 ± 1100) deg·cm2·dmol−1).
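As an illustration of how the double wavelength plot separates the two groups, the following Python sketch assigns a spectrum to the group whose mean ([θ]200, [θ]222) point is nearest. It uses the group averages quoted above; treating the ± values as standard deviations for scaling is an assumption, and the nearest-centroid rule, the dictionary, and the function names are hypothetical illustrations rather than the procedure used to construct Figure 4.

```python
# Group averages and spreads quoted in the text (deg cm^2 dmol^-1).
# Assumption: the "±" values are used here as standard deviations for scaling.
GROUPS = {
    "intrinsic coil": {
        "theta200": (-18900.0, 2800.0),
        "theta222": (-1700.0, 700.0),
    },
    "intrinsic premolten globule": {
        "theta200": (-10700.0, 1300.0),
        "theta222": (-3900.0, 1100.0),
    },
}

def scaled_distance(theta200, theta222, group):
    """Distance to a group centre, each axis scaled by that group's spread."""
    mean200, spread200 = group["theta200"]
    mean222, spread222 = group["theta222"]
    d200 = (theta200 - mean200) / spread200
    d222 = (theta222 - mean222) / spread222
    return (d200 ** 2 + d222 ** 2) ** 0.5

def classify_spectrum(theta200, theta222):
    """Return the name of the group whose centre is nearest (illustrative only)."""
    return min(
        GROUPS,
        key=lambda name: scaled_distance(theta200, theta222, GROUPS[name]),
    )

# Hypothetical measurement, not taken from Table 1.
print(classify_spectrum(-17500.0, -2000.0))
```

A rule of this kind only reflects spectral shape and should be cross-checked against hydrodynamic data before a protein is assigned to either group.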
Of course, the difference in the shape of far-UV CD spectra alone does not allow unambiguous discrimination between the two conformations. However, among more than 100 reported cases, 23 proteins were simultaneously characterized by CD and hydrodynamic methods, making classification more certain (see Table 1). Intrinsic premolten globules and intrinsic coils studied by both techniques are indicated in Figure 4 as white-dotted and black-dotted symbols, respectively. These data are consistent with the important conclusion that more compact polypeptides (with PMG-like hydrodynamic characteristics) possess larger amounts of ordered secondary structure than less compact coil-like natively unfolded proteins. Thus, the simultaneous application of CD and hydrodynamic techniques leaves no doubt that natively unfolded proteins should be subdivided into two structurally distinct groups: intrinsic coils and intrinsic premolten globules.
Amino acid composition of native coils and native premolten globules
Figure 5 compares the amino acid compositions of intrinsic coils and intrinsic premolten globules. Protein sets analyzed in Figure 4 were used to create these graphs. The inset to Figure 5 shows that proteins from both subclasses occupy the same region of the charge-hydrophobicity phase space. Native coils were more dispersed, whereas intrinsic premolten globules were localized closer to the border between intrinsically unstructured and native proteins (cf. Fig. 2). To confirm this idea, the corresponding distances between the given sequence and the border between intrinsically unstructured and native proteins were calculated. Results of this comparison are shown in Figure 5 as Δ〈H〉 = (〈H〉boundary − 〈H〉) plots. The mean “boundary” hydrophobicity, 〈H〉boundary, for a given polypeptide chain with a mean net charge 〈R〉 has been calculated using equation 1. One can see that intrinsic coils are essentially more distant from the border than intrinsic premolten globules. Statistical analysis shows that the averaged Δ〈H〉 values are −(0.089 ± 0.086) and −(0.037 ± 0.033) for the native coils and native premolten globules, respectively. However, because the sequence characteristics of the two subclasses overlap, it is difficult to differentiate these proteins by taking into account their mean hydrophobicity and mean net charge only. Probably, some other sequence features, such as the propensity to form secondary structure, should also be considered.
Function-related folding of the intrinsically unstructured proteins
The functional importance of being disordered has been intensively analyzed (Schulz 1979; Pontius 1993; Dunker et al. 1997, 2001; Plaxco and Gross 1997; Wright and Dyson 1999). It has been established that increased intrinsic plasticity represents an important prerequisite for effective molecular recognition (Plaxco and Gross 1997; Wright and Dyson 1999; Dunker et al. 2001). The range of biological functions of intrinsically disordered proteins is very wide, including cell cycle control, transcriptional and translational regulation, modulation of activity and/or assembly of other proteins, and even regulation of nerve cell function (reviewed in Wright and Dyson 1999; Dunker et al. 2001). Importantly, the majority of intrinsically disordered proteins undergo a disorder-to-order transition upon functioning (Schulz 1979; Pontius 1993; Spolar and Record 1994; Rosenfeld et al. 1995; Plaxco and Gross 1997; Dunker et al. 1997, 2001; Wright and Dyson 1999). It has been suggested that the persistence of natively unfolded proteins throughout evolution may reside in the advantages of flexible structure during disorder–order transitions in comparison with rigid proteins (Dunker et al. 1997, 1998, 2001; Romero et al. 1998b; Wright and Dyson 1999). Among the potential advantages of intrinsic lack of structure and function-related disorder–order transitions are (1) the possibility of high specificity coupled with low affinity (Schulz 1979; Kriwacki et al. 1996; Dunker et al. 1998, 2001); (2) the ability to bind to several different targets (Wright and Dyson 1999; Dunker et al. 2001), known as one-to-many signaling (Romero et al. 1998b); (3) the capability to overcome steric restrictions, enabling essentially larger interaction surfaces in the complex than could be obtained for the rigid partners (Meador et al. 1992; Choo and Schwabe 1998; Dunker et al. 2001); (4) the precise control and simple regulation of the binding thermodynamics (Schulz 1979; Spolar and Record 1994; Rosenfeld et al. 1995; Wright and Dyson 1999; Dunker et al. 2001); (5) the increased rates of specific macromolecular association (Pontius 1993; Dunker et al. 2001); and (6) the reduced lifetime of intrinsically disordered proteins in the cell, possibly representing a mechanism of rapid turnover of the important regulatory molecules (Wright and Dyson 1999).
There is, however, an alternative explanation for the involvement of intrinsic disorder in protein function. By computer modeling it has been shown that selective pressure for functionality is rather unrelated to that for stability and foldability. In this view, a protein that is successfully folded into one structure would likely be as functional as a protein that successfully folded into an alternative structure. Including functionality in the model does not greatly alter the distribution of the observed structures (Williams et al. 2001). This could mean that metastable proteins are favored during evolution because there is a tremendously larger number of sequences coding for these proteins compared to the very rigid ones. In other words, the involvement of intrinsic disorder in protein function may be related to history and evolution rather than to functional needs (Dunker et al. 2001).
In their excellent review, Dunker et al. (2001) formulated the idea that the protein structure–function paradigm (which emphasizes that ordered 3D structures represent the indispensable prerequisite to effective protein functioning) should be revised into The Protein Trinity paradigm (see Fig. 6A). According to The Protein Trinity model, native intracellular proteins (or their functional regions) can exist in any of three thermodynamic states: ordered, molten globule, and random coil. Function can arise from any of the three conformations and transitions between them. “In this view, not just the ordered state, but any of the three states can be the native state of a protein” (Dunker et al. 2001). Experimental results on the conformational behavior of intrinsically unstructured (natively unfolded) proteins indicated, however, that these proteins did not possess uniform structural properties, as expected for members of one thermodynamic group, random coils. They were split into two structurally different subclasses, which, by analogy with conformational states of globular proteins, may be designated as intrinsic coils and intrinsic premolten globules. Moreover, it was already noted that the molten globule and premolten globule might represent different phase states of the protein, as they are separated by a first-order phase transition (Uversky and Ptitsyn 1994, 1996a; Ptitsyn 1995; Uversky 1997, 1998). These observations bring a new player, the native premolten globule, onto the protein functioning field. In other words, The Protein Trinity should be extended to The Protein Quartet model, with function arising from four specific conformations (ordered forms, molten globules, premolten globules, and random coils) and transitions between any two of the states (see Fig. 6B). Experimental evidence for the validity of this extension is presented below.
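The combinatorial difference between the two models can be made explicit with a short Python sketch that enumerates the states and the pairwise transitions between them. The variable and function names are hypothetical, and treating each transition as an unordered pair of states is a simplifying assumption made only for counting.

```python
from itertools import combinations

# Conformational states named in the two models.
TRINITY = ("ordered", "molten globule", "random coil")
QUARTET = ("ordered", "molten globule", "premolten globule", "random coil")

def pairwise_transitions(states):
    """Return every unordered pair of distinct states."""
    return list(combinations(states, 2))

for name, states in (("Trinity", TRINITY), ("Quartet", QUARTET)):
    transitions = pairwise_transitions(states)
    print(f"{name}: {len(states)} states, {len(transitions)} pairwise transitions")
    for first, second in transitions:
        print(f"  {first} <-> {second}")
```

Adding the premolten globule as a fourth state raises the number of possible pairwise transitions from three to six, and the examples collected in the following subsections are grouped by these transition types.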
It has been established that the binding of Zn2+ induces partial folding of intrinsically unfolded proteins, such as thymosin α1 (Grottesi et al. 1998), prothymosin α (Uversky et al. 2000b), human sperm protamines P2 and P3 (Gatewood et al. 1996), and phosphodiesterase γ-subunit (Uversky et al. 2002). Human α-synuclein was also shown to be partially folded in the presence of several divalent and trivalent metal ions (Uversky et al. 2001a). Analysis of structural changes associated with the cation binding showed that the transformation of intrinsic coils into premolten globule-like conformations took place in these cases.
Function-related coil–molten globule transitions
The myelin basic protein, MBP, is a major protein of myelin, the multilamellar membranous sheath surrounding nerve axons. MBP was isolated in water-soluble or detergent-soluble form together with endogenous myelin lipids. The water-soluble form is a member of the intrinsic coil family. Binding of lipids transformed this protein into the molten globule-like conformation (Polverini et al. 1999). Similar structural rearrangements were induced in the coil-like 77–262 fragment of the glucocorticoid receptor (Baskakov et al. 1999) and in the N-terminal domain of HIV-1 integrase (Zheng et al. 1996) by TMAO and by Zn2+ binding, respectively. Self-association of an intrinsically unfolded γ-subunit of phosphodiesterase induced folding of this protein into a molten globule-like conformation (Uversky et al. 2002).
Function-related coil–rigid structure transitions
The N-terminal domain of the caspase-activated DNA fragmentation factor DFF45 was unfolded in solution. Its folding into the rigid 3D structure was induced upon interaction with the N-terminal domain of DFF40 (Zhou et al. 2001). Structural analysis revealed that the isolated 50S ribosomal proteins, L22 and L27, and 30S ribosomal protein S19 were essentially unfolded in solution (Venyaminov et al. 1981). However, they transform into a rigid well-folded conformation in the functional ribosome (Yusupov et al. 2001).
The human antibacterial peptide LL-37 existed in the premolten globule-like conformation at micromolar concentrations in aqueous solution. A cooperative transition from a disordered to a helical molten globule-like structure was observed in the presence of several anions or with increasing protein concentration. The extent of α-helicity correlated well with the antibacterial activity of LL-37 against both Gram-positive and Gram-negative bacteria (Johansson et al. 1998). A comparable degree of folding was induced in osteocalcin as a result of binding of Ca2+, Lu3+ (Isbell et al. 1993) or Pb2+ (Dowd et al. 2001). Premolten globule to molten globule transitions accompanied Ca2+ binding to skeletal muscle sarcoplasmic reticulum calsequestrin (Cozens and Reithmeier 1984; He et al. 1993) and SPARC, an extracellular glycoprotein expressed in mineralized and nonmineralized tissues (Engel et al. 1987).
The DNA-binding domain of the 1,25-dihydroxyvitamin D3 receptor was shown to undergo a premolten globule-to-molten globule transition as a result of specific Zn2+ binding. This cation-induced folding was an important prerequisite to the formation of a functional complex with osteopontin and several vitamin D response elements (Craig et al. 1997). The first step in steroidogenesis is the movement of cholesterol from the outer to the inner mitochondrial membrane, which is facilitated by the steroidogenic acute regulatory protein StAR. The interaction of premolten globule-like StAR with dodecylphosphocholine and phospholipid liposomes was accompanied by transition of the protein into the molten globule conformation (Song et al. 2001). The E7 gene of the human papillomaviruses encodes a 98-amino acid chain of a multifunctional nuclear phosphoprotein, E7 protein, which cooperates with an activated ras oncogene to transform primary rodent cells. CD spectroscopy indicated that Zn2+ and Cd2+ binding by the HPV16 E7 protein induced structural transformations consistent with a premolten globule–molten globule transition (Pahel et al. 1993).
Transitions of intrinsic premolten globules to rigid conformation
It was shown that self-dimerization and DNA binding induce rigid 3D structure in the intrinsic premolten globule-like Max protein (Ferre-D'Amare et al. 1993; Horiuchi et al. 1997). Specific Ca2+ binding initiated folding of the premolten globule-like B-repeat segment of SdrD (Josefsson et al. 1998). Similarly, the 50S ribosomal proteins L2, L3, L14, L23, L24, and L32, as well as the 30S ribosomal proteins S12 and S18 were native premolten globules in their free forms (Venyaminov et al. 1981), but adopted rigid well-folded conformations during the formation of a functional ribosome (Yusupov et al. 2001).
The Escherichia coli RNase HI variant, with the K86A mutation, was purified in two forms: nicked and intact. The nicked protein, resulting from the cleavage of the Lys87–Arg88 peptide bond, was enzymatically active. The N-terminal fragment possessed characteristics of a molten globule, whereas the C-terminal fragment was essentially disordered. The premolten globule-like C-fragment underwent a transition to the rigid 3D structure as a result of RNase HI reconstitution (Kanaya and Kanaya 1995). Comparably, the intrinsically unstructured β-subunit of SMK killer toxin folded into a rigid conformation as a result of interaction with the α-subunit (Suzuki et al. 1997). Finally, the formation of a yeast SNARE complex was accompanied by complete folding of two of its components, Snc1 and Sec9 (Rice et al. 1997).
RNase P is the endoribonuclease responsible for the 5′-maturation of precursor tRNA transcripts. Intriguingly, RNase P from Bacillus subtilis, being predominantly unfolded in 10 mM sodium cacodylate at neutral pH, folded into a native α/β structure upon addition of various small molecular anions (Henkels et al. 2001). This protein (Henkels et al. 2001), as well as the reduced RNase T1 (Baskakov and Bolen 1998), also underwent a cooperative folding transition upon addition of the osmolyte TMAO.
The cyclin-dependent kinase (Cdk) inhibitor p21Waf1/Cip1/Sdi1 and its N-terminal fragment lack stable secondary and tertiary structure in the free solution state. In sharp contrast to the disordered free solution state, these proteins adopted an ordered stable conformation when bound to Cdk2 (Kriwacki et al. 1996).
Conformational analysis shows that natively unfolded proteins do not represent a uniform family, but rather two structurally different groups. Proteins from the first group have hydrodynamic dimensions typical of random coils in poor solvent (i.e., they behave as slightly squeezed coils) and do not possess any (or almost any) ordered secondary structure. Proteins from the second group are essentially more compact (but still significantly less compact than native or molten globule proteins). They exhibit some amount of ordered secondary structure, although their far-UV CD spectra remain typical of an essentially disordered polypeptide chain, with a pronounced minimum in the vicinity of 200 nm. By analogy with the conformational classification of “normal” globular proteins, intrinsically unstructured proteins could be divided into intrinsic coils and intrinsic premolten globules.
Because the amino acid sequences of native coils are similar to native premolten globules (only slightly less hydrophobic and slightly more charged), some other sequence features (e.g., propensity to form secondary structure) have to be taken into account for the unambiguous sequence-based separation of the intrinsic coils from the intrinsic premolten globules.
An intriguing property of intrinsically unstructured proteins is their capability to undergo disorder-to-order transition upon functioning. The degree of these structural rearrangements varies over a very wide range, from coil–premolten globule transitions to formation of rigid ordered structures. Thus, protein functioning may be described by the Protein Quartet model, with biological activity arising from four unique conformations of the polypeptide chain (ordered forms, molten globules, premolten globules, and random coils) and transitions between any of them.
Table 1. Major structural characteristics of the “natively unfolded” proteins
Length, a. a. r.
[θ]200, deg cm2 dmol−1
[θ]222, deg cm2 dmol−1
Grottesi et al. 1998
c-Jun oncoprotein, basic subdomain
Krebs et al. 1995
P19Arf tumor suppressor protein, N-terminal fragment
Heat-shock transcription factor, N-terminal activation domain (S. cerevisiae)
Cho et al. 1996
Protein phosphatase inhibitor-1
Nimmo and Cohen 1978
Glucocorticoid receptor, 77–262 fragment
Baskakov et al. 1999
Heat-shock transcription factor, N-terminal activation domain (K. lactis)
Cho et al. 1996
Bhattacharyya and Das 1999
Dopamine- and cAMP-regulated neuronal phosphoprotein, DARPP-32
Hemmings et al. 1984
50S ribosomal protein L3
Venyaminov et al. 1981
Tarkka et al. 1997
Manganese stabilizing protein, L245E mutant
Lydakis-Simantiris et al. 1999
Sec9, SNAP-25-like domain
Rice et al. 1997
50S ribosomal protein L2
Venyaminov et al. 1981
SPARC, BM-40, osteonectin
Engel et al. 1987
GAGA factor, central domain
Agianian et al. 1999
Calreticulin, human−41C fragment
Bouvier and Stafford 2000
Cozens and Reithmeier 1984; He et al. 1993
Bouvier and Stafford 2000
Bouvier and Stafford 2000
Soluble transducer HtrXI
Larsen et al. 1999
Taka-amylase A, reduced
SdrD protein, B1–B5 fragment
Josefsson et al. 1998
Secretogranin (chromogranin B)
DNA topoisomerase I
Stewart et al. 1996
Pelta et al. 2000
I am grateful to Prof. A.K. Dunker for the valuable discussions. I thank Dr. P. Souillac for the careful reading and editing of the manuscript. I appreciate Prof. J. Goers for his invaluable help with the manuscript improvement.