Flexible nets

The roles of intrinsic disorder in protein interaction networks

Authors

  • A. Keith Dunker,

    1. Department of Biochemistry and Molecular Biology, and the Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
  • Marc S. Cortese,

    1. Department of Biochemistry and Molecular Biology, and the Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
  • Pedro Romero,

    1. Department of Biochemistry and Molecular Biology, and the Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
    2. School of Informatics, Indiana University – Purdue University Indianapolis, IN, USA
  • Lilia M. Iakoucheva,

    1. Laboratory of Statistical Genetics, The Rockefeller University, New York, NY, USA
  • Vladimir N. Uversky

    1. Department of Biochemistry and Molecular Biology, and the Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
    2. Institute for Biological Instrumentation, Russian Academy of Sciences, Moscow Region, Russia

A.K. Dunker, Department of Biochemistry and Molecular Biology, and the Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 714 N Senate Ave, Suite 250, Indianapolis, IN 46202, USA
E-mail: kedunker@iupui.edu

Abstract

Proteins participate in complex sets of interactions that represent the mechanistic foundation for much of the physiology and function of the cell. These protein–protein interactions are organized into exquisitely complex networks. The architecture of protein–protein interaction networks was recently proposed to be scale-free, with most of the proteins having only one or two connections but with relatively fewer ‘hubs’ possessing tens, hundreds or more links. The high level of hub connectivity must somehow be reflected in protein structure. What structural quality of hub proteins enables them to interact with large numbers of diverse targets? One possibility would be to employ binding regions that have the ability to bind multiple, structurally diverse partners. This trait can be imparted by the incorporation of intrinsic disorder in one or both partners. To illustrate the value of such contributions, this review examines the roles of intrinsic disorder in protein network architecture. We show that there are three general ways that intrinsic disorder can contribute: First, intrinsic disorder can serve as the structural basis for hub protein promiscuity; secondly, intrinsically disordered proteins can bind to structured hub proteins; and thirdly, intrinsic disorder can provide flexible linkers between functional domains with the linkers enabling mechanisms that facilitate binding diversity. An important research direction will be to determine what fraction of protein–protein interaction in regulatory networks relies on intrinsic disorder.

Abbreviations
CaM, calmodulin
Cdk, cyclin-dependent protein kinase
CKI, Cdk inhibitor protein
GSK3β, glycogen synthase kinase 3 beta
ID, intrinsically disordered
MoRE, molecular recognition element
NER, nucleotide excision repair
PDB, Protein Data Bank
PONDR®, predictors of naturally disordered regions
RGN, regular network
RNN, random network
SFN, scale-free network
XPA, xeroderma pigmentosum group A protein
FRAT, frequently rearranged in advanced T-cell lymphomas
Wnt, wingless type MMTV integration site family
HMG, high mobility group
VL-XT, a predictor of intrinsic disorder that integrates various methods-based predictor of long disordered regions and X-ray based N- and C-terminal predictors
VSL1, length-dependent predictor of intrinsic protein disorder
RPA, replication protein A
ERCC1, exchange repair cross complementing complex 1
TFIIH, transcription factor IIH
XAB, XPA binding protein
p27Kip, cyclin-dependent kinase inhibitor protein p27/1B

Scale-free networks and hubs

In biological systems, processes such as growth, energy generation, cell division and signaling are integrated by large, intricate networks. These biological networks, as well as certain nonbiological networks, especially those involved in communications such as the internet and cellular phone systems, are classified as scale-free networks (SFNs) [1–3]. The basic feature that separates these networks from non-SFNs such as regular networks (RGNs) or random networks (RNNs) is the presence of hubs. Hubs are highly connected nodes that have hundreds, thousands or even millions of links [1,4]. The existence of hubs and their frequency impart two features to SFNs that provide substantial benefit to large complex networks: (a) increased robustness with regard to random defects and (b) shorter distances (in terms of the number of intervening nodes) between any two points [5]. RGNs are grid-like with invariant node connectivity, whereas RNNs are characterized by stochastic variations in node connectivity [5,6]. Despite the random placement of links in RNNs, the vast majority of nodes still have approximately the same connectivity.

Figure 1A,B compare an RNN to a similar-sized SFN to illustrate an important property of the latter. In SFNs, the hub nodes are connected to a dramatically greater fraction of all nodes than the nodes with high connectivity in RNNs [7]. This provides the ability for a signal to travel from any node to any other by traversing a small number of intervening nodes. This feature imparts the so-called small-world property to SFNs [4,8]. Extending the example to a biological context, Fig. 1C represents an experimentally derived Saccharomyces cerevisiae protein–protein interaction map with 1870 protein ‘nodes’ connected by 2240 direct physical interactions [9]. Visual comparison of the model SFN (Fig. 1B) with the experimentally derived protein–protein interaction network (Fig. 1C) suggests a similar underlying architecture.

Figure 1.

Comparison of simulated and actual protein interaction networks. (A) The random network is rather homogeneous as most nodes have approximately the same number of links. (Reproduced from [205] with the permission of the author, © Institute of Physics and IOP Publishing Ltd., 2000–05.) (B) A scale-free network (SFN) is extremely inhomogeneous; while the majority of nodes have one or two links, a few nodes have a large number of links. To show this, five nodes with the highest number of links are colored red, and their first neighbors are colored green. While in the random network only 27% of the nodes are reached by the five most connected nodes, in the SFN more than 60% are. This demonstrates the key role that hubs play in the SFN. Note that both networks contain the same number of nodes (130) and links (430). (Reproduced from [205] with the permission of the author, © Institute of Physics and IOP Publishing Ltd., 2000–05.) (C) Yeast protein interaction network map. The color of a node signifies the phenotypic effect of removing the corresponding protein (red, lethal; green, nonlethal; orange, slow growth; yellow, unknown). (Modified from [9] with the permission of the authors, © Nature Publishing Group, 1998–2005).
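The hub-reach comparison described in the caption can be explored with a small simulation. The following is only a sketch under stated assumptions: the preferential-attachment growth rule is a common stand-in for a scale-free network and is not necessarily how the networks in Fig. 1 were generated, and the function names (random_network, preferential_network, hub_reach, mean_reach) are hypothetical. It is not expected to reproduce the 27% and 60% figures quoted above, only the direction of the difference.

```python
import random

def random_network(n, m, rng):
    """Random network: m links placed between uniformly chosen node pairs."""
    edges = set()
    while len(edges) < m:
        a, b = rng.sample(range(n), 2)
        edges.add((min(a, b), max(a, b)))
    return edges

def preferential_network(n, k, rng):
    """Growth by preferential attachment (assumed stand-in for an SFN):
    each new node adds k links to targets chosen in proportion to their degree."""
    # Seed: k + 1 fully connected nodes.
    edges = {(a, b) for a in range(k + 1) for b in range(a + 1, k + 1)}
    # Each node appears in 'stubs' once per link end, so a uniform draw
    # from this list is a degree-weighted draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(k + 1, n):
        targets = set()
        while len(targets) < k:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.add((t, new))
            stubs.extend((t, new))
    return edges

def hub_reach(n, edges, top=5):
    """Fraction of nodes that are among the `top` best-connected nodes
    or are a first neighbour of one of them."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    hubs = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:top]
    reached = set(hubs)
    for h in hubs:
        reached |= adj[h]
    return len(reached) / n

def mean_reach(n=130, k=3, trials=20):
    """Average hub reach over several seeds for both network types,
    using the same number of nodes and links in each pair."""
    sf_total = rnd_total = 0.0
    for seed in range(trials):
        rng = random.Random(seed)
        sf = preferential_network(n, k, rng)
        rnd = random_network(n, len(sf), rng)
        sf_total += hub_reach(n, sf)
        rnd_total += hub_reach(n, rnd)
    return sf_total / trials, rnd_total / trials

if __name__ == "__main__":
    sf_mean, rnd_mean = mean_reach()
    print(f"scale-free: {sf_mean:.2f}  random: {rnd_mean:.2f}")
```

Matching the link count of the random network to that of the grown network keeps the comparison fair, in the same spirit as the equal node and link counts noted in the caption.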

The scale-free nature of protein–protein interaction networks gives them the advantages of high connectivity and robustness. For example, the connectedness of RNNs decays steadily as nodes fail in a random fashion. The surviving network breaks into progressively smaller and increasingly separate subnets that lose the ability to communicate with one another. On the other hand, SFNs show much less degradation from random node failure because highly connected hubs serve to maintain the integrity of the network. Because SFN hubs comprise a small fraction of total nodes, they are statistically less likely to fail as a result of random deleterious events. Although the error tolerance of SFNs is high, it is important to note that the failure of hubs quickly leads to the breakdown of connectivity [7]. This suggests that the SFNs are particularly resistant to random node removal but are extremely sensitive to targeted hub removal. In agreement with these observations, analysis of the S. cerevisiae protein–protein interaction network revealed that although proteins with five or fewer links constitute about 93% of the total number of proteins, only about 21% of them are essential to cell survival. By contrast, only 0.7% of yeast proteins are known to have more than 15 links (i.e., hubs), but for 62% of these, deletion is lethal [9].
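The contrast between random node failure and targeted hub removal can be illustrated with a toy simulation. This is a minimal sketch, assuming a network grown by preferential attachment as a proxy for an SFN; the network size, the 10% removal fraction and all function names are illustrative choices, not values taken from the studies cited above.

```python
import random

def scale_free_edges(n, k, rng):
    """Grow a network by preferential attachment (assumed proxy for an SFN)."""
    edges = [(a, b) for a in range(k + 1) for b in range(a + 1, k + 1)]
    stubs = [v for e in edges for v in e]  # one entry per link end
    for new in range(k + 1, n):
        targets = set()
        while len(targets) < k:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((t, new))
            stubs.extend((t, new))
    return edges

def largest_component_fraction(n, edges, removed):
    """Largest connected cluster among surviving nodes, as a fraction of n."""
    alive = set(range(n)) - set(removed)
    adj = {v: [] for v in alive}
    for a, b in edges:
        if a in alive and b in alive:
            adj[a].append(b)
            adj[b].append(a)
    best, seen = 0, set()
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one cluster
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best / n

def compare_failures(n=300, k=2, fraction=0.1, seed=0):
    """Return (connectivity after random failures, connectivity after hub removal)."""
    rng = random.Random(seed)
    edges = scale_free_edges(n, k, rng)
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    count = int(n * fraction)
    random_loss = rng.sample(range(n), count)
    hub_loss = sorted(range(n), key=lambda v: degree[v], reverse=True)[:count]
    return (largest_component_fraction(n, edges, random_loss),
            largest_component_fraction(n, edges, hub_loss))

if __name__ == "__main__":
    after_random, after_targeted = compare_failures()
    print(f"random failures: {after_random:.2f}  hub removal: {after_targeted:.2f}")
```

Removing the same number of nodes in both scenarios isolates the effect of which nodes are lost, which is the point made in the text about error tolerance versus attack sensitivity.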

A few caveats about the scale-free nature of biological networks are in order. First, the network coverage of S. cerevisiae [10,11], Caenorhabditis elegans [12], and Drosophila melanogaster [13] interactomes elucidated to date is sufficient only to suggest scale-free architecture. That is, the examined networks appear to be scale-free in nature, but the data constitute only a small fraction of the full interactomes. That the scale-free nature of biological protein–protein interaction networks is currently only a prediction was demonstrated by Han et al. [14]. These authors constructed simulated networks with random, exponential, scale-free and truncated normal topologies and a range of average connectivities. When the four topologies were sampled at a level comparable to that used to construct current protein–protein interaction maps, there was insufficient distinction among the derived sample networks to unequivocally assign the underlying architectures. Thus, conclusive proof that biological protein–protein interaction networks maintain a scale-free nature throughout full interactomes has yet to be obtained. Secondly, experimental protein–protein interaction data contain a significant fraction of false negatives and false positives. This has been illustrated by studies comparing the results from various large-scale efforts [15–18]. The quality of existing data can be improved by comparing and combining datasets and by adding further methods of analysis such as gene neighborhood, gene co-occurrence, gene fusion events, mRNA expression correlations and lethality of knockouts [19]. Despite these limitations, analysis of existing protein–protein interaction data can yield useful information.

Comparison of protein–protein interaction networks across species has significant potential for the study of molecular evolution. Useful tools have been constructed for the comparison of protein–protein interaction networks across species [20–23], and interesting evolutionary inferences are being made, such as the observation that proteins having a more central position in the networks of different species (i.e., hubs) appear to evolve more slowly and are evidently more likely to be essential for survival. These observations are consistent with Fisher's classic proposal that pleiotropy constrains evolution [24,25]. Other important considerations include the timing and the locations of the hub protein interactions. Some hub proteins have multiple simultaneous interactions (party hubs), while others have multiple sequential interactions separated in time or in space (date hubs) [26]. It has been suggested that date hubs organize the proteome, connecting biological processes (modules; [27]) to each other, whereas party hubs act inside functional modules [26] and thereby may form scaffolds for various molecular machines or coordinated processes.

Neither the classical lock-and-key [28] nor the original induced fit mechanism [29] readily accommodates the multiple interactions of hub proteins, especially at higher connectivities. Therefore, the existence of highly connected hubs in scale-free protein–protein interaction networks suggests a different mechanism for molecular recognition. In fact, highly connected proteins were suggested to have some special, perhaps even common, structural features that would endow them with the ability to carry out highly specific interactions with many different proteins [30]. Gunasekaran et al. postulated that the relatively large solvent-accessible surface areas of extended disordered proteins could provide the potential for large intermolecular surfaces with less impact on cell size than if the same surfaces were provided by structured proteins [31]. Rather than maintaining a smaller cell size, the key advantage of intrinsic disorder may lie in providing the molecular basis for the existence, flexibility, and evolution of interaction networks. The following section explores a novel protein-disorder based mechanism that could provide the structural features necessary to allow proteins to carry out highly specific interactions with multiple, structurally diverse partners.

Many proteins have been shown to exist under apparently physiological conditions as dynamic ensembles. Instead of having relatively fixed bonds and angles as in structured proteins, the backbone bonds and angles of such proteins vary significantly over time, with no specific equilibrium values while undergoing noncooperative conformational changes. In other words, such proteins or protein regions do not have rigid 3D structure under physiological conditions in vitro [31–47]. Furthermore, these intrinsically disordered (ID) proteins and regions are known to carry out numerous biological functions including cell signaling [35], molecular recognition [48], and various other interactions with proteins and nucleic acids [32,34,35,37,39–43,49–51].

Recently, a number of groups have published predictors of protein disorder, several of which are available on the web (reviewed in [48]; see also http://www.disprot.org). These predictors are based on the assumption that the absence of rigid structure is encoded in specific features of the amino acid sequence [52,53]. In fact, statistical analysis shows that amino acid sequences encoding ID proteins or regions are significantly different from those of ordered proteins on the basis of local amino acid composition, flexibility, hydropathy, charge, coordination number and several other factors [34,52,54–56]. A signature of a probable ID region is the presence of low sequence complexity coupled with amino acid compositional bias, characterized by a low content of bulky hydrophobic amino acids (Val, Leu, Ile, Met, Phe, Trp and Tyr), which typically form the cores of folded globular proteins, and a high proportion of particular polar and charged amino acids (Gln, Ser, Pro, Glu, Lys and, on occasion, Gly and Ala) [57,58]. These attributes were used to train artificial neural networks using standard back propagation methods to develop a series of ‘predictors of naturally disordered regions’ (PONDR®s) [52,55,56,59,60].
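The compositional signature described above can be made concrete with a toy calculation. The sketch below is not a PONDR® predictor and involves no neural network; it only counts the two residue groups named in the text and measures sequence complexity as Shannon entropy. The function names, the scoring formula and the default window length are assumptions chosen for illustration, and Gly and Ala are left out of the disorder-associated set because the text lists them only "on occasion".

```python
import math
from collections import Counter

# Residue groups taken from the text (one-letter codes).
ORDER_PROMOTING = frozenset("VLIMFWY")   # bulky hydrophobic residues
DISORDER_PROMOTING = frozenset("QSPEK")  # polar and charged residues

def composition_bias(sequence):
    """Fraction of disorder-associated residues minus fraction of bulky
    hydrophobic residues; ranges from -1.0 to 1.0."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    polar = sum(res in DISORDER_PROMOTING for res in seq)
    bulky = sum(res in ORDER_PROMOTING for res in seq)
    return (polar - bulky) / len(seq)

def sequence_entropy(sequence):
    """Shannon entropy in bits; low values indicate low sequence complexity."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def windowed_bias(sequence, window=21):
    """composition_bias for every full-length window along the sequence."""
    return [composition_bias(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]

if __name__ == "__main__":
    example = "MSSPEEKKQQSSPEKVLIWFYVLIMFWY"  # made-up sequence, not a real protein
    print(composition_bias(example), round(sequence_entropy(example), 2))
```

A sliding window is used because the signature is described as a local property; trained predictors combine many more attributes than these two.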

That intrinsic protein disorder is a common phenomenon is illustrated by the fact that thousands of proteins in the Swiss-Prot database were identified by PONDR® as having long regions of sequence that share distinguishing sequence attributes with known ID regions [55]. Furthermore, computational studies revealed that eukaryotes exhibit more disorder than either prokaryotes or archaea [34,60–62]. For example, in 22 bacterial and seven archaebacterial proteomes, the percentage of proteins with predicted regions of disorder ranged from 7% to 33% and from 9% to 37%, respectively. In contrast, in five eukaryotes, disorder ranged from 36% to 63% [60]. This large jump in the percentage of proteins with long predictions of disorder in nucleated, rather than non-nucleated, organisms was both remarkable and unexpected. To explain this and similar observations, it was hypothesized that the higher abundance of intrinsic disorder in eukaryotes could be a consequence of the increased need for cell signaling and regulation in higher organisms [34,35,58,60,63,64].

Qualitatively, it seems reasonable that unstructured proteins could serve as hubs, providing a simpler basis for responding to changes in the environment as compared to rigid proteins. For example, disordered regions can bind partners with both high specificity and low affinity [65], suggesting that disorder-based signaling and regulatory interactions can be highly specific but be easily reversed. These capabilities meet the fundamental requirements of signaling interactions – specificity and reversibility [49] with minimal structural requirements. Another crucial property of ID proteins and regions that could contribute to their function in signaling networks is binding diversity; i.e., their ability to partner with many other proteins and other ligands, such as nucleic acids [66]. This opens the possibility of disordered regulatory regions that are capable of binding many different partners.

An interesting consequence of the capability of ID proteins and regions to interact with different binding partners is the potential for polymorphism in the bound state. That is, such proteins could have completely different geometries in the rigidified structures that are induced upon binding to different partners [48]. This conjecture has been confirmed at atomic resolution. Portions of axin and frequently rearranged in advanced T-cell lymphoma protein (FRAT), which possess negligible sequence similarity, both interact with an intrinsically disordered loop of glycogen synthase kinase 3 beta (GSK3β) that adopts ordered structure upon binding [67]. The binding sites for the two molecules on GSK3β overlap substantially in the crystal structures solved for the axin–GSK3β and FRAT–GSK3β complexes [67]. Furthermore, although both bound peptides are primarily helical, their detailed structures and interactions with GSK3β have substantial differences [67]. The ability of GSK3β to bind two different proteins with high specificity via the same binding site is mediated by the conformational plasticity of the loop formed by residues 285–299. In the nonbound form of GSK3β [67], this loop is poorly defined in the electron density map, suggesting that it very likely occupies multiple conformations. However, this loop is induced to accommodate one of two completely distinct well-ordered structures, each of which is specific to the bound partner [67,68]. While some residues in this versatile GSK3β binding site are involved in interactions with both axin and FRAT, their local conformations in the bound state are different. In addition, other residues are involved uniquely with one ligand or the other [67].

By extending such detailed observations to protein–protein interaction networks in general, we suggest that a unique feature of disordered regions in hub proteins could be structural plasticity in the unbound state, which, when combined with the capability of interacting with multiple structurally distinct partners, results in structural polymorphism in the bound state. These features have very important functional implications. By this means, ID hub proteins and regions could serve multiple and distinct signaling networks and be regulated via different pathways. For example, GSK3β plays a crucial role in the wingless-type MMTV integration site family (Wnt) signaling pathway by controlling the levels of β-catenin [69–71], and GSK3β is also known to be involved in insulin and growth factor signaling pathways [72–75]. GSK3β functions as a signal transducer for these two completely independent pathways without any obvious cross-talk or interference [67]. In the Wnt signaling network, a subset of the cellular GSK3β pool is incorporated into a multiprotein complex that brings GSK3β and its β-catenin substrate into close proximity. In the insulin signaling pathway, GSK3β operates via a completely different mechanism, where the phosphorylation of Ser9 converts the disordered N-terminus of GSK3β to an autoinhibitory segment that blocks access to the active site substrate binding cleft [76]. The functional segregation of the insulin and Wnt signaling networks either requires the absence of exchange between the subsets of cellular GSK3β molecules involved in each pathway or suggests mutual exclusion of the two processes. That is, the involvement of GSK3β with the axin–adenomatous polyposis coli complex can reverse (via the action of the specific phosphatases associated with the mentioned complex [77,78]) or override the inhibitory Ser9 phosphorylation present on a recruited GSK3β molecule via the substantial enhancement in activity towards β-catenin afforded by the axin ‘scaffolding’ [76]. Importantly for this minireview, GSK3β uses two different ID regions to participate in two completely unrelated pathways: the disordered N-terminal fragment (residues 1–34) for insulin signaling and the disordered loop (residues 285–299) for the Wnt network.

Intrinsic disorder and protein–protein interaction networks

The advantages of ID proteins and regions for forming associations with multiple partners led us to search the literature for hub proteins having detailed structural characterization. Table 1 contains a few examples of structurally characterized hub proteins that were identified. The range of structural types fell into three broad classes (as indicated in the ‘Percentage disordered’ column): entirely or mostly disordered (that is, nearly 100% disordered), partially disordered (a midrange percentage of disorder) and entirely or mostly ordered (nearly 0% disordered). For the mostly ordered hub proteins, we paid specific attention to the structure of the binding partners. In many cases, these partners contained regions of intrinsic disorder. One example of an ordered hub, calmodulin, makes use of a flexible hinge to facilitate binding diversity. Each of these classes and their roles in protein–protein interaction networks will be discussed in turn.

Table 1. Number of interacting protein partners for hub proteins. Experimentally characterized disorder is described in terms of the start and stop residues of the disordered region(s) and overall percentage of disordered residues in the whole protein. The BIND, DIP and HPRD database query results are presented as number of protein–protein interactions/number of complexes (inter./comp.), while the STRING search results are presented as number of protein–protein interactions (inter.) only. Proteins are ordered by percentage of disorder. BIND, the Biomolecular Interaction Network Database (http://bind.ca/) [201]. DIP, the Database of Interacting Proteins (http://dip.doe-mbi.ucla.edu/) [202]. STRING, Search Tool for the Retrieval of Interacting Genes/Proteins (associations with confidence scores > 0.7) (http://string.embl.de/) [203]. HPRD, the Human Protein Reference Database (http://www.hprd.org/) [204]. N.D., not in database.

Protein (accession number) | Disordered regions (start-stop) | Percent disordered (%) | BIND (inter./comp.), DIP (inter./comp.), STRING (inter.), HPRD (inter./comp.) | Illustrative partners
α-synuclein (P37840) | 1–140 [154] | 100 | 10/1N.D. 2727/0 | parkin, tau, Aβ [155]; 14-3-3, CaM [156]
Caldesmon (P12957) | 1–771 [157] | 100 | 235/35N.D. 277/0 | ERK [158]; S100a & b [159]; myosin, actin, calmodulin [160]
HMGA (P17096) | 1–107 [161] | 100 | 485/331/0  413/3 | AP1, NF-κB, C/EBPβ, Oct-1 [80]; HIPK2, Sp1 [162]
Synaptobrevin (P63027) | 1–96 [163] | 100 | 199/177/0  86/0 | syntaxin 1A & 1B [164]; BAP31 [165]; VAMP assoc. protein A [166]; VAMP assoc. protein B [167]; SNAP-25 [168]
BRCA1 (P38398) | 170–1649 [169] | 79 | 158/1416/011976/8 | p53, ATM, BRCA2, c-Myc [169]; Chk1 & 2 [170]
XPA (P23025) | 1–102, 210–273 [113] | 63 | 27/24/0 4112/0 | RPA70, RPA34, ERCC1, TFIIH, XAB1 & 2 [171]
Estrogen receptor α (P03372) | 1–184 [172] | 31 | 69/612/011690/1 | p53 [173]; BRCA1 [174]; TATA box binding protein [172]; calmodulin, c-Jun [175]
p53 (P04637) | 1–73 [176]; 183–188, 224–227 [177]; 291–312 [178]; 319–323, 357–360 [179] | 29 | 1900/4034/0239164/8 | Mdm2, ATM, ERK, p38, BCL-XL [180]
Mdm2 (Q00987) | 1–17 [181]; 17–24, 110–125 [182]; 210–304 [183] | 26 | 95/411/0 7229/3 | p53, ARF, ATM, CK2, HIF-1α [184]
Calcineurin, subunit A (Q08209) | 1–13, 390–414, 469–521 [185] | 16 | 451/241/0 315/1 | NFAT [186]; calcipressin 1 [187]; cabin 1 [188]; SOCS-3 [189]; calsarcin [190]
14-3-3ζ (P63104) | 68–77, 134–137, 230–245 [191]a | 12 | 29/61/0 9761/1 | p53, Wee1, Tau, Raf-1, Cdc25C, Bad [192]
Cdk2 (P24941) | 36–46, 152–162 [121]a | 7 | 322/611/012530/15 | protein phosphatase 2A, cyclin E1, DNA polymerase alpha [193]; BRCA1, cyclin A [194]
Actin (P68133) | 1–7, 42–52 [195]a | 5 | 7359/45713/0 3319/1 | profilin [196]; deoxyribonuclease I, vitamin D binding protein, thymosin beta-4, cofilin [197]
Calmodulin (P62152) | 77–81 [136] | 3 | 2962/883/0  950/0 | neurogranin, calcineurin, AC1 [198]; calponin [199]; caldesmon [200]

a: Disordered regions were based on missing residues in the PDB entries listed in the Swiss-Prot entry.
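The "percent disordered" column relates the listed start-stop regions to the full sequence length. A minimal sketch of that arithmetic follows, assuming 1-based inclusive residue numbering and counting residues shared by overlapping regions only once (Mdm2, for instance, is listed with regions 1–17 and 17–24). The function name is hypothetical, and sequence lengths must be supplied by the user because the table does not list them.

```python
def percent_disordered(regions, length):
    """Percentage of residues covered by at least one disordered region.

    regions: iterable of (start, stop) pairs, 1-based and inclusive.
    length:  total number of residues in the protein.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    disordered = set()
    for start, stop in regions:
        if not 1 <= start <= stop <= length:
            raise ValueError(f"region {start}-{stop} lies outside 1-{length}")
        # A set keeps residues shared by overlapping regions from being counted twice.
        disordered.update(range(start, stop + 1))
    return 100.0 * len(disordered) / length

if __name__ == "__main__":
    # α-synuclein: region 1–140 over a 140-residue chain.
    print(percent_disordered([(1, 140)], 140))
    # Hypothetical 200-residue protein with two overlapping regions.
    print(percent_disordered([(1, 10), (5, 20)], 200))
```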

Mostly disordered hubs

Several hub proteins have been shown to be completely or almost completely disordered in solution, including α-synuclein, caldesmon, high mobility group protein A (HMGA), and synaptobrevin (Table 1). An interesting, well-studied, illustrative example of this group of hubs is provided by HMGA [formerly called HMG-I(Y)], a founding member of a new protein class called architectural transcription factors [79]. As discussed in more detail below, this protein has been implicated in the development of cancer and several other pathological conditions [80]. HMGA is considered a central hub of nuclear function, being able to bind to at least 18 known protein partners as well as to several specific DNA structures [80].

Both circular dichroism (CD) [81] and nuclear magnetic resonance (NMR) [82,83] indicate that HMGA lacks structure, with the molecule exhibiting a random coil-like structure over its entire length. The atypical electrophoretic mobility of this molecule [84] also suggests a high content of extended structure. Figure 2A compares the results of PONDR® analysis by two predictors of intrinsic disorder: first, VL-XT, which integrates a various-methods-based predictor of long disordered regions with X-ray-based N- and C-terminal predictors [52,57,59]; and second, VSL1, a length-dependent predictor of intrinsic protein disorder [85]. While VL-XT is the best-characterized member of the PONDR® family, VSL1 is more accurate overall and, indeed, obtained the best results of the 20 order/disorder predictors tested in the 6th Critical Assessment of Methods for Protein Structure Prediction (CASP6; http://predictioncenter.org/). In complete agreement with the experimental data, the predictor outputs in Fig. 2A indicate that the HMGA sequence has a high propensity for disorder over its entire length. However, HMGA was shown to undergo disorder-to-order transitions upon binding to DNA or protein partners [83,86,87]. For example, the DNA-binding regions of HMGA assume a planar, crescent-shaped configuration called the ‘AT-hook’ when specifically bound to the minor groove of short AT-rich stretches of DNA [83,86,87].

Figure 2.

Order/disorder predictions on three hub proteins. (A) PONDR® VL-XT (red) and VSL1 (magenta) predictions on the human HMGA protein sequence (Swiss-Prot accession number P17096). Green horizontal bars correspond to the areas of the protein that have been identified as the minimal regions required for specific protein–protein interactions with other transcription factors (after [80]): 1, IRF-1; 2, ATF-1/c-Jun; 3, NF-Y; 4, SRF; 5, NF-κB; 6, p50; 7, Tst-1/Oct-6; 8, HIPK-2. Although only eight target proteins are shown here, it has been established that HMGA physically interacts with at least 18 transcription factors [80]. Dark yellow horizontal bars correspond to the areas of the protein (known as AT-hooks) that are involved in DNA binding. A PONDR® score of 0.5 or above corresponds to a prediction of disorder. (B) PONDR® VL-XT (red) and VSL1 (magenta) predictions on the Xenopus laevis XPA protein sequence (Swiss-Prot accession number P27088). Vertical red and blue bars correspond to the accessible and inaccessible trypsin digestion sites, respectively. Notice how, for the most part, cut sites within predicted ordered regions are not cleaved by trypsin. Green horizontal bars correspond to the functional regions of XPA and reflect interaction sites with the following binding partners: 1, 34 kDa subunit of RPA [103,104]; 2, ERCC1 [106,107]; 3, 70 kDa subunit of RPA [103,104]; 4, TFIIH [108,109]; 5, XAB1 [101]. Dark yellow horizontal bar corresponds to the minimal fragment of XPA known to interact with damaged DNA [110]. (C) PONDR® VL-XT (red) and VSL1 (dark pink) predictions on the human Cdk2 sequence (Swiss-Prot accession number P24941). Horizontal green bars correspond to the regions of Cdk2 involved in the interaction with p27Kip1 (residues with atoms within 5 Å of atoms of p27Kip1 [122]).
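Per-residue predictor output of the kind plotted in Fig. 2 is usually summarized as disordered segments. The sketch below shows one way this might be done, assuming that a score at or above 0.5 counts as a disorder prediction and that residues are numbered from 1. The function name, the min_length parameter and the example scores are illustrative and are not real PONDR® output.

```python
def disordered_segments(scores, threshold=0.5, min_length=1):
    """Return (start, stop) pairs, 1-based and inclusive, for runs of
    consecutive residues whose score is at or above the threshold."""
    segments = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = position          # a new run begins here
        elif start is not None:
            if position - start >= min_length:
                segments.append((start, position - 1))
            start = None                  # the run has ended
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_length:
        segments.append((start, len(scores)))
    return segments

if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.2, 0.1, 0.6, 0.7, 0.5]  # made-up values
    print(disordered_segments(toy_scores))
    print(disordered_segments(toy_scores, min_length=3))
```

The min_length filter reflects the emphasis in the text on long predicted regions; what counts as long is left to the user.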

Despite its lack of ordered structure, HMGA participates in a wide variety of nuclear processes ranging from modulation of chromosome and chromatin mechanics [88,89] to acting as an architectural transcription factor that regulates the expression of more than 45 different eukaryotic and viral genes in vivo [79,90]. Besides their association with whole chromosomes [88,89], HMGA proteins also bind to individual nucleosomes, both in vitro and in vivo [91,92]. In addition to these unique DNA-binding characteristics, at least 18 different transcription factors have been reported to specifically interact with HMGA proteins (summarized in [80]). The list of proteins known to interact with HMGA includes transcription factors such as AP-1, ATF-2/c-Jun heterodimer, IRF-1, c-Jun, NF-κB p50/p65 heterodimer, C/EBPβ, Elf-1, NF-AT, NF-κB p50 homodimer, NF-κB p65, NF-Y, Oct-1, Oct-2A, PIAS3, Pu.1, RNF4, SRF, and Tst-1/Oct-6 heterodimer [80].

HMGA gene expression is maximal during embryonic development [93] and has been suggested to be involved in the control of cell growth and differentiation [94]. Interestingly, overproduction of HMGA can be oncogenic and promote tumor progression and metastasis via dramatic alterations in numerous signaling pathways [80]. Based on these observations, it was suggested that HMGA proteins function in the cell as highly connected ‘nodes’ of protein–DNA and protein–protein interactions that influence a diverse array of normal biological processes including growth, proliferation, differentiation and death [80].

In summary, HMGA is a well-studied hub protein that in the absence of binding partners has been characterized to be completely disordered by NMR [82,83] and CD [81], supporting the hypothesis that hub proteins are strong candidates to possess appreciable amounts of disorder. The HMGA example was also a major factor leading to the suggestion that hub proteins as a group might depend on intrinsic disorder [49]. This supposition is supported below by several additional examples of hub proteins that utilize ID regions in their associations with multiple partners.

Partially disordered hub proteins

Table 1 also lists several hub proteins that possess an intermediate range of both ordered and disordered segments (internal loops and/or terminal tails), including BRCA1, Mdm2, XPA, p53, estrogen receptor α, and subunit A of calcineurin. Disordered regions appear to play important roles in the binding interactions of these hub proteins. The xeroderma pigmentosum group A protein (XPA) represents an interesting example of a partially disordered hub protein.

Xeroderma pigmentosum is an autosomal recessive human disease characterized by hypersensitivity to sunlight and a high incidence of skin cancer on sun-exposed skin [95,96]. This hypersensitivity is caused by defects in the nucleotide excision repair (NER) pathway that is necessary to correct many types of DNA damage [95,97]. XPA, consisting of 273 amino acid residues, participates in the assembly of the damage recognition/incision complex, recruiting several other functional proteins to the site of damage [98]. In particular, it has been shown that this protein binds to three NER factors: replication protein A (RPA), exchange repair cross complementing complex 1 (ERCC1) and transcription factor IIH (TFIIH), as well as to UV- or chemical carcinogen-damaged DNA [99,100]. Furthermore, XPA was shown to interact with the cytoplasmic GTPase XPA-binding protein 1 (XAB1) [101] and with a tetratricopeptide repeat protein XAB2 [102].

Functionally, XPA can be divided into several distinct regions (Fig. 2B): the N-terminal fragment (residues 4–29) that binds to a 34 kDa subunit of RPA [103,104]; the basic amino acid region (residues 30–42) that is responsible for localizing XPA in the nucleus [105]; the acidic region (residues 78–84) that is important for interaction with ERCC1 [106,107]; and the C-terminal region (residues 226–273) that binds to TFIIH [108,109]. Furthermore, the central region (residues 98–219) is the minimal polypeptide that preferentially binds damaged DNA [110]. Finally, the fragment 98–187 is necessary for binding to the 70 kDa subunit of RPA [103,104]. Figure 2B presents the distribution of the PONDR® VL-XT and VSL1 scores within the XPA sequence and illustrates the long predicted disordered regions at or near the two ends (from M1 to A55 and from S63 to P88 at the N-terminus and from L183 to E230 near the C-terminus). Importantly, Fig. 2B shows that the central DNA-binding domain is likely to be mostly ordered, whereas the multiple protein binding sites are located in the regions that are likely to be disordered.
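The relationship between the functional regions and the predicted disorder can be tabulated directly from the residue ranges quoted above. The following is a minimal sketch, not part of the original analysis: it uses only the three long predicted disordered regions named in the text (shorter predicted segments visible in the full PONDR® profile are not included, so the coverage values are lower bounds), and the region labels and function names are illustrative assumptions.

```python
# Residue ranges (1-based, inclusive) quoted in the text for human XPA.
# Only the three long predicted disordered regions are listed; the full
# per-residue PONDR profile is NOT reproduced here (assumption of this sketch).
PREDICTED_DISORDER = [(1, 55), (63, 88), (183, 230)]

# Labels are illustrative; ranges are taken from the text.
FUNCTIONAL_REGIONS = {
    "RPA34 binding": (4, 29),
    "nuclear localization": (30, 42),
    "ERCC1 binding": (78, 84),
    "DNA binding (minimal)": (98, 219),
    "RPA70 binding": (98, 187),
    "TFIIH binding": (226, 273),
}


def overlap_length(a, b):
    """Number of residues shared by two inclusive ranges."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def disordered_fraction(region, disorder=PREDICTED_DISORDER):
    """Fraction of a region covered by the (non-overlapping) disorder ranges."""
    length = region[1] - region[0] + 1
    covered = sum(overlap_length(region, d) for d in disorder)
    return covered / length


if __name__ == "__main__":
    for name, region in FUNCTIONAL_REGIONS.items():
        frac = disordered_fraction(region)
        print(f"{name:24s} {region[0]:3d}-{region[1]:<3d} {frac:.2f}")
```

With these ranges, the three N-terminal functional regions fall entirely within predicted disorder, whereas the minimal DNA-binding region overlaps the long disordered regions only at its C-terminal end.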

In agreement with the predictions (Fig. 2B), the NMR solution structure of a human XPA fragment containing the minimal DNA-binding domain (residues 98–219) revealed that one-third of this molecule is disordered [111,112]. These conclusions were further confirmed by the results of limited proteolysis analysis [113]: mild trypsin digestion produced cuts at 33 of the possible 48 sites, with no cleavage at any of the 14 possible sites in the internal DNA-binding region (Q85–I179) (Fig. 2B). The observed cleavage sites were predominantly in two of the large regions of predicted disorder [113]. In general, it is believed that cut sites within disordered regions are more easily cleaved by proteases than those found in structured areas [114,115]. Thus, excellent agreement was observed between PONDR® prediction of order and disorder and the observed sites of protease resistance and sensitivity, respectively [113].
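The proteolysis counts quoted above can be turned into a simple tally. The sketch below is an illustration only and was not part of the cited study: it takes the three counts from the text and asks, under the assumed null model that the cleaved sites were distributed at random among all possible sites, how likely it would be for none of them to fall in the DNA-binding region. The function name and the null model are assumptions.

```python
from math import comb

TOTAL_SITES = 48   # possible trypsin sites in XPA (from the text)
CUT_SITES = 33     # sites actually cleaved (from the text)
CORE_SITES = 14    # possible sites in Q85-I179, none cleaved (from the text)

# Because no core site was cleaved, all 33 cuts lie among the remaining sites.
outside_sites = TOTAL_SITES - CORE_SITES
outside_cut_fraction = CUT_SITES / outside_sites


def prob_no_core_cuts(total, cut, core):
    """Hypergeometric probability that none of the `core` sites is among
    `cut` sites drawn uniformly at random from `total` sites.
    The uniform-random null model is an assumption of this sketch."""
    return comb(total - core, cut) / comb(total, cut)


if __name__ == "__main__":
    print(f"sites outside the core: {outside_sites}")
    print(f"fraction of outside sites cleaved: {outside_cut_fraction:.2f}")
    print(f"P(no core cuts | random): {prob_no_core_cuts(TOTAL_SITES, CUT_SITES, CORE_SITES):.2e}")
```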

In summary, XPA is an illustrative example of the class of hub proteins that contain disordered tails and/or loops as well as ordered regions. Importantly, flexibility of the disordered fragments in such proteins facilitates interactions with multiple binding partners without sacrificing specificity and enhances the ability of hub proteins to participate in multiple signaling pathways.

Ordered hubs interacting with disordered partners

Table 1 lists four examples of protein hubs that are mostly ordered: actin, calmodulin, Cdk2, and 14-3-3′ξ. Below, we describe how ordered hubs may interact with intrinsically disordered binding partners, and how such interactions may play crucial roles in regulation and coordination of hub protein activities.

The orderly progression of cells through the various phases of the cell cycle is governed by several distinct cyclin-dependent protein kinases (Cdks), which therefore are considered as the master timekeepers of cell division [116]. Unlike other protein kinases, Cdks are regulated by binding to their cyclin protein partners, forming active heterodimeric complexes. Eight Cdk family members (Cdk1–Cdk8) and nine cyclins (A–I) have been identified so far. Interestingly, each Cdk pairs with a separate cyclin class, most of which have at least two members [117,118]. For example, Cdk1 together with cyclin B1 directs the G2/M transition. Exit from G1, in contrast, is primarily under the control of cyclin D/Cdk4/6. Finally, two other cyclins (A and E) that pair with Cdk2 are required for the G1/S transition and progression through the S phase [117,118].

All Cdks have similar sizes (30–40 kDa) and share at least 40% sequence identity, including the highly conserved 300 residue catalytic core. In contrast, the cyclin subunits vary in size (30–80 kDa), but all contain a homologous 100 residue cyclin box domain. The Cdk subunits are not catalytically active unless they bind to a cyclin partner and form a basally active complex. The fully active complex is produced when the Cdk is phosphorylated [116,119]. Crystal structures of several Cdks (phosphorylated and dephosphorylated) and their complexes with cyclins and inhibitors have been solved [119]. All Cdks have the same overall fold as other eukaryotic protein kinases. For example, monomeric Cdk2 consists of an N-terminal lobe rich in β-sheet (N lobe), a larger C-terminal lobe rich in α-helix (C lobe), and a deep cleft at the junction of the two lobes that is the site of ATP binding and catalysis [120]. Figure 2C presents the distribution of the PONDR® VL-XT and VSL1 scores within the human Cdk2 sequence and illustrates that this protein is likely to be almost completely ordered, having only several small regions of predicted disorder. The crystal structure of human Cdk2 (PDB ID: 1URW) is consistent with this prediction, with only two regions containing missing residues [121]. Interestingly, these two segments, residues 36–46 and 152–162, overlap with experimentally identified regions of Cdk2 that interact with cyclin-dependent kinase inhibitor protein p27/1B (p27Kip1) (Fig. 2C) [122].

Regulation of Cdk activity is essential for the proper timing and coordination of numerous nuclear processes, including DNA replication and chromosome separation, required for cell growth and division. Not surprisingly, the activity of Cdks throughout the cell cycle is precisely directed by a combination of several mechanisms, including the control of cycle-dependent variations in the levels of activating partners, cyclins (via regulation of their synthesis and degradation); coordination of Cdk phosphorylation and dephosphorylation, which is required for controlled activation and deactivation of Cdks; and variations in the levels of the Cdk inhibitor proteins, CKIs, responsible for the deactivation of the Cdk–cyclin complexes [116,119]. Four major mammalian CKIs, which regulate cell proliferation through their physical interactions in the nucleus with Cdks, have been discovered so far. p21Waf1/Cip1/Sdi1 and p27Kip1 inactivate Cdk2 and Cdk4 cyclin complexes by binding to them, whereas p16INK4 and p15INK4B are specific for Cdk4 and Cdk6. Importantly, CKIs are able to inhibit Cdk–cyclin activity by blocking formation of active Cdk–cyclin complexes via binding to inactive Cdk or by binding to the active complex [116,119].

The Cdk inhibitor p21Waf1/Cip1/Sdi1 is important for p53-dependent cell cycle control, mediating G1/S arrest through inhibition of Cdks and possibly through inhibition of DNA replication [123]. A striking disorder-to-order transition for p21Waf1/Cip1/Sdi1 upon binding to one of its biological targets, Cdk2, was demonstrated by proteolytic mapping, CD spectropolarimetry, and NMR spectroscopy [66]. In fact, it has been established that p21Waf1/Cip1/Sdi1 and its NH2-terminal fragment, being active as Cdk inhibitors, lacked stable secondary or tertiary structure in free solution. However, the p21Waf1/Cip1/Sdi1 NH2-terminus adopts an ordered, stable conformation when bound to Cdk2 [66]. This intrinsically disordered nature probably explains the ability of p21Waf1/Cip1/Sdi1 to bind to and to inhibit a diverse family of cyclin–Cdk complexes, including cyclin A–Cdk2, cyclin E–Cdk2, and cyclin D–Cdk4 [124]. Thus, the intrinsic disorder of p21Waf1/Cip1/Sdi1 is associated with binding diversity and helps to explain the role for structural disorder in mediating binding specificity in biological systems [66].

The p27Kip1 protein is another member of the p21 family of Cdk inhibitors, which negatively regulate the cell cycle and thereby contribute to cellular growth and development [125,126]. p27Kip1 contains an N-terminal Cdk-inhibition domain and a C-terminal domain of unknown function called the QT domain [125,127].

The crystal structure of the human p27Kip1 Cdk-inhibition domain (residues 22–106) bound to human cyclin A–Cdk2 complex shows that residues 25–93 of p27Kip1 bind in an ordered conformation comprising an α-helix, a 3₁₀ helix, and β-structure [120]. Importantly, the p27Kip1 Cdk-inhibition domain was shown to lack an intramolecular hydrophobic core. Instead, p27Kip1 interacts with the cyclin A–Cdk2 complex as an extended structure, being bound to both cyclin A and Cdk2. On cyclin A, it interacts with a groove formed by conserved cyclin box residues. On Cdk2, p27Kip1 binds and rearranges the amino-terminal lobe and also inserts into the catalytic cleft, mimicking ATP in the context of the cyclin A–Cdk2 complex [120]. In contrast, the unbound p27Kip1 Cdk-inhibition domain is intrinsically disordered (natively unfolded) as shown by both CD and NMR spectroscopy. The NMR spectra lack chemical-shift dispersion and exhibit negative peaks for the 1H-15N heteronuclear nuclear Overhauser effect [122,128,129]. Despite showing disorder before binding its targets, p27Kip1 has nascent, but transient, secondary structure that may have a function in molecular recognition [122].

The kinetic analysis of p27Kip1 folding induced by its binding to Cdk2–cyclin A complex revealed that this is a sequential process initiated by binding to cyclin A, which is accompanied by folding of 34 residues (as estimated by a method suggested in [130]). The binding of p27Kip1 to Cdk2 leading to the inhibition of kinase activity is much slower and is accompanied by the folding of 59 residues [122]. Based on these observations, it was proposed that p27Kip1 (and potentially other CKIs, such as p21 and p57) function essentially as ‘molecular staples’, wherein the ‘prongs’ of the staple (domains 1 and 2) are unstructured and flexible before binding Cdk–cyclin complexes. The staple analogy is completed by a linker helix (partially structured in the unbound state) that connects the two prongs. This analogy is illustrated in Fig. 3A, which presents a model for p27Kip1 binding to the Cdk2–cyclin A binary complex and shows the importance of both preformed, but transient structure in the linker region and the flexible nature of domains 1 and 2 for the efficient functioning of this protein [122]. Figure 3B shows that p27Kip1 is predicted to be mostly disordered by both PONDR® VL-XT and VSL1. Importantly, a region of predicted order overlaps with the fragment of p27Kip1 shown to contain a significant amount of regular secondary structure in its complex with Cdk2–cyclin A.

Figure 3.

Disorder-to-order transition upon binding for p27Kip1. (A) A model for p27Kip1 binding to the Cdk2–cyclin A binary complex. p27Kip1 is yellow, Cdk2 (K2) cyan and cyclin A (A) magenta. In these panels, the subunit not present in the experimental binding reactions is gray to emphasize the relevance of experimental data for binary binding reactions to the mechanism of binding the Cdk2–cyclin A complex (right). (Modified from [122] with the permission of the authors, © Nature Publishing Group, 1998–2005.) (B) PONDR® VL-XT (red) and VSL1 (dark pink) predictions on the human p27Kip1 sequence (Swiss-Prot accession number P46527). Green horizontal bars correspond to the regions of the protein involved in interaction with Cdk2 and cyclin A: 1, domain 1 interacting with cyclin A; 2, a linker helix involved in binding both cyclin A and Cdk2; 3, domain 2 interacting with Cdk2 [122]. Blue and dark yellow horizontal bars correspond to the helices and β-structure stabilized by the formation of a triple complex, cyclin A–Cdk2–p27Kip1.

The above-mentioned sequential folding-upon-binding mechanism has been suggested to be crucial for the selective inhibition of specific Cdk–cyclin complexes by corresponding CKIs. Furthermore, p21Waf1/Cip1/Sdi1 and p27Kip1 target the cell cycle Cdks (Cdk1, Cdk2, Cdk3, Cdk4 and Cdk6) but fail to bind and inhibit Cdk5 and Cdk7 [124]. Although the surfaces of the Cdks are not markedly different from each other and therefore cannot provide a basis for specificity, important differences in surface residues of the cyclin partners of these Cdks have been described [122]. Thus, p27Kip1 and other CKIs may have evolved to first recognize and bind these specificity-determining sites and then to fold into an extended structure that reaches the kinase subunit and consequently inhibits its activity [122].

Flexible linkers between functional domains

Calmodulin (CaM) is a 148 residue protein with four calcium binding sites that serves to mediate extracellularly induced Ca2+ signaling within the cytosol. CaM modulates the activity of a large number of enzymes by direct binding, with both calcium-dependent [131] and calcium-independent [132] binding modes, with the former likely to be much more common than the latter.

The regions bound by CaM are typically about 20 residues in length and mostly α-helical in nature [133,134]. CaM binding targets exhibit limited sequence identity and in many cases are nonhomologous; thus, a mechanism that incorporates specificity but permits diversity must be encoded in CaM's structure. The X-ray crystal structure of CaM is dumbbell-like with two homologous globular domains connected by a rigid 26 residue α-helix [135]. Subsequent to the X-ray structure determination, NMR analyses revealed that residues 77–81 in the middle of the central helix were highly flexible and functioned as a hinge [136]. This hinge facilitates a binding mode in which CaM surrounds the target regions of its partners within the two Ca2+-binding, globular domains, and in some cases the hinge region remains unstructured after association with the CaM target [132].

The interior faces of the globular domains have features that accommodate target diversity, such as nonrigid helix–helix packing that allows backbone adjustments and a high content of methionine residues, which are especially adept at side chain adjustments [132]. An important structural feature enabling intermolecular binding with maximal surface area of interaction is the flexible connector between the two globular domains. This flexibility accommodates a high diversity of sequence features in the target by allowing the CaM surface to seek complementary interactions by sampling different positions and orientations relative to the binding target surface. Additionally, the flexible hinge facilitates variable separation of the two globular domains after binding has occurred, again allowing for binding diversity [136]. Despite the small size of the disordered region (just five residues) and the slight amount of disorder in the entire protein (just 3%), this disorder is crucial for the binding mechanism that enables CaM to associate with multiple partners. Overall, a wide range of diverse sequences are recognized by CaM [133,134].

ID regions vs. dehydrons as the basis for binding to multiple partners

The examples presented above emphasize the importance of intrinsic disorder for hub protein function: ID regions provide hubs with the ability to bind multiple structurally diverse targets, thereby enabling them to participate in and possibly regulate multiple networks. An alternative conjecture has been made, namely that the interactivity and connectivity of proteins of known structure in proteomic networks depends on the number of dehydrons (backbone hydrogen bonds that are insufficiently protected from the surrounding water molecules) observed for each protein [137–142]. Dehydrons, the coulombic stabilizing energy of which increases as water is excluded, are effectively adhesive, because association-induced local water removal increases this hydrogen-bond stability [141]. Important for the present work, the application of the dehydron concept in an analysis of the evolution of the yeast proteomic network [11,143], indicated that new binding partnerships could be promoted by relaxing the structure of the packing interface, which effectively increases the number of dehydrons [144]. This was interpreted to mean that the rate of increase in interactome complexity (i.e., the rate that new connectivities are adopted) coincides with the rate of increase of the number of dehydrons. According to this conjecture, proteins with greater numbers of dehydrons represent more plausible molecules for hub evolution.

How did the previous authors carry out the analysis of the yeast interactome and show that hub proteins were enriched in dehydrons? First, they showed that there is a statistical correlation between the local PONDR® VL-XT score and the degree of protection of the indicated hydrogen bond [144]. Based on this correlation, PONDR® VL-XT scores were used to show that increases in the number of apparent dehydrons represented a plausible basis for SFN evolution [144]. Although the occurrences of dehydrons and intrinsic disorder scores are correlated, it is not immediately evident which characteristic – dehydrons or disorder – actually explains the previous observations [144] and thus may provide the basis for hub evolution. As reviewed in more detail here and elsewhere [40,48], disordered regions have been shown to be able to adopt different structures upon interaction with structurally diverse partners (thereby increasing the potential number of partners that can be accommodated by a given protein segment), but no evidence has yet been presented that dehydrons have a similar capability. Thus, mutation-driven variation of locally disordered regions is more likely than dehydrons alone to be one of the key structural factors leading to the evolution of hub proteins.

How do ID regions work?

To explain one of the potential mechanisms used by ID proteins to interact with their binding partners, the concept of ‘molecular recognition element’ (MoRE) was introduced [145,146]. The MoRE defines an interaction-prone short segment of disorder that becomes ordered upon specific binding. There are three basic types of MoREs: those that form α-helical structures upon binding; those that form β-strands (in which the peptide forms a β-sheet with additional β-strands provided by the protein partner); and those that form irregular structures when binding [48,145–147]. Proposed names for these structures are α-MoRE, β-MoRE, and I-MoRE, respectively [48]. Of course, a given MoRE could be composed of more than one type of secondary structure when bound to its partner, resulting in complex MoREs, such as I-α-I-MoREs, α-β-MoREs, etc.

MoREs can be detected experimentally as segments of disordered regions that maintain some residual structure (as in the case of p27Kip1 [122]). They also can be discovered by analysis of protein–protein complexes deposited in the Protein Data Bank (PDB) [148] that consist of short nonglobular protein fragments bound to a large globular partner [48,145–147]. Originally, 14 α-MoREs were described in 12 proteins [147]. However, subsequent, more detailed analysis of the PDB revealed more than 2500 such complexes, further analysis of which gave a nonredundant set of 372 nonhomologous MoREs. This dataset includes ≈ 10 000 residues, 27% of which were assigned to α-helical conformation, 12% were β-structural and 48% were irregular, whereas the remaining 13% of the residues were disordered (based on missing coordinates in the PDB files) (A. Mohan, P. Romero, C.J. Oldfield, V.N. Uversky & A.K. Dunker, unpublished data). Overall, this analysis shows that complexes of short nonglobular protein fragments bound to large globular proteins are rather abundant in the PDB, and thus in nature.

Alternatively, MoREs can be detected computationally [146,147]. In fact, for several proteins it was noted that PONDR® VL-XT could identify experimentally characterized binding regions as short downward spikes (signifying propensity for order) flanked by strong predictions of disorder [145,149,150]. By combining additional sequence-derived criteria with the downward spike indications from PONDR® analysis, we developed an algorithm that identifies regions of high α-MoRE propensity [146,147]. Application of this predictor to several genomes revealed that, while somewhat less than 3% of bacterial and archaeal proteins contain regions of high α-MoRE propensities, about 23% of eukaryotic proteins in general, and almost 50% of eukaryotic regulatory proteins in particular, contain α-MoRE signals [146,147]. This study suggests that disorder-to-helix transitions are common in protein interaction networks, but laboratory experiments are needed to test whether this mechanism is indeed as common as suggested.
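The 'downward spike' pattern described above can be illustrated with a simple scan over a per-residue disorder profile. The sketch below is not the published α-MoRE predictor, which combines the spike signal with additional sequence-derived criteria that are not reproduced here. The function name, the 0.5 order/disorder cutoff, and the length limits for the dip and its flanks are all assumptions chosen for illustration.

```python
def find_order_dips(scores, threshold=0.5, max_dip_len=20, min_flank_len=10):
    """Return (start, end) index pairs (0-based, inclusive) for short runs of
    predicted order (score < threshold) flanked on both sides by at least
    `min_flank_len` consecutive residues of predicted disorder.

    All parameter defaults are illustrative assumptions, not published values.
    """
    n = len(scores)

    # Split the profile into alternating runs of disordered / ordered residues.
    runs = []  # each entry: (is_disordered, start, end)
    i = 0
    while i < n:
        state = scores[i] >= threshold
        j = i
        while j + 1 < n and (scores[j + 1] >= threshold) == state:
            j += 1
        runs.append((state, i, j))
        i = j + 1

    dips = []
    # Only interior runs can be flanked on both sides.
    for k in range(1, len(runs) - 1):
        is_disordered, start, end = runs[k]
        if is_disordered:
            continue
        # Runs alternate, so both neighbours of an ordered run are disordered.
        left_len = runs[k - 1][2] - runs[k - 1][1] + 1
        right_len = runs[k + 1][2] - runs[k + 1][1] + 1
        dip_len = end - start + 1
        if dip_len <= max_dip_len and left_len >= min_flank_len and right_len >= min_flank_len:
            dips.append((start, end))
    return dips


if __name__ == "__main__":
    # Synthetic profile: disorder, a short dip toward order, disorder, long order.
    profile = [0.9] * 15 + [0.2] * 8 + [0.9] * 15 + [0.1] * 40
    print(find_order_dips(profile))
```

In this synthetic profile only the eight-residue dip is reported; the long ordered stretch at the end is ignored because it is too long and is not flanked by disorder on both sides.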

Concluding remarks

Systematic postgenome proteome-wide analysis of protein interactions using large-scale two-hybrid screens suggests that these interactions can be described as complex SFNs [9,26,151]. On the other hand, many traditional approaches have been developed to analyze interactions, coordination, signaling and regulation on a smaller scale (e.g. within the scope of a single or multiple interacting pathways). Quite often these small-scale approaches yielded interesting results that were used to develop models that correctly predicted outcomes of changes and interruptions within the system studied. Integrating small-scale network information with global network information represents an important but difficult task. Successful completion of this task could lead to improved quality of the overall data, shed light on the mechanisms of timing and regulation, indicate how global and local properties of complex macromolecular networks affect observable biological properties (phenotype) and functions (physiology) and suggest how changes in such properties contribute to human diseases. Several groups are productively working on the task of understanding interesting subnetworks [152,153]. The example of GSK3β, in which different regions of intrinsic disorder are independently involved in different regulatory pathways, suggests that identifying and cataloguing ID regions in hub proteins could provide a useful approach for organizing protein–protein interaction subnetworks into larger networks.

We have presented evidence that ID proteins can play crucial functional roles in protein interaction networks. This evidence indicates how some hubs carry out function, how some local nets are integrated into global networks, and how multiple processes can be coordinated, regulated, timed and isolated from one another. The information available at this stage in the developing field of systems biology provides strong evidence for the existence of extensive complex networks combining global and local properties, but also demonstrates that much more data will be needed to develop reliable models. Carrying out a comparison of the number of interactions involving at least one disordered partner with the number of interactions involving structured partners for a few selected subnetworks would be very useful. Such a comparison would provide an estimate of the fraction of protein–protein interactions that utilize ID protein regions and the fraction that involve only structured proteins. While we anticipate that disorder-to-order transitions might be very common in hub protein interactions because of the binding diversity advantage that disorder provides, the extracted ratio would provide an estimate of the frequency that this property is utilized on global scales. Studies along these lines are currently in progress.
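The comparison proposed above amounts to a simple count over an annotated interaction list. The following minimal sketch shows one way such a ratio could be computed. The toy subnetwork is assembled from interactions mentioned in this review purely for illustration and is not a curated dataset, and the function name and the binary 'disordered' annotation are assumptions.

```python
def disorder_interaction_fraction(interactions, disordered):
    """Fraction of distinct, non-self protein pairs in which at least one
    partner is annotated as using a disordered region for binding."""
    # Treat (A, B) and (B, A) as the same interaction; drop self-pairs.
    pairs = {frozenset(p) for p in interactions if p[0] != p[1]}
    if not pairs:
        raise ValueError("no interactions supplied")
    with_disorder = sum(1 for pair in pairs if pair & disordered)
    return with_disorder / len(pairs)


# Toy subnetwork built from interactions described in the text (illustrative only).
TOY_INTERACTIONS = [
    ("p27Kip1", "Cdk2"),
    ("p27Kip1", "cyclin A"),
    ("p21", "Cdk2"),
    ("Cdk2", "cyclin A"),
    ("XPA", "RPA"),
    ("XPA", "ERCC1"),
    ("XPA", "TFIIH"),
]

# Proteins described in the text as wholly or partially disordered.
TOY_DISORDERED = {"p27Kip1", "p21", "XPA"}

if __name__ == "__main__":
    frac = disorder_interaction_fraction(TOY_INTERACTIONS, TOY_DISORDERED)
    print(f"interactions involving a disordered partner: {frac:.2f}")
```

A realistic analysis would need residue-level evidence that the disordered region itself mediates each interaction, rather than a per-protein label as used here.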

Acknowledgements

We thank A.L. Barabasi for permission to modify published figures and R.W. Kriwacki for assistance with figure modification. We also thank Tanguy Le Gall and Molecular Kinetics, Inc. for the use of the proprietary fragment-mapping program to mine PDB for some of the data presented in Table 1. The Indiana Genomics Initiative (INGEN) and NIH Grant 1 R01 LM007688-0A1 provided support to A.K.D. This work was supported in part by INTAS 2001-2347 Grant to V.N.U., and L.M.I. was supported by the grant MCB-0444818 from the National Science Foundation.
