Exploring the binding diversity of intrinsically disordered proteins involved in one-to-many binding

Authors

  • Wei-Lun Hsu,

    1. Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana
  • Christopher J. Oldfield,

    1. Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana
  • Bin Xue,

    1. Department of Molecular Medicine, University of South Florida, Tampa, Florida
  • Jingwei Meng,

    1. Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana
  • Fei Huang,

    1. Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana
  • Pedro Romero,

    1. Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana
  • Vladimir N. Uversky,

    1. Department of Molecular Medicine, University of South Florida, Tampa, Florida
    2. Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia
  • A. Keith Dunker

    Corresponding author
    1. Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana
    • 410 W. 10th Street, HS5000, Indianapolis, IN 46202

Abstract

Molecular recognition features (MoRFs) are intrinsically disordered protein regions that bind to partners via disorder-to-order transitions. In one-to-many binding, a single MoRF binds to two or more different partners individually. MoRF-based one-to-many protein–protein interaction (PPI) examples were collected from the Protein Data Bank, yielding 23 MoRFs bound to 2–9 partners, with all pairs of same-MoRF partners having less than 25% sequence identity. Of these, 8 MoRFs were bound to 2–9 partners having completely different folds, whereas 15 MoRFs were bound to 2–5 partners having the same folds but with low sequence identities. For both types of partner variation, backbone and side chain torsion angle rotations were used to bring about the conformational changes needed to enable close fits between a single MoRF and distinct partners. Alternative splicing events (ASEs) and posttranslational modifications (PTMs) were also found to contribute to distinct partner binding. Because ASEs and PTMs both commonly occur in disordered regions, and because both ASEs and PTMs are often tissue-specific, these data suggest that MoRFs, ASEs, and PTMs may collaborate to alter PPI networks in different cell types. These data enlarge the set of carefully studied MoRFs that use inherent flexibility and that also use ASE-based and/or PTM-based surface modifications to enable the same disordered segment to selectively associate with two or more partners. The small number of residues involved in MoRFs and in their modifications by ASEs or PTMs may simplify the evolvability of signaling network diversity.

Introduction

Protein–protein interaction (PPI) networks underlie a wide variety of biological functions, ranging from regulating cell division to responding to external signals. High-throughput methods have enabled researchers to map out sets of PPIs over entire proteomes. These studies reveal complex networks in which a few proteins, called hubs, bind to many protein partners, whereas many other proteins bind to only a few or even just one partner. Indeed, in some cases, hubs bind to 15, 20, 50, or even more partner proteins. As expected for such network architecture, deletion of a protein with only a few partners is typically less deleterious than the deletion of a highly connected protein.1, 2

How do such networks arise from simpler precursors? Other networks of a similar architecture arise because “the rich get richer”; units with more connections have a higher probability of adding even more connections over time as compared with the units with fewer connections. This suggests that highly connected proteins have special features that facilitate their binding to multiple partners and to new partners that arise through mutation.3 What are these special features?

Theoretical arguments4, 5 and experimental data6, 7 suggest that unfolded or disordered protein can very readily change shape and thereby easily adapt to multiple, distinct partners. Thus, we proposed that the special feature of hub proteins enabling their binding to multiple partners is likely to be intrinsic disorder. We further suggested two ways that disorder could be used by hub proteins for binding to multiple partners: (1) One region of disorder could bind to many different partners (one-to-many binding), so the hub protein itself uses disorder for multiple partner binding; and (2) many different regions of disorder could bind to a single partner (many-to-one binding), so the hub protein is structured but binds to many disordered partners via interaction with disorder.8 Since this initial proposal, we9–11 and many others12–22 have provided additional evidence that hubs and/or their binding partners are especially enriched in intrinsic disorder, with both the many-to-one and one-to-many processes involving the use of intrinsic disorder.

Our initial work8–11 on disorder and PPIs focused on single binding sites that used regions of disorder. To be more complete, it is worth mentioning that, in addition to the one-to-many and many-to-one mechanisms used by single sites of disorder for multiple partner binding, hub proteins can also use multiple binding domain repeats likely connected by flexible (disordered) linkers,12 or hubs can use multiple binding sites one after another in long regions of disorder as we recently discussed.23 Of course, these additional, multisite mechanisms can be multiplexed via one-to-many and many-to-one mechanisms, thus leading to extremely complicated PPI networks.

Independent of the roles of intrinsically disordered proteins (IDPs) in hub protein interactions, their lack of specific structure provides the basis for important biological functions,24, 25 such as signal transduction, cell regulation, molecular recognition, and many other functions.26–33 Many of these disorder-utilizing biological functions depend ultimately on disorder-based PPIs. Thus, understanding the structural basis of PPIs involving IDPs is important for a wide variety of biological functions, not just as the mechanistic basis for hub protein function.

With regard to IDP regions involved in binding, various descriptors have been used, such as eukaryotic linear motifs (ELMs),34, 35 linear motifs (LMs),36 short linear motifs (SLiMs),37, 38 regions of increased structural propensity (RISPs),39 and molecular recognition features (MoRFs).40 All of these describe similar phenomena, despite the differing approaches used by the various researchers to identify binding segments. The identification of ELMs, LMs, or SLiMs starts from sequence pattern or motif-based approaches, whereas the identification of RISPs and MoRFs starts from short regions with binding indicators located within longer regions of predicted disorder.

Predicting PPI sites in proteins can be used to supplement experimental approaches.41, 42 Predicting binding sites by sequence matches to the motifs of ELMs,34, 35 LMs,36 SLiMs,37, 38 or other collections of sequence patterns43–45 provides one strategy for identifying potential binding sites located within IDPs or IDP regions. Using sequence characteristics that indicate short binding regions within longer regions of disorder offers a second strategy that does not depend on specific motifs, and several predictors have been developed that use this second strategy.46–50 Such predictors have been used by experimentalists to help with the identification of binding regions within longer regions of disorder.39, 51

Both a hub protein's ability to bind multiple partners and the general importance of PPIs suggest that the use of flexibility for partner binding by IDPs and IDP regions is of considerable interest. However, despite the importance of understanding how one disordered region can bind to more than one partner, there have been very few structural comparisons at the atomic resolution level, either for one-to-many binding examples or for many-to-one binding examples. For the latter, we know of only two atomic resolution comparisons of more than one IDP binding to a single partner: namely, two different peptides binding to the TAZ1 domain,30 and five different peptides binding to 14-3-3ζ.48 With regard to the former, we likewise know of just three published examples: namely, a short segment from HIF1α bound to two partners, the TAZ1 domain and the asparagine hydroxylase FIH protein,30 a short segment from the C-terminus of p53 bound to four partners, S100ββ, sirtuin, CREB binding protein, and cyclin A2,48 and a larger collection of various short segments bound to multiple partners.52

We have carried out data mining on the Protein Data Bank (PDB) to find additional examples of both one-to-many and many-to-one complexes at atomic resolution. While both datasets are assembled, our focus herein is on the collected examples of one-to-many interactions. Our work on the many-to-one examples is in progress and will be published at a later date.

We have found nearly 300 sets that contain segments having the same sequence bound to two or more partners, but here we focus on cases in which unambiguously the same protein is bound to highly divergent partners (i.e., every pair of partners has less than 25% sequence identity), which reduces the number to 23 sets of segments that bind to 2–9 partners. The goal is to provide detailed analyses of the conformational changes enabling the same disordered segment to bind to more than one protein partner. Overall, these data support the view that the flexibility of disordered regions is a significant factor in the ability of IDPs to bind to two or more partners. As we assembled this dataset, we found that ASEs and PTMs were also involved in enabling one disordered region to bind to more than one protein partner. These latter findings suggest that an interplay of multiple factors has participated in the evolution of complex PPI networks and might be important in the development of tissue-specific signaling networks.

Results

Summary of our MoRF dataset involving one-to-many binding

We identified 4289 MoRFs from PDB based on their sequence length (5–25 residues). Of these, 452 complexes with small surface areas of interaction were eliminated due to uncertainty regarding the biological significance of the interactions. An additional 689 complexes were excluded because their partners were nonglobular.
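The successive filters described above can be sketched as a small funnel. This is only an illustration under stated assumptions: the `Complex` record and its field names are hypothetical, the thresholds are the ones quoted in the text and in the Table I footnotes, and partner length alone stands in for the globularity check, which in the study also excluded specific partner classes by inspection.

```python
from dataclasses import dataclass

# Thresholds quoted in the text and in the Table I footnotes.
MIN_LEN, MAX_LEN = 5, 25      # MoRF length, in residues
MIN_INTERFACE_A2 = 400.0      # interface area cutoff, in square angstroms
MIN_PARTNER_LEN = 70          # partner must be longer than this, in residues


@dataclass(frozen=True)
class Complex:
    """Hypothetical record for one MoRF/partner complex (not the authors' schema)."""
    morf_seq: str
    interface_area: float
    partner_len: int


def length_ok(c: Complex) -> bool:
    return MIN_LEN <= len(c.morf_seq) <= MAX_LEN


def area_ok(c: Complex) -> bool:
    return c.interface_area > MIN_INTERFACE_A2


def partner_ok(c: Complex) -> bool:
    # Assumption: length is used here as a simple proxy for "globular partner".
    return c.partner_len > MIN_PARTNER_LEN


def passes_filters(c: Complex) -> bool:
    return length_ok(c) and area_ok(c) and partner_ok(c)


def funnel_counts(complexes):
    """Number of complexes surviving each successive filter."""
    by_len = [c for c in complexes if length_ok(c)]
    by_area = [c for c in by_len if area_ok(c)]
    by_partner = [c for c in by_area if partner_ok(c)]
    return len(by_len), len(by_area), len(by_partner)
```

Applied to the real data, the three counts would correspond to the first three rows of Table I.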

To identify overlapping MoRFs, MoRF sequences were mapped back to their parent sequences. A short segment will give exact matches to many unrelated sequences. Because many of the MoRFs are short, only 1805 of the remaining 3148 MoRFs could be unambiguously mapped in an automated fashion to their parent sequences in the Universal Protein Resource (UniProt) database. Based on the overlapping regions in parent sequence mapping (at least one residue), 298 MoRF sets with multiple partnerships were obtained. Structurally redundant partners were discarded from our final dataset based on imposing an upper bound of 25% pairwise sequence identity for every pair of partners.
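One way to group mapped MoRFs that share at least one residue on the same parent sequence is a sort-and-sweep over intervals. The sketch below is an assumption about how such clustering could be done, not the authors' code: the tuple layout `(parent_id, start, end)` is hypothetical, coordinates are taken as 1-based and inclusive, and chained overlaps are merged into one cluster.

```python
from collections import defaultdict


def cluster_overlapping(morfs):
    """Group MoRFs that overlap by at least one residue on the same parent.

    morfs: iterable of (parent_id, start, end), 1-based inclusive coordinates.
    Returns a list of clusters, each a list of the input tuples.
    """
    by_parent = defaultdict(list)
    for m in morfs:
        by_parent[m[0]].append(m)

    clusters = []
    for parent in sorted(by_parent):
        current, current_end = [], None
        for m in sorted(by_parent[parent], key=lambda t: (t[1], t[2])):
            if current and m[1] <= current_end:
                # Shares at least one residue with the running cluster.
                current.append(m)
                current_end = max(current_end, m[2])
            else:
                if current:
                    clusters.append(current)
                current, current_end = [m], m[2]
        clusters.append(current)
    return clusters


def multi_partner_sets(morfs):
    """Keep only clusters with two or more members."""
    return [c for c in cluster_overlapping(morfs) if len(c) >= 2]
```

The subsequent 25% pairwise identity filter on the partners would then be applied within each retained cluster.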

Finally, 23 MoRF clusters with 61 partners were confirmed by manual inspection to further ensure that short peptides were bound to globular partners. Thus, for the dataset investigated herein, each MoRF associates with 2–3 distinct partners on average. A summary of the development of the dataset is given in Table I. The 23 MoRF examples are listed in Table II. The two previously described partnerships involving HIF1α were not found in this study because the length of the peptide, 51 amino acids, exceeded the upper bound of 25 residues used in this study. On the other hand, the four previously described partnerships involving the carboxy terminal tail of p53 were all found in our dataset,53 showing that our overall strategy recovers a previously known example whose length lies between the upper and lower thresholds used herein.

Table I. Description of MoRF Dataset

Data set                                                MoRFs  Clusters  MoRFs per cluster
Initial MoRF dataset (5–25) [a]                         4289
MoRF dataset with biological interaction (>400 Å²) [b]  3837
MoRF dataset with globular partner (>70) [c]            3148
MoRFs mapped to UniProt sequence database [d]           1805
MoRFs with overlapped region in mapping [e]             1493   298       5.01
MoRFs without 100% sequence identity in partners        248    87        2.85
MoRFs without 25% sequence identity in partners         214    77        2.78
MoRFs without atypical cases [f]                        61     23        2.65

  • a. MoRFs with 5–25 residues are the focus of this study.
  • b. A 400 Å² cutoff was set to filter out spurious interactions caused by crystal contacts.
  • c. Binding partners of MoRFs are required to be globular proteins having more than 70 residues, so that they can fold into a defined conformation. The excluded cases include interactions with short domains such as SH3 and chromo domains, the A/B chains of insulin, gramicidin-form ion channels, peptides forming amyloid-like fibrils, alpha-helical coiled coils, and de novo proteins.
  • d. Most MoRFs that cannot be mapped to UniProt are 5–9 residues in length.
  • e. MoRFs having one or more overlapping residues with each other.
  • f. Atypical cases include, for example, one MoRF bound to more than one partner in the same PDB entry and partners with subsequences that exactly match the entire sequence of another partner.

Table II. Twenty-Three Examples of MoRFs and Their Secondary Structures

                                                  Bound conformation           Partners        MoRFs
MoRF example                                  N   Helix  Sheet  Coil  Complex  RMSD  Coverage  PTM  AS
8 MoRFs with differently folded partners      26                                               11   1
1. Histone H3—N-terminal I                    9   0      0      9     0        7.07  0.21      5
2. p53—near C-terminal                        4   1      0      2     1        6.80  0.39      1    N [a]
3. CTD of RNA polymerase II                   3   0      0      3     0        8.35  0.26      3
4. Angiotensin                                2   0      0      2     0        7.74  0.27      0
5. HIV envelope glycoprotein                  2   0      0      2     0        4.16  0.41      0
6. Histone H3—N-terminal II                   2   0      0      2     0        8.25  0.22      2
7. Vasopressin                                2   0      0      2     0        8.69  0.37      0
8. p53—near N-terminal                        2   2      0      0     0        6.18  0.62 [b]  0    Y [a]
15 MoRFs with similarly folded partners       35                                               2    4
9. Nuclear receptor coactivator 1 and 2       5   2      0      2     1        3.94  0.92      0
10. Nuclear receptor corepressor 2            3   2      0      1     0        3.43  0.85      0    TS [a]
11. Thyroid receptor associated protein 220   3   3      0      0     0        3.05  0.91      0    Y [c]
12. Nuclear receptor coactivator 1            2   2      0      0     0        5.49  0.85      0
13. BAK peptide                               2   2      0      0     0        5.50  0.73      0    N [a]
14. Nuclear receptor 0B2—near N-terminal      2   2      0      0     0        3.74  0.86      0    N [a]
15. Troponin I, cardiac muscles               2   0      0      1     1        3.01  0.79      0
16. Nuclear receptor 0B2—near C-terminal      2   1      0      1     0        3.88  0.80      0    N [a]
17. Cell death protein GRIM                   2   0      2      0     0        2.33  0.79      0
18. Beclin-1                                  2   2      0      0     0        4.10  0.84      0
19. Histone H4                                2   0      0      2     0        3.93  0.50 [d]  0
20. Bcl-2-like protein 11 (Bim)               2   2      0      0     0        2.72  0.90      0    Y [a]
21. Amyloid beta A4 protein                   2   0      0      2     0        2.93  0.84      0    Y [a]
22. Rhodopsin                                 2   2      0      0     0        4.25  0.86      0
23. DNA repair protein RAD9                   2   0      0      2     0        3.53  0.36 [d]  2

  • N, numbers of MoRFs in the set; PTM, post-translation modification; AS, alternative splicing; TS, tissue-specific alternative splicing; —, MoRFs from other species (not from human or mouse).
  • a. MoRFs from human.
  • b. Although most residues within the two partners can be roughly aligned together, their individual structure varies a lot.
  • c. MoRFs from mouse.
  • d. Within these two sets, the coverage of alignments is low because one partner is a sub-domain of the other partner but with low sequence identity.

Most sets contain one MoRF interacting individually with two partners, but six of the sets have more than two partners. These are the N-terminus of histone H3, nuclear receptor coactivator 1 and 2, the C-terminus of p53, the NR corepressor 2, the thyroid receptor associated protein 220, and the carboxyl-terminal domain (CTD) of RNA polymerase II (RNAP II). Because MoRFs in the NR coactivator 1 and 2 share similar sequences and can be mapped to the same parent sequence, our method clustered them together as a single set. Most clusters have MoRFs with similar secondary structures in different complexes. Only five of them exhibit a mixture of different secondary structures (Table III).

Table III. The Combination of Secondary Structure Types in the 23 MoRFs

Secondary structure  Clusters  Similarly folded partners  Differently folded partners
α + β + ι + Complex  0         0                          0
α + β + ι            0         0                          0
α + β + Complex      0         0                          0
α + ι + Complex      2         1                          1
β + ι + Complex      0         0                          0
α + β                0         0                          0
α + ι                2         2                          0
α + Complex          0         0                          0
β + ι                0         0                          0
β + Complex          0         0                          0
ι + Complex          1         1                          0
α                    8         7                          1
β                    1         1                          0
ι                    9         3                          6
Complex              0         0                          0

The goal here was to find the same MoRF sequence bound to structurally distinct partners, so partners having low sequence identity were selected. A sequence identity of 25% was chosen as the upper bound because proteins with sequence identities higher than this value are almost always similar in structure.54 Nevertheless, even though the partners of each MoRF set were selected to have low sequence identity, several partner conformations turned out to exhibit structural similarity. Based on the structure alignment of their partners, the 23 MoRF sets can roughly be grouped into 15 MoRFs with similarly folded partners (with ∼19% sequence identity on average) and 8 MoRFs with differently folded partners (with ∼10% sequence identity on average). Notice that MoRFs with differently folded partners apparently prefer to form irregular secondary structure upon binding, whereas MoRFs with similarly folded but sequence diverse partners tend to prefer to form helix or sheet (Table III).

Two predictors, ANCHOR49 and MoRFpred,55 have been developed to predict partner binding sites within longer regions of disorder. Application of these predictors to the MoRF-containing sequences herein shows that, while both predictors typically indicate binding sites corresponding to the observed MoRFs, neither predictor is particularly accurate with respect to the locations of the binding sites (data not shown). Interestingly, the locations of the MoRFs with similarly folded partners are predicted with slightly greater accuracy by both predictors as compared with the locations of MoRFs that bind to differently folded partners.

15 MoRF sets with partner pairs exhibiting similar folds

Among the 15 MoRFs with partners having similar folds, similar binding profiles and common interacting residues were observed. Partner pairs within 11 of these MoRFs have both a relatively low RMSD and a relatively good structural alignment. Within these 11 sets, the mean sequence identities for structurally aligned binding and nonbinding residues are 42 ± 6% and 20 ± 3%, respectively. Binding residues, which are usually on the surface, thus have about twofold higher sequence identity than nonbinding surface residues, indicating that these interactions are likely to be biologically significant. For the same MoRF bound to structurally similar partners, only slight conformational changes of MoRF side chains were observed, whereas the backbone conformations of the same MoRF in the various complexes are relatively uniform.
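The comparison of sequence identity at binding versus nonbinding positions can be illustrated with a short sketch. It assumes two partner sequences already placed in a structure-based alignment of equal length, with `-` marking gaps, and a set of alignment columns annotated as binding; the function name and the toy sequences are hypothetical and are not data from this study.

```python
def percent_identity(aligned_a, aligned_b, positions):
    """Percent identity over the given alignment columns, skipping gapped columns."""
    pairs = [
        (aligned_a[i], aligned_b[i])
        for i in positions
        if aligned_a[i] != "-" and aligned_b[i] != "-"
    ]
    if not pairs:
        return float("nan")
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Toy alignment (hypothetical sequences, for illustration only).
partner_a = "ACDEFGHIKL"
partner_b = "ACDQFGYIRL"
binding_cols = [0, 1, 2, 3]
nonbinding_cols = [i for i in range(len(partner_a)) if i not in binding_cols]

binding_identity = percent_identity(partner_a, partner_b, binding_cols)
nonbinding_identity = percent_identity(partner_a, partner_b, nonbinding_cols)
```

In a real analysis the nonbinding set would be restricted to surface residues before the two values are compared.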

Figures 1 and 2 show two examples for which the flexibility needed to accommodate different partner surface features is manifested as side chain rotations. Lysine in nuclear receptor corepressor 2 has different conformations to stretch into the opposite cleft in three complexes to form the associations between the three receptors (Fig. 1). Histidine and arginine in nuclear receptor coactivator 1 (NCOA1) and 2 (NCOA2) also act in a similar way in Figure 2. Here, the two different proteins NCOA 1 and 2 are grouped into one cluster in our dataset because both of them have similar conserved binding sequences containing LxxLL motifs (“HKILHRLLQD” and “HKILHRLLQE”) like other NR-boxes.56 The side chain conformations of the three leucine residues stay nearly the same except for the ones that interact with the androgen receptor.
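The LxxLL motif mentioned above is easy to locate with a regular expression. The sketch below uses only the two NR-box sequences quoted in the text; the helper name `find_lxxll` is hypothetical, and a lookahead is used so that overlapping matches would also be reported.

```python
import re

# L, any two residues, then LL. The lookahead allows overlapping matches.
LXXLL = re.compile(r"(?=(L..LL))")


def find_lxxll(seq):
    """Return (zero-based start, matched motif) for every LxxLL occurrence."""
    return [(m.start(), m.group(1)) for m in LXXLL.finditer(seq)]


for seq in ("HKILHRLLQD", "HKILHRLLQE"):
    print(seq, find_lxxll(seq))
```

Both sequences contain the same motif instance, LHRLL, and differ only at the final residue.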

Figure 1.

MoRFs in nuclear receptor corepressor 2 bind to three different but structurally similar nuclear receptors. They are (A) estrogen-related receptor gamma (with α-MoRF in 2GPV), (B) progesterone receptor (with α-MoRF in 2OVH), and (C) peroxisome proliferator activated receptor (with ι-MoRF in 1KKQ). (D) In the superimposition of the three complexes, the uncharged residues (in bold) in the core MoRF region maintain a relatively stable conformation. An interactive view is available in the electronic version of the article.

Figure 2.

The diagram shows a variety of interactions of MoRFs with highly similar sequences in nuclear receptor coactivator 1 and nuclear receptor coactivator 2. (A) ι-MoRF in nuclear receptor coactivator 2 interacts with androgen receptor (1T65). (B) α-MoRF in glucocorticoid receptor-interacting protein 1 (alternative name of NCOA2) interacts with estrogen receptor (1L2I). (C) Complex-MoRF in nuclear receptor coactivator 1 isoform 1 interacts with orphan nuclear receptor NR1I3 (1XV9). (D) α-MoRF in nuclear receptor coactivator 1 interacts with bile acid receptor (2O9I). (E) ι-MoRF in nuclear receptor coactivator 1 isoform 3 interacts with orphan nuclear receptor pregnane X receptor (3BEJ). (F) The three leucine residues (in bold) of the LxxLL motif superimpose well in the five complexes. An interactive view is available in the electronic version of the article. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

This example demonstrates that the same protein can be involved in both one-to-many and also many-to-one binding, thus raising the level of network complexity and leading to multiprotein regulatory complexes that can respond to environmental signals. Comparing our one-to-many dataset described herein with our many-to-one dataset (manuscript in preparation) reveals that, of the 23 examples in Table II, there are 12 cases of proteins involved in both one-to-many and many-to-one binding. That is, 12 of the MoRFs in Table II bind to a structured partner that also binds to additional MoRFs having different sequences. Because our identification of one-to-many and many-to-one examples did not involve any steps for identifying MoRFs involved in both mechanisms, we find this number of 12 of 23 involved in both mechanisms to be quite high and to suggest that such dual use of both mechanisms is likely to be a very common feature of PPI networks.

In summary, NCOA binding partners include many kinds of nuclear receptors, such as the androgen receptor, estrogen receptor, nuclear receptor subfamily 1, group I member 3 (NR1I3/CAR), bile acid receptor, and pregnane X receptor.57–61 Further detailed investigation of MoRFs with similarly folded partners is discussed in our previous work.52

Conformational changes of MoRFs with differently folded partners in various interaction complexes

Eight MoRFs in our dataset converted into significantly different conformations to fit onto the surfaces of structurally different molecular partners. For these examples, only a small portion of their partners' residues can be structurally aligned. We selected the three examples with the largest number of partnerships (p53, RNAP II, and histone H3) to illustrate the variable buried surface area of each MoRF residue upon diverse binding (Fig. 3).

Figure 3.

The profiles of solvent surface area changes within three selected MoRF clusters with structurally different partners: (A) p53, (B) RNAP II, and (C) H3. The Y axis gives the change in surface area of each entire residue upon binding, whereas the X axis gives the residues. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
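A per-residue burial profile of the kind plotted in Figure 3 can be sketched as the difference between the accessible surface area (ASA) of each MoRF residue in the free peptide and in the complex. The sign convention (free minus bound, so positive means buried), the function names, and the numbers below are assumptions for illustration; they are not values from this study.

```python
def buried_area(asa_free, asa_bound):
    """Per-residue ASA lost on binding (free minus bound), in square angstroms."""
    return {res: asa_free[res] - asa_bound[res] for res in asa_free}


def burial_spread(profiles):
    """For each residue, the largest minus the smallest burial across complexes."""
    residues = set().union(*profiles.values())
    spread = {}
    for res in residues:
        values = [profile.get(res, 0.0) for profile in profiles.values()]
        spread[res] = max(values) - min(values)
    return spread


# Hypothetical ASA values for a two-residue segment bound to two partners.
free = {"res1": 180.0, "res2": 150.0}
bound_to_a = {"res1": 40.0, "res2": 150.0}
bound_to_b = {"res1": 170.0, "res2": 30.0}

profiles = {
    "partner_a": buried_area(free, bound_to_a),
    "partner_b": buried_area(free, bound_to_b),
}
spread = burial_spread(profiles)
```

A large spread for a residue indicates that it is buried to very different extents in different partnerships.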

Charged residues (R, H, and K), aromatic residues (F and Y), and phosphorylation-related residues (S, T, Y, H, R, and K) in MoRF regions vary substantially in their contributions to binding different partners. In contrast, proline contributions to the different interfaces involving RNAP II remain relatively stable. Unlike MoRFs with similarly folded partners, which generally use their various residues in quite similar ways to associate with relatively conserved interacting residues, each partnership within this set uses conformationally distinct MoRFs and different residues or the same residues with different degrees of burial in their associations with their very distinct partners. That is, the same MoRFs show large variability in their side chain burial and exposure and even shifts in the binding region when binding to structurally divergent partners.

In addition to differential side chain burial and rotations, PTMs are also observed to be associated with the conformational alterations that are observed when the same MoRF binds to different partners, especially for those MoRFs that bind to structurally distinct partners. That is, of the 26 complexes involving differently folded partners, 11 have posttranslationally modified residues. On the other hand, for the MoRFs with similarly folded partners, just 2 of the 35 complexes contain PTMs.

The C-terminus of p53 illustrates the conformational changes of a single MoRF within different partnerships. It was observed to transform either into a complex MoRF, an ι-MoRF (irregular MoRF), or an α-MoRF (helix), in four different structures in our dataset (Fig. 4). The complex MoRF is composed of three residues of β-strand and three residues of coil and was classified as a β-MoRF in our previous work.53 This change from the previous work arose because here we use automated secondary structure assignment (DSSP), whereas the previous work used the crystallographer's assignment of secondary structure.
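To make the four MoRF classes concrete, the sketch below assigns a class from a per-residue DSSP string. The rule used here is an assumption and may differ from the rule applied in this study: DSSP codes H, G, and I are counted as helix, E and B as strand, everything else as coil, and a segment is called complex when no single type covers more than half of its residues. Under that assumed rule, a segment with three strand and three coil residues is classified as complex.

```python
from collections import Counter

HELIX_CODES = set("HGI")    # alpha, 3-10, and pi helix
STRAND_CODES = set("EB")    # extended strand and isolated bridge


def classify_morf(dssp):
    """Classify a bound segment as 'alpha', 'beta', 'iota', or 'complex'.

    Assumed rule (not taken from the paper): the majority type wins,
    and 'complex' is returned when no type exceeds half of the residues.
    """
    if not dssp:
        raise ValueError("empty DSSP string")
    reduced = [
        "H" if c in HELIX_CODES else "E" if c in STRAND_CODES else "C"
        for c in dssp
    ]
    top, count = Counter(reduced).most_common(1)[0]
    if 2 * count <= len(reduced):
        return "complex"
    return {"H": "alpha", "E": "beta", "C": "iota"}[top]
```

Because the outcome depends on the source of the secondary structure assignment, the same segment can change class when a different assignment is used.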

Figure 4.

Four different biological molecules interact with C-terminus of p53. (A) Sirtuin: an NAD-dependent deacetylase (with complex-MoRF), 2H59, (B) cyclin A2 (with ι-MoRF), 1H26, (C) CREB binding protein (with ι-MoRF), 1JSP, and (D) S100 calcium-binding protein (α-MoRF), 1DT7. In the sequence alignments, a residue having a posttranslational modification in PDB is indicated in red. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Examination of the MoRFs in RNAP II and Histone H3

Although the other two MoRFs with distinctly folded partners, RNAP II and H3, have coiled structures in all of their three and nine complexes, respectively, the backbone conformations differ markedly between any two structures.

Phosphorylation of specific serines (red letters in Fig. 5 in online version) in the carboxyl-terminal domain (CTD) of RNAP II not only affects partner binding to the MoRF but also provides important regulation of transcriptional activity. The CTD in RNAP II is composed of up to 52 heptapeptide repeats (YSPTSPS), which are important for polymerase activity.62 Efficient capping, splicing, and polyadenylation of mRNAs all require the CTD portion of RNAP II. For example, the CTD small phosphatase 1 (CTDSP1) catalyzes the dephosphorylation of Ser5 within the tandem seven-residue repeats, causing the initiation of RNAP II transcription [Fig. 5(A)].63 The Ser2-phosphorylated CTD binds to a CTD-interacting domain (CID) in protein 1 of cleavage and polyadenylation factor I (PCF11), which is essential for transcription elongation and 3′ RNA processing [Fig. 5(B)].64 The mRNA capping enzyme (mRNA CE) is recruited to the transcription complex, catalyzing its reaction through binding to the phosphorylated Ser5 in the CTD of RNAP II [Fig. 5(C)].65 The capping modification aids the recognition and attachment of mRNA to the ribosome and protects the mRNA from exonucleases.
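The Ser2 and Ser5 labels refer to positions within the YSPTSPS heptad. The following sketch locates those serines in an idealized CTD built from perfect consensus repeats, which is an assumption made for illustration, since real CTD repeats can deviate from the consensus. The function names are hypothetical and indices are zero-based.

```python
HEPTAD = "YSPTSPS"


def heptad_position(index0):
    """Position within the heptad (1..7) for a zero-based index in the idealized CTD."""
    return index0 % len(HEPTAD) + 1


def serine_sites(n_repeats):
    """Zero-based indices of Ser2, Ser5, and Ser7 in n perfect heptad repeats."""
    ctd = HEPTAD * n_repeats
    sites = {"Ser2": [], "Ser5": [], "Ser7": []}
    for i, residue in enumerate(ctd):
        if residue == "S":
            sites[f"Ser{heptad_position(i)}"].append(i)
    return sites


two_repeats = serine_sites(2)
```

With many tandem repeats, each repeat offers its own Ser2 and Ser5, so different repeats can in principle carry different phosphorylation states at the same time.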

Figure 5.

The MoRF mechanism plays a role in mediating interactions involving the CTD of RNA polymerase II. (A) CTD small phosphatase 1 (with ι-MoRF), 2GHQ, (B) protein 1 of cleavage and polyadenylation factor I (with ι-MoRF), 1SZA, (C) mRNA capping enzyme alpha subunit (with ι-MoRF), 1P16, and (D) similar bends near Pro 1700 occur in all three bound MoRFs. In the sequence alignments, residues in red indicate residues with PTMs in PDB. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

The three bound MoRFs in RNAP II all seem to exhibit a bend at a similar location. To gain greater insight, these three MoRFs were structurally aligned [Fig. 5(D)]. Two of the MoRFs (bound to PCF11 and mRNA CE-α) show very similar backbone traces with bends at Pro 1700. The third MoRF (bound to CTDSP1) also shows a bend near Pro 1700, but the backbone trace and location of the bend relative to Pro 1700 are different from the other two examples [Fig. 5(D)].

Because the sequences examined here typically contain just one MoRF binding site for multiple partners, partner competition for that single site could be an important regulatory feature of these binding interactions. In contrast, for the CTD of RNAP II, the MoRF sequence is repeated more than 50 times. These MoRFs may adopt different structures as they bind to alternative partners. The interplay between partner competition and repeated binding sites may provide a mechanism for subtle and tunable regulation of MoRF/partner interactions.

The MoRF in Histone H3, which contains the maximal number of partners in our dataset, interacts with nine structurally different partners using residues from 2 to 22 in the sequence (Fig. 6). Even though all nine MoRFs are classified as coiled structures, some residues within the MoRF region form helical or strand-like structures upon binding to the different partner proteins. Among the nine binding partners of the N-terminal tail of histone H3, there are several enzymes that are implicated in PTMs. This N-terminal tail that protrudes from the globular nucleosome core can undergo several different types of epigenetic modifications that influence cellular processes. These modifications include the covalent attachment of methyl or acetyl groups to lysine and arginine amino acids and the phosphorylation of serine or threonine. Some of these modifications are included in our data set and characterized in Figure 6 (with the modified residues marked in red).

Figure 6.

Nine different binding partners of ι-MoRFs in the N-terminus of histone H3. Its partners include (A) Jumonji domain-containing protein 2A, 2GFA, (B) DNA-methyltransferase 3-like, 2PVC, (C) WD-repeat protein 5, 2H6K, (D) VDJ recombination-activating protein 2, 2V83, (E) lysine-specific demethylase 1, 2V1D, (F) histone acetyltransferase (HAT A1), 1PU9, (G) 14-3-3 protein zeta/delta, 2C1J, (H) JmjC domain-containing histone demethylation protein 3A, 2Q8C, and (I) histone H3 methyltransferase DIM-5, 1PEG. (J) Schematic diagram of histone H3 protein shows its predicted and validated disordered tails and a central folded domain. Structural data and various disordered binding site predictors reveal that the potential binding regions of H3 are highly associated with posttranslationally modified sites. The residues in red in (A–I) are PTM sites in PDB, and the methionine in gray is a residue that was mutated for the structural study. The annotated PTM sites on the entire H3 in (J) are from UniProt. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

The double Tudor domain of JMJD2A, a JmjC domain-containing histone demethylase, binds methylated Lys 5 on histone H3. This complex functions as a transcription repressor [Fig. 6(A)].66 The DNA-methyltransferase 3-like (DNMT3L) protein recognizes histone H3 tails with unmethylated Lys 5 and stimulates de novo DNA methylation by engaging the DNMT3A2 molecule [Fig. 6(B)].67 The WD-repeat protein 5 (WDR5) is a core component of SET1-family complexes that achieve transcriptional activation via methylation of histone H3 on Lys 5 [Fig. 6(C)].68 The recombination activating gene 2 protein contains a plant homeodomain that recognizes histone H3 methylated at Lys 5 and influences V(D)J recombination [Fig. 6(D)].69 Histone demethylase LSD1 regulates transcription by demethylating Lys 5 of histone H3 [Fig. 6(E)].70 A substrate-like peptide was generated by a K5M mutation (marked in gray in Fig. 6) because this mutation led to a 30-fold increase in binding affinity, thereby helping to stabilize the complex. Phosphorylation at Ser 11 of histone H3 enhances GCN5 histone acetyltransferase mediated Lys 15 acetylation, promoting transcription [Fig. 6(F)].71 The 14-3-3 isoforms represent a class of proteins that mediate the effect of Ser 11 phosphorylated histone H3 [Fig. 6(G)].72 The jumonji domain of JHDM3A (JMJD2A) catalyzes the demethylation of di- and tri-methylated Lys 10 and Lys 37 in histone H3 [Fig. 6(H)].73 DIM-5 is a histone H3 Lys 9 methyltransferase that is essential for DNA methylation [Fig. 6(I)].74

Figure 6(J) summarizes the results of disorder/order predictions, potential interacting regions, and PTM sites annotated in UniProt for human histone H3. In general, H3 has a central structured region (residues 58–132) that matches a Pfam family (histone: core histone H2A/H2B/H3/H4) and a long N-terminal disordered tail (around 38–48 residues in length). A similar disorder/order estimate was given by PONDR VSL2B. Among the 294 PDB entries related to human histone H3 (as of 27-Mar-12), 40 complexes were found to include H3 fragments (MoRFs) between residues 2 and 34. This N-terminal binding region was recognized by neither the MoRF1 nor the MoRF2 predictor,47, 75 probably because these two predictors were built specifically for helix MoRFs, not coil MoRFs like the ones in H3. Figure 6(A–I) shows that the nine MoRFs found in this region are all coil MoRFs. Part of the binding region can be predicted by ANCHOR,49 whereas the entire region can be found by the MoRFpred method.50 Based on the sequence annotations in the UniProt database, most PTM sites of H3 are located in its N-terminus, implying that functional regulation sites may be closely tied to MoRFs within disordered regions.

Discussion

Our 23 MoRF examples of one-to-many binding comprise a special set, containing partners with little sequence similarity that bind to MoRFs with identical sequences. This approach is distinct from the concept of structural compensation or coadaptation, in which mutations on one partner are linked to compensating mutations on the other partner.76 It would certainly be possible to lift the requirement of MoRF sequence identity and thereby study coadaptation in complexes involving disordered proteins. Indeed, we have work in progress along these lines for a few specific examples to determine whether coadaptation between two structured proteins is different from coadaptation between structured proteins and MoRFs.

There have been several previous bioinformatics investigations of large numbers of IDP-involving PPIs at a high level, without attention to the structural details.47, 75, 77, 78 Our approach here is instead to investigate fewer MoRF examples in greater detail, in order to develop a deeper understanding of how IDPs can alter their conformations so as to bind to structurally distinct partners. Our observations demonstrate that, in general, conformational flexibility allows for both subtle and complex structural variation, thereby enabling the same sequence to fit the diverse and distinctively shaped binding sites provided by its partners.

The MoRFs collected and grouped into one cluster herein are typically gathered from different organisms. As suggested by others, through parallel or convergent evolution, such MoRFs can exist as conserved functional motifs or regions among various species, such as human, mouse, yeast, E. coli, or even viruses.79

As pointed out previously,77 such short linear motifs are amenable to convergent evolution because only a limited number of mutations are necessary to generate a useful motif. In fact, motifs are commonly used to add new functional modules within a proteome, especially in higher eukaryotes.80 These short functional linear motifs are hypothesized to have higher levels of conservation, to frequently evolve convergently, to preferentially occur in disordered regions, and to often form a specific secondary structure when bound to interaction partners.79 This observation fits with the idea that alternative inclusion of exons in different tissues provides functional diversity of proteins. In fact, tissue-dependent protein segments are rich in both embedded conserved binding motifs and PTM sites.81 The tissue-dependent spliced regions have a higher percentage of protein disorder, likely form conserved interaction surfaces, and participate in significantly more protein interactions.82

Among the 23 MoRFs in our dataset, three MoRFs (TRAP220, Bim, and amyloid A4 protein) were annotated in UniProt as being located in alternatively spliced regions. Alternative splicing has the potential to add or delete an entire MoRF region. In addition, alternative splicing could modulate MoRF-related functions by changing expression patterns, localization, and regulation. These complex mechanisms could lead to broad functional and regulatory diversity. For example, the pro-apoptotic protein Bim has 17 isoforms. Its three predominant isoforms, BimEL, BimL and BimS, all have the MoRF region (BH3 ligand) “DMRPEIWIAQELRRIGDEFNAYYAR,” which is responsible for binding selectivity toward their pro-survival protein targets and for initiating Bcl-2-regulated apoptosis. Those Bim isoforms lacking the BH3 ligand, for example, Bimβ1-7, also lack pro-apoptotic activities.

Two additional MoRFs were reported to have ASEs based on studies of the tissue-specific splicing exon data set.81 A MoRF region from nuclear receptor corepressor 2 is specifically expressed in only 1 of 14 tissue types. As was pointed out,81 tissue-specific alternative splicing that adds or removes binding sites in disordered protein regions leads to the “rewiring” of PPI networks and may, therefore, contribute fundamentally to tissue development. It would be very interesting to develop models for the alterations in PPI networks in different tissues that arise from alternative splicing, but unfortunately the partners for the tissue-specific MoRFs are simply not known.

In a previous study, we found that alternatively spliced regions of RNA code for protein disorder much more often than for regions of structure, and we showed that such alternative splicing could lead to inclusion or exclusion of binding sites within the disordered regions.83 Interestingly, of the human MoRFs studied here, 50% (4 of 8) are in exon regions that have been identified as included or excluded by alternative splicing. The discussion in the previous paragraph suggests that a concerted effort should be made to identify additional MoRFs that map to tissue-specific alternatively spliced regions and to identify their partners as well.

In our previous study of the carboxy terminal tail of p53 bound to four different partners, we noticed that two of the complexes were distinguished by having PTMs, namely lysine acetylations for both examples. Furthermore, the acetate groups both became buried in the interfaces between the two MoRFs and their respective partners.48 In this study, we discovered that differences in PTMs occur commonly when MoRFs bind to alternative partners. Furthermore, this use of modified side chains to bind to one of two partners is most common when the two partners are structurally distinct. Indeed in this study, of 13 MoRFs containing PTMs, 11 involve MoRFs that bind to differently folded partners, thus providing additional observations in support of this concept. Finally, the chemical group added via the modification is typically found buried or partially buried in the interface between the MoRF and its partner, which strongly suggests that PTM provides an important part of the signal for the MoRF to bind to an alternative partner.

Phosphorylation occurs much more often in intrinsically disordered as compared with structured regions of proteins.84, 85 Recently, several other types of PTM have been shown to prefer disorder over structure.86 The results presented herein suggest that such a modification can be used to change the partner preference of a given MoRF, thus leading to switching the connections of a PPI network.

Our results contribute to a better understanding of the role of disordered binding regions (MoRFs) that may serve as protein interaction hubs. Exploring the diverse binding partners of our collected MoRF sets and the corresponding complex conformations gives us a general Rosetta stone for interpreting the underlying biological mechanisms and evolutionary adaptation. The importance and indispensability of hub proteins are apparent, as they appear to evolve more slowly and are more likely to be vital for survival. Given this importance, it is not surprising that many human disease-associated proteins related to cancer, diabetes, autoimmune disease, neurodegenerative disease, and cardiovascular disease are found to have predicted disordered binding regions (MoRFs).87 These MoRFs associate with structured partners, and their interactions are considered promising drug targets because they combine high specificity with low affinity. Binding with relatively low affinity is an advantageous attribute for transient, conditional, and tunable interactions, which are needed for many regulatory events. Therefore, this study will help to pave the way for the development of novel pathways through the design of intervening disordered peptides that have binding sites for particular partners but bind them more tightly.

Materials and Methods

MoRF data sets

Our disordered hub dataset was extracted from PDB by analyzing complex structures in which short nonglobular protein fragments are bound to large globular structured partners. In this article, we concentrated on MoRFs that are short nonglobular protein fragments with between 5 and 25 residues visible in the crystallographic electron density maps and whose binding partners are globular proteins greater than 70 amino acids in length. The PDB entries we used were released on March 28, 2008.

An interface size (ΔASA) of 400 Å² was used in this study to discriminate biologically relevant interactions from nonbiological interactions caused by crystal packing contacts.88 The same cutoff was previously chosen by the authors of the protein quaternary structure file server, because the minimal ΔASA values of homodimers and heterodimers are about 370 Å² and 640 Å², respectively.89
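The selection criteria in this section can be summarized as a simple filter. The following is a minimal sketch, not the pipeline actually used in this study: the `Candidate` record and its field names are hypothetical, the 5 to 25 residue range is assumed to be inclusive, and the ΔASA cutoff is assumed to be applied as "greater than or equal to 400 Å²".

```python
from dataclasses import dataclass

# Thresholds taken from the text; the inclusive/exclusive choices are assumptions.
MIN_MORF_LEN = 5        # fewest visible MoRF residues
MAX_MORF_LEN = 25       # most visible MoRF residues
MIN_PARTNER_LEN = 70    # partner must be longer than this many residues
MIN_DELTA_ASA = 400.0   # interface size cutoff in square angstroms


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one fragment/partner pair from a PDB entry."""
    pdb_id: str
    morf_len: int       # residues visible in the electron density map
    partner_len: int    # length of the globular partner chain
    delta_asa: float    # buried interface area (delta ASA)


def is_morf_complex(c: Candidate) -> bool:
    """Return True if the pair passes the length and interface-size criteria."""
    return (
        MIN_MORF_LEN <= c.morf_len <= MAX_MORF_LEN
        and c.partner_len > MIN_PARTNER_LEN
        and c.delta_asa >= MIN_DELTA_ASA
    )


def select_morfs(candidates):
    """Keep only the candidates that satisfy every criterion."""
    return [c for c in candidates if is_morf_complex(c)]
```

Keeping the thresholds as named constants makes it easy to test how sensitive the resulting dataset is to any one cutoff.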

Characterization of MoRF clusters that perform one-to-many binding like p53

To discover specific disordered regions that bind to multiple structured partners, as p53 does, we used a FASTA program to align each MoRF sequence to the UniProt sequence database, which encompasses the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases. The e-value was set at 1000 for the similarity search. We then kept only those MoRFs whose mappings onto their parent sequences overlapped (the circled ones in Fig. 7) and grouped them with a clustering algorithm in which each MoRF shares at least one residue with the rest of the MoRFs in the same cluster.

Figure 7.

A schematic diagram to show how we constructed our disordered hub dataset by aligning and clustering MoRF sequences from complex structures. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
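The overlap-based grouping described in this section can be sketched as follows. This is an illustrative reading rather than the authors' implementation: it assumes each MoRF has already been mapped to an inclusive residue range on its parent sequence, it uses single-linkage merging (a MoRF joins a cluster if it shares at least one residue with the span the cluster already covers), and the identifiers in the input tuples are hypothetical.

```python
from collections import defaultdict


def cluster_morfs(morfs):
    """Group MoRFs that overlap on the same parent sequence.

    morfs: iterable of (morf_id, parent_id, start, end), where start and end
    are inclusive residue positions on the parent sequence.
    Returns a list of (parent_id, [morf_id, ...]) for clusters that contain
    at least two MoRFs; isolated MoRFs are dropped.
    """
    by_parent = defaultdict(list)
    for morf_id, parent_id, start, end in morfs:
        by_parent[parent_id].append((start, end, morf_id))

    clusters = []
    for parent_id, items in by_parent.items():
        items.sort()  # order by start position
        current, current_end = [], None
        for start, end, morf_id in items:
            if current and start <= current_end:
                # Shares at least one residue with the running cluster span.
                current.append(morf_id)
                current_end = max(current_end, end)
            else:
                if len(current) > 1:
                    clusters.append((parent_id, current))
                current, current_end = [morf_id], end
        if len(current) > 1:
            clusters.append((parent_id, current))
    return clusters
```

Under this single-linkage reading, two MoRFs can fall in the same cluster without overlapping each other directly, provided a third MoRF bridges them; a stricter reading would require every pair in a cluster to overlap.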

Removal of redundant MoRFs in MoRF clusters based on sequence identity

As our research is focused upon those MoRFs from the same disordered region that bind to structurally different partners, we used the BLASTCLUST program to remove any redundant structured partners in our dataset based on 100% and 25% sequence identity. In other words, the retained MoRFs lie in one disordered region but use distinct residues to bind different structured partners.

Removal of atypical MoRFs in MoRF clusters

After manually examining the entire MoRF dataset, we found several unanticipated cases that did not fit the intended dataset and removed them. These include cases in which one MoRF interacts with more than one partner in a single PDB entry and cases in which a partner molecule may be a subset of another partner in the same cluster.

Secondary structure assignment of MoRF

We classified MoRFs into four different types (α, β, ι, and complex). A MoRF was assigned to the α, β, or ι type according to which secondary structure type accounts for the largest percentage of its residues. If there was no clear preponderance of any one secondary structure type (that is, none was at least 1% greater than the other two types), we classified it as a complex-MoRF. Only the residues on the interface were counted. DSSP was used as the secondary structure assignment program.
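The classification rule above can be expressed as a short function. This is a sketch under stated assumptions rather than the code used in this study: the reduction of DSSP codes to three states (H, G, I as helix; E, B as strand; everything else as irregular) is a common convention but is not specified in the text, and the 1% margin is interpreted as one percentage point.

```python
from collections import Counter

# Assumed three-state reduction of DSSP codes (not specified in the text).
HELIX_CODES = set("HGI")
STRAND_CODES = set("EB")


def classify_morf(dssp_codes, margin=1.0):
    """Classify a MoRF from the DSSP codes of its interface residues.

    dssp_codes: string or iterable of one-letter DSSP codes, one per
    interface residue. Returns "alpha", "beta", "iota", or "complex".
    margin: minimum lead, in percentage points, of the top type over the rest.
    """
    counts = Counter()
    for code in dssp_codes:
        if code in HELIX_CODES:
            counts["alpha"] += 1
        elif code in STRAND_CODES:
            counts["beta"] += 1
        else:
            counts["iota"] += 1  # turns, bends, and unassigned residues

    total = sum(counts.values())
    if total == 0:
        raise ValueError("no interface residues supplied")

    pct = {t: 100.0 * counts[t] / total for t in ("alpha", "beta", "iota")}
    ranked = sorted(pct, key=pct.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # Leading the runner-up by the margin implies leading the third type too.
    if pct[best] - pct[runner_up] >= margin:
        return best
    return "complex"
```

For example, an interface with six helical and two irregular residues is classified as an α-MoRF, whereas an even split between helix and strand is classified as a complex-MoRF.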

Acknowledgements

The authors are very grateful to Dr. Thomas D. Hurley and Dr. Yaoqi Zhou for providing helpful suggestions and discussions and Dr. M. Madan Babu and Ms. Marija Buljan for helping with the use of their tissue-specific alternative splicing dataset.
