Integration of protein motions with molecular networks reveals different mechanisms for permanent and transient interactions

Authors

  • Nitin Bhardwaj,

    1. Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520
  • Alexej Abyzov,

    1. Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520
  • Declan Clarke,

    1. Department of Chemistry, Yale University, New Haven, Connecticut 06520
  • Chong Shou,

    1. Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520
  • Mark B. Gerstein

    Corresponding author
    1. Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520
    2. Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520
    3. Department of Computer Science, Yale University, New Haven, Connecticut 06520
    • Program in Computational Biology and Bioinformatics, Yale University, Bass 426, 266 Whitney Avenue, New Haven, CT 06520

Abstract

The integration of molecular networks with other types of data, such as changing levels of gene expression or protein-structural features, can provide richer information about interactions than the simple node-and-edge representations commonly used in the network community. For example, the mapping of 3D-structural data onto networks enables classification of proteins into singlish- or multi-interface hubs (depending on whether they have >2 interfaces). Similarly, interactions can be classified as permanent or transient, depending on whether their interface is used by only one or by multiple partners. Here, we incorporate an additional dimension into molecular networks: dynamic conformational changes. We parse the entire PDB structural databank for alternate conformations of proteins and map these onto the protein interaction network, to compile a first version of the Dynamic Structural Interaction Network (DynaSIN). We make this network available as a readily downloadable resource file, and we then use it to address a variety of downstream questions. In particular, we show that multi-interface hubs display a greater degree of conformational change than do singlish-interface ones; thus, they show more plasticity which perhaps enables them to utilize more interfaces for interactions. We also find that transient associations involve smaller conformational changes than permanent ones. Although this may appear counterintuitive, it is understandable in the following framework: as proteins involved in transient interactions shuttle between interchangeable associations, they interact with domains that are similar to each other and so do not require drastic structural changes for their activity. We provide evidence for this hypothesis through showing that interfaces involved in transient interactions bind fewer classes of domains than those in a control set.

Introduction

Protein–protein interaction networks are conventionally delineated and studied as undirected graphs, in which nodes denote proteins and edges represent interactions. Though such representations have been invaluable as a means of learning about the basic underlying structure of networks in a global sense, the incorporation of 3D structural data into such networks is needed in order to gain deeper insights into the basic biological functionality underlying networks and their constituent proteins.1–5 Most of the work in structural network biology has focused on the prediction of new interactions6–8 and affinities.5 Kim et al. used 3D protein structures to construct the structural interaction network (SIN).3 Structural exclusion enabled them to discriminate between overlapping and non-overlapping sites. If a given protein interacts with multiple partners that use distinct regions of that protein's surface, they can all bind simultaneously. The alternative, in which the partners employ a common interface on the given protein, gives rise to a mutually exclusive set of potential interactions. They also found that proteins associated through simultaneously possible interactions are more likely to have similar biological functions and expression patterns. Mutually exclusive interactions are generally ‘transient’, whereas those which are simultaneously possible are more likely to be ‘permanent’. The hubs in the SIN (>4 interaction partners) were classified as singlish-interface hubs (those with one or two interfaces) and multi-interface hubs (those with more than two interfaces). Statistically significant differences were obtained for these two classes of hubs in terms of essentiality and coexpression.

In addition to structural data, other kinds of data have also been integrated with interaction networks. Data on gene expression have previously been integrated with interaction networks in yeast to gain insights into functional relationships between network proteins, as well as protein complexes, in the context of progression through the cell cycle.9 In addition, expression data have been combined with information on the position of network proteins to show that proteins exhibiting constitutive expression largely constitute static network modules, whereas those exhibiting more complex coregulation patterns tend to constitute dynamic network modules.10 Tuncbag et al. have described node-and-edge representations in the context of time to better represent the dynamic nature of the processes operating within biological networks.11 These studies have focused on the dynamics of the protein–protein binding process and show how integrating multiple layers and types of data can reveal much more than what is provided by a simple static picture. However, the dynamics of protein conformational changes in the context of interaction networks have not been explored on a large scale. There have been some articles on allosteric effects in proteins,12, 13 correlation of experimental structural changes with model predictions,14 and different mechanisms for small- and big-molecule interactions.15 Nevertheless, there has been no study examining the protein structural modifications involved in different classes of protein–protein interactions (such as permanent and transient interactions).

The size and shape of a protein create additional constraints with respect to its ability to physically interact with other proteins. Not surprisingly, interactions are typically accompanied by conformational changes, many of which propagate throughout the protein, away from the interface.16 The “lock-and-key” model was one of the earliest descriptions provided for conformational changes.17 Although this model insightfully emphasizes the importance of shape complementarity between the two structures, the proteins in their bound form exhibit structural changes with respect to their unbound form. The “induced fit” model18 on the other hand, considers proteins as more structurally dynamic entities, and therefore provides a more realistic description, in which conformational changes accompany and are even required for interactions. A geometric fit is thus ensured only after the structural rearrangements are induced as part of their interaction. More generally, although many of these interactions require certain modifications of the interface, others induce changes in the binding protein which facilitate the interaction.

Here, we integrate protein dynamics with different classes of proteins (such as non-hubs, singlish-interface hubs, and multi-interface hubs) and the interactions in which they are involved. We parse the entire PDB to list alternate protein conformations in human and yeast and then map these structural dynamics on top of the SIN to create the Dynamic Structural Interaction Network (DynaSIN, Fig. 1). This allows us to analyze interaction networks in which nodes constitute more than static dots and have a structural relevance with a particular shape, size, and plasticity, which enables them to interact with their partners. We superimpose the conformational changes shown by different classes of proteins (Fig. 1) and show that the degree of conformational change differs between the classes. Further, by careful alignment of a protein's alternate conformations, we identify the motions that are either compatible or conflicting with the interaction. We also demonstrate that the extent of conformational change varies for different classes of interactions (permanent and transient). Finally, based on our results, we propose a new picture of the different ways in which transient and permanent interactions proceed. Note that the term “permanent” does not indicate that the relevant protein interacts with its partner in a strictly permanent fashion (i.e., it need not remain bound to the partner for the duration of its lifetime). This term (along with “transient” interaction) is based on the convention previously adopted by Kim et al.3

Figure 1.

Flowchart for generating DynaSIN. We start with the interaction network. The information about the interfaces from the PDB is then mapped onto the network. This enables classification of edges as permanent (those associated with a unique interface, dark blue solid edges) and transient interactions (those which share an interface, light blue dotted edges). Nodes are also classified into singlish-interface (those with one or two interfaces, light blue circles) and multi-interface (those with more than two interfaces, dark blue squares) proteins. This structural annotation of nodes and edges gives us the structural interaction network. Next, all the alternate conformations of the proteins, whenever available, are aligned with the structure in the complex. The nodes that adopt alternate conformations are shown in yellow. Because of these alternate conformations, some interactions are likely to be affected by the conflicting motions (shown by solid red edges), whereas compatible motions do not affect the associated interactions (shown by dotted green edges).
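The node and edge classification rules described above can be sketched in a few lines of Python. This is only an illustrative sketch: the input format (tuples of protein, partner, and an interface label) and the function name `classify_network` are assumptions made here, not the format used by DynaSIN, and the edge label is assigned from the point of view of the first protein's interface.

```python
from collections import defaultdict

HUB_MIN_PARTNERS = 5       # hubs have more than four interaction partners
MULTI_MIN_INTERFACES = 3   # multi-interface hubs have more than two interfaces


def classify_network(edges):
    """Classify nodes and edges of a structural interaction network.

    edges: iterable of (protein, partner, interface_id) tuples, where
    interface_id is a hypothetical label for the surface patch on
    `protein` that binds `partner`.
    """
    partners = defaultdict(set)
    interface_users = defaultdict(set)   # (protein, interface) -> partners
    for protein, partner, interface in edges:
        partners[protein].add(partner)
        interface_users[(protein, interface)].add(partner)

    interfaces = defaultdict(set)        # protein -> its interface labels
    for protein, interface in interface_users:
        interfaces[protein].add(interface)

    node_class = {}
    for protein, bound in partners.items():
        if len(bound) < HUB_MIN_PARTNERS:
            node_class[protein] = "non-hub"
        elif len(interfaces[protein]) >= MULTI_MIN_INTERFACES:
            node_class[protein] = "multi-interface hub"
        else:
            node_class[protein] = "singlish-interface hub"

    # An interface used by one partner gives a permanent edge;
    # an interface shared by several partners gives transient edges.
    edge_class = {}
    for (protein, _interface), users in interface_users.items():
        label = "permanent" if len(users) == 1 else "transient"
        for partner in users:
            edge_class[(protein, partner)] = label
    return node_class, edge_class
```

The thresholds are kept as named constants so that the hub and interface cutoffs stated in the text are visible in one place.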

Results

Creation of DynaSIN v1.0

As part of our study here, we expanded SIN v1.0, created earlier by Kim et al.,3 to not only include more mappings for yeast but also include human and E. coli in the analysis, thus creating a high-confidence SIN v2.0 (Table I). To do so, we first filtered a high-confidence interaction set for human from BioGrid (2.0.44) by including only those interactions that were at least reported in vivo and then removed the redundant ones. For consistency, we then used the same strategy for yeast. The numbers of singlish- and multi-interface hubs, for both human and yeast, are given in Table I, along with the number of transient and permanent edges. As the integration of networks with protein motions can be a complex task (these data are at different levels of biological organization, and the mapping is not one-to-one), we devised the following protocol (as explained in detail under Materials and Methods section). For each protein, we obtained all of its occurrences from the PDB using their UniProt IDs. UniProt ID annotation was chosen as it uniquely maps each chain in the PDB to a single protein. Next, for a given structure, we aligned its alternate conformation with the conformation in the complex to identify the extent of conformational changes at the interfaces. Lastly, we identified the motions in an interface-centric manner using rigid blocks overlapping the interfacial region (heuristic alignment techniques that aim to minimize the RMSD between two structures may not capture the true changes in certain cases; see Fig. 2 and Materials and Methods section for details). This alignment allowed us to classify each motion as “conflicting” (if the interfacial change is disruptive to the interaction) or “compatible” (if it does not disrupt the interaction). A schematic and a real example of compatible and conflicting motions are provided in the Supporting Information (Supporting Information Figs. S1 and S2).
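The filtering step for the high-confidence interaction set (keep interactions reported in vivo, then drop redundant ones) might look like the following sketch. The record keys `a`, `b`, and `in_vivo` are hypothetical placeholders chosen for illustration; they are not BioGrid field names.

```python
def filter_interactions(records):
    """Keep in vivo interactions and remove redundant pairs.

    records: iterable of dicts with hypothetical keys
    'a' and 'b' (the two proteins) and 'in_vivo' (bool).
    """
    seen = set()
    kept = []
    for rec in records:
        if not rec["in_vivo"]:
            continue
        # The network is undirected, so A-B and B-A are the same edge.
        key = frozenset((rec["a"], rec["b"]))
        if key in seen:
            continue
        seen.add(key)
        kept.append((rec["a"], rec["b"]))
    return kept
```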

Figure 2.

Structure alignment using heuristic and rigid block-based techniques, wherein there is relative movement between two blocks in alternate conformation. (A) One of the proteins in a complex (in red) has an alternate conformation (in blue). In heuristic techniques, where the aim is to minimize the overall RMSD, the alignment might give an intermediate overlap between the two blocks or a complete overlap of the bigger block. (B) An alignment scheme based on the interfacial block (the rigid block that contains the highest overlap with the interface) will first superimpose that block between two conformations. This ensures that only the changes within the interfacial region are captured.

Table I. Size of the Dataset Used in This Study

                                                                   Human    Yeast
# of edges in SIN 2.0                        Transient             6728     1172
                                             Permanent             1780     356
# of nodes in SIN 2.0                        Non-hub               1348     297
                                             Singlish-interface    597      142
                                             Multi-interface       316      40
# of proteins with alternate conformations   Non-hub               66       32
                                             Singlish-interface    88       34
                                             Multi-interface       60       24
# of edges with motions                      Conflicting           228      30
                                             Compatible            354      48

We make DynaSIN v1.0 available online at a publicly accessible URL (http://dynasin.molmovdb.org) in a format that can easily be parsed. The dataset consists of two interlinked components. The first component corresponds to the structural interaction part and thus lists all of the interactions in SIN v2.0, along with the participating proteins, the interaction type (transient/permanent), and the associated structural information (such as the PDB IDs of the complex, the chain IDs corresponding to each protein, and the interfacial residues). The second component catalogs the dynamics of SIN v2.0 and provides an exhaustive list of alternate conformations for each protein (wherever available), the node type (non-hub, singlish-interface hub, or multi-interface hub), and the set of interactions in which the corresponding protein is involved, as well as the interactions that are disrupted by those alternate conformations (i.e., conflicting motions). The two components are cross-referenced by a common ID given to each protein. After applying the above procedure, we found that E. coli, in comparison to human and yeast, had too little coverage to yield any statistically significant results. Thus, although the E. coli SIN is provided on the DynaSIN resource page, we did not include it in the subsequent analysis outlined below. A network view of the human SIN is provided in the Supporting Information (Supporting Information Fig. S3).

Hubs display a greater degree of conformational changes than do non-hubs

We quantified the degree of motion demonstrated by hubs and non-hubs by calculating the RMSD between alternate conformations for the proteins in each category. Hubs were defined as proteins with five or more interacting partners in the SIN, which is the same definition adopted in the original study.3 For both human and yeast, we found that hubs show significantly greater degrees of conformational change than do non-hubs [Fig. 3(A,B), P-value calculated using the Kolmogorov–Smirnov test]. This may be related to the ability of hubs to interact with more partners: larger conformational changes may enable them to explore greater conformational space, thereby facilitating such interactions.
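For reference, the RMSD between two conformations can be computed as in the minimal sketch below. It assumes that the two coordinate lists are already superimposed and matched atom by atom; the superposition itself is not shown, and the function name is chosen here for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples in Å,
    assumed to be superimposed and in the same atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```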

Figure 3.

Range of motion shown by different classes of proteins. RMSD values are obtained from the structural alignment of non-hubs and hubs with their corresponding alternate conformations for human (A) and yeast (B). RMSD values are obtained from the structural alignment of singlish- and multi-interface hubs with their corresponding alternate conformations for human (C) and yeast (D). For each subgraph, the P-values indicated are calculated from a two-sample Kolmogorov–Smirnov test with the null hypothesis that the right sample is less than the one on the left. A low P-value means that we can reject the null hypothesis, and the true distribution of the right dataset is more than that of the one on the left.
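A one-sided two-sample Kolmogorov–Smirnov comparison of the kind described in this caption can be sketched as follows. The P-value here uses only the leading-order asymptotic approximation exp(-2 D^2 nm/(n+m)), which is an assumption of this sketch; the exact procedure used to obtain the reported P-values is not specified in the text, and statistical packages may apply small-sample corrections.

```python
import math
from bisect import bisect_right


def ks_one_sided(left, right):
    """One-sided two-sample KS test that `right` tends to be larger.

    The statistic is D = max over x of [F_left(x) - F_right(x)], where
    F is the empirical CDF. A large D means the left sample's CDF lies
    above the right sample's, i.e. the right sample is shifted upward.
    Returns (D, approximate P-value).
    """
    n, m = len(left), len(right)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    left_sorted, right_sorted = sorted(left), sorted(right)
    d = 0.0
    for x in sorted(set(left_sorted) | set(right_sorted)):
        f_left = bisect_right(left_sorted, x) / n
        f_right = bisect_right(right_sorted, x) / m
        d = max(d, f_left - f_right)
    # Leading-order asymptotic tail probability for the one-sided statistic.
    p_value = math.exp(-2.0 * d * d * n * m / (n + m))
    return d, p_value
```

A low P-value from `ks_one_sided(non_hub_rmsds, hub_rmsds)` would correspond to rejecting the null hypothesis that the right-hand sample is not larger than the left-hand one.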

Multi-interface hubs show a greater degree of conformational changes than do singlish-interface hubs

As described, we further categorized proteins into singlish- and multi-interface hubs based on the number of binding interfaces. We calculated the degree of protein conformational change for the two classes and found that multi-interface hubs show more motion than singlish-interface hubs, with a significant P-value for both species [Fig. 3(C,D)]. It can be argued that the degree of change shown by these proteins occurs on a rather small scale (a range of a few Å), and that these small motions might be a result of the low resolution at which some of these structures may have been solved. To address this concern, we removed those cases for which the protein showed an RMSD smaller than the resolution (in Å). With this filtered set, we again compared non-hubs with hubs, as well as singlish-interface hubs with multi-interface hubs. We found that the results were the same as those obtained before, suggesting that the observations above are not artifacts of low-resolution structures (Supporting Information Fig. S4).
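The resolution filter can be expressed compactly, as in the sketch below. The text does not state which resolution is used when the two conformations come from structures of different quality; comparing against the poorer (numerically larger) of the two resolutions is an assumption made here, as are the tuple layout and function name.

```python
def filter_by_resolution(cases):
    """Drop cases whose RMSD is smaller than the structure resolution.

    cases: iterable of (protein_id, rmsd, resolution_a, resolution_b),
    all in Å. Assumption: the RMSD is compared against the poorer
    (larger) of the two resolutions.
    """
    kept = []
    for protein_id, rmsd_value, res_a, res_b in cases:
        if rmsd_value >= max(res_a, res_b):
            kept.append((protein_id, rmsd_value, res_a, res_b))
    return kept
```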

The degree of conformational change is correlated with the number of interfaces

The distinguishing feature between singlish- and multi-interface hubs is the number of interacting interfaces: whereas singlish-interface hubs have only one or two interfaces, which are used interchangeably to bind multiple partners, multi-interface hubs have more than two such interfaces. This suggests the hypothesis that the degree of motion is correlated with the number of interfaces. Indeed, we found that those proteins with more interfaces display larger conformational changes than those with fewer interfaces (Fig. 4, numbers provided in Supporting Information Table S1). In both human and yeast, there was a direct relationship between the degree of conformational change and the number of interfaces.

Figure 4.

Range of protein motion versus the number of unique interfaces for human (A) and yeast (B). For each subgraph, the P-values indicated are calculated from a two-sample Kolmogorov–Smirnov test between two consecutive distributions, with the null hypothesis that the one on the left (the one with lower number of unique interfaces) is greater than the one on the right. A low P-value means that we can reject the null hypothesis, and the true distribution of the left dataset is smaller than that of the one on the right.

Permanent interactions involve larger interfacial changes than do transient interactions

By aligning the alternate conformation(s) of a protein with its structure in the interaction complex, we can identify the interfacial changes that are associated with binding (see Supporting Information Fig. S5 for a detailed explanation). For example, if a protein can bind to its partner even in its alternate conformation, there is presumably no (or only an insignificant) interfacial change involved. This may happen, for instance, in cases involving allosteric changes. However, if the alternate conformation interferes with the interaction in any way, the protein must undergo an interfacial change to accommodate binding. In some cases, different interfacial changes may be required for different partners. Such motions suggest that the corresponding interaction may induce the interfacial changes that facilitate binding. Here, the term “motion” is used to denote interfacial conformational changes, which might constitute all or only a part of the overall molecular motion.

We calculated how frequently permanent and transient interactions are associated with interfacial changes. We found that although there was no difference between the size of interfaces involved in permanent and transient interactions (Supporting Information Fig. S6), for both species, permanent interactions more frequently involve a modification of the binding interface than do transient interactions (Fig. 5). For yeast, all permanent interactions require changes in interfacial regions.

Figure 5.

Fraction of interfacial changes induced by permanent and transient interactions in human (A) and yeast (B).
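The fractions plotted here can be tallied as in the following sketch, which treats a conflicting motion as evidence that the interaction requires an interfacial change. The input layout (pairs of interaction type and motion label) and the function name are assumptions made for illustration.

```python
from collections import Counter


def fraction_with_change(interactions):
    """Fraction of interactions per class that involve an interfacial change.

    interactions: iterable of (interaction_type, motion) pairs, with
    interaction_type in {"permanent", "transient"} and motion in
    {"conflicting", "compatible"}. A conflicting motion is counted as
    an interfacial change.
    """
    totals = Counter()
    changed = Counter()
    for kind, motion in interactions:
        totals[kind] += 1
        if motion == "conflicting":
            changed[kind] += 1
    return {kind: changed[kind] / totals[kind] for kind in totals}
```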

These conformational changes have an associated energy cost, and depending on the extent of structural change, this energy may reach several kBT. Although the binding free energy provides for these changes, the energetic requirements can be significantly higher for more dramatic structural rearrangements. This fact, combined with the observation that permanent interactions entail large protein motions, raises the question: how can the proteins or interfaces afford to undergo such large changes? To answer this question, we focus on the different mechanisms by which permanent and transient interactions proceed.

Permanent interactions involve one partner, and to facilitate the association, the interface needs to undergo conformational change only once, if at all. In the case of multiple permanent interactions, the corresponding interfaces can change their conformations to sequentially bind their partners to achieve the final complex [Fig. 6(A)]. Although the order in which they bind partners is random in some cases, this ordering is usually more specific in that the binding of one partner facilitates the binding of the next. This may easily be achieved by the induction of allosteric changes.13 An example of such allosteric effects is the binding of cyclin to cyclin-dependent kinase 2 (CDK2), which acts as a checkpoint in the eukaryotic cell cycle. Cyclin binding displaces CDK2's activation segment and makes its substrate-binding site accessible for ATP19 and subsequently for the CDK-activating kinase.20

Figure 6.

Different ways in which permanent and transient interactions proceed. (A) Permanent interactions involving multiple partners can proceed in many ways (two are shown here). For example, to attain a final complex with two permanent partners, the protein can undergo conformational changes in a random (or often specific) order to sequentially bind partners. (B) Transient interactions interchange between partners, as the same interface is used in multiple interactions. (C) A proposed schematic for how transient associations (using the same interface) involve domains/interfaces that are structurally very similar to each other. Such associations entail smaller conformational changes and lower energy requirements, corresponding to the conformational changes between similar partners.

Transient interactions, on the other hand, entail multiple partners binding to the same interface, which is accompanied by partner-specific changes in the interfacial region [Fig. 6(B)]. Since the protein must associate and dissociate interchangeably, the associated energy can be quite high if the interfacial structural changes between these associations are substantial. So, how can these proteins favorably minimize conformational changes? We propose that one means of avoiding large structural changes is through interaction with interfaces that are not structurally very different from one another (they are located on similar sites on the same class of domain). If these partner interfaces are structurally similar, it is more likely that the protein will not undergo very large structural changes while switching between these partners and will not require a high energy for conformational changes, thereby better enabling the transient nature of such interactions [Fig. 6(C)].

Transient interfaces interact with fewer classes of domains

To investigate the rationalization provided above, we counted the number of different classes of domains with which interfaces interact in a transient fashion and compared these values to those from a control set of structural interfaces. The type of domain involved in each interaction (in Pfam notation) was obtained directly from the PDB (also provided on the DynaSIN resource page). The control set of structural interfaces was obtained from the 3DID database, which is a dataset of structural instances of domain–domain interactions.21 This entire database was parsed to obtain all human domain–domain interactions (a total of 8659 interactions between 1291 unique domains). The number of partners of each of these domains was then enumerated, and we computed the fraction of domains that had two or more distinct partners. We found that interfaces involved in transient interactions associate with fewer classes of Pfam domains relative to the control set of interfaces (Fig. 7). This observation provides evidence for our hypothesis that proteins involved in transient interactions minimize conformational changes (and hence associated energetic requirements) by interacting with similar classes of partner domains, thereby precluding any requirement of dramatic interfacial changes between partners.

Figure 7.

Fraction of transient and control interfaces that interact with multiple (two or more) classes of Pfam domains.
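The quantity compared in this figure, the fraction of interfaces that interact with two or more classes of domains, can be computed as sketched below. The input layout (pairs of an interface identifier and a partner domain label) and the function name are assumptions for illustration; the domain labels in any real run would be Pfam identifiers.

```python
from collections import defaultdict


def fraction_multi_class(pairs, min_classes=2):
    """Fraction of interfaces that bind at least `min_classes` domain classes.

    pairs: iterable of (interface_id, partner_domain_class) tuples.
    """
    partner_classes = defaultdict(set)
    for interface_id, domain_class in pairs:
        partner_classes[interface_id].add(domain_class)
    if not partner_classes:
        return 0.0
    multi = sum(1 for classes in partner_classes.values()
                if len(classes) >= min_classes)
    return multi / len(partner_classes)
```

Applying the same function to the transient interfaces and to the control set gives the two values being compared.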

Discussion and Conclusion

We have integrated protein motion dynamics with protein–protein interactions to create DynaSIN v1.0. We provide the dataset to the public in a user-friendly format that can be easily parsed. Our downstream analysis of the dataset reveals significant differences between different classes of proteins and interactions. We have shown that hub proteins show a higher degree of conformational change than do non-hubs. Within the hubs category, the multi-interface hubs display a greater amount of conformational change, on average, than do singlish-interface hubs, and the extent of conformational change is related to the number of unique interfaces on the protein surface; proteins with more interaction interfaces typically undergo a greater degree of conformational change during interactions.

We have also demonstrated that permanent interactions are associated with a greater degree of conformational change than are transient interactions. Moreover, the interface involved in a transient association typically interacts with fewer classes of domains. Together, these observations reveal how permanent and transient interactions proceed. Permanent interactions entail a single partner binding to an interface, which is not shared by any other partner. Since these associations are permanent in nature, they require only a one-time modification of the interface (if it is indeed modified during the interaction), and so a large associated energy would be required only once. Figure 8(A) gives an example of two proteins that are involved in two permanent interactions. Both involve large conformational changes of the interface or the entire structure. Transient associations, on the other hand, entail multiple partners binding to the same interface interchangeably. Since these are transient in nature, the protein needs to switch between these partners, and the associated energy would be high for dramatic interfacial changes. The proteins reduce these costs by interacting with domains that have structurally similar interfaces, which do not require large conformational changes between different association states. An example of such transient interactions is provided in Figure 8(B), in which β-2 microglobulin interacts with eight different partners using the same interface. All interactions involve the same partnering domain and small conformational changes (less than 1 Å).

Figure 8.

Examples of conformational changes associated with permanent (A) and transient interactions (B). Values next to the arrows indicate the conformational changes (RMSD, in Å) of the interface, whereas those in parentheses are the RMSD values for the entire structure.

Previous work has demonstrated that the average number of disordered residues is higher for hubs than for non-hubs.22–26 Although our observation is in agreement with these studies (with the common implication that hubs display higher conformational flexibility), there is a fundamental difference between those studies and the work outlined here: they investigate the intrinsically disordered regions of proteins (which, in many cases, are predicted from protein sequences), which do not completely fold and remain disordered, whereas we study the structural changes in the binding interfacial regions of the crystallized ordered proteins with fixed folds. This difference might also explain some apparent disagreement with previous results demonstrating that the disorder of a protein is independent of its number of partners,27 and that singlish-interface hubs have a higher fraction of disordered residues.26 These studies examined the disordered nature of proteins, which is quite different from conformational changes due to protein binding. In another study, Higurashi et al. proposed that disparities between transient and permanent hubs lie more in intrinsic overall flexibility than in local enrichment of disordered residues.28 However, they used a different approach to identify the transient hubs than the approach adopted as part of this study: they included proteins that exist in the PDB in more than three “binding states” (different binding partners) while also including the binding states of the homologous proteins in the closely related sequence family. Thus, unlike in this work where we focused on the usage of the same binding interface by different partners, in the study by Higurashi et al., the interfaces used for these associations were not taken into account, resulting in a different definition of the transient hubs.28

It should be noted that the current analysis was based on only a small fraction of the proteome (only a few hundred proteins in human and even fewer in yeast). This is due to various constraints: in addition to a protein structure being available in isolation, its structure in complex with its partner is also required for mapping its alternate conformations. So, the observations presented here should be treated with caution. However, with the ever-increasing number of protein structures being solved, we envision expanding this analysis to a larger set. With this development, it is only reasonable to include this growing repertoire as a means of gaining greater intuition into the process of protein binding in the context of large-scale protein–protein interaction networks. The combination of structural information, along with expression data, can transform a static node-and-edge picture into a dynamic process with the added dimension of time11 and reveal details about how these different classes of interactions proceed. Although investigations remain somewhat limited by the scarcity of data of different types, such integrative approaches can prove very useful for obtaining a better understanding of cellular regulation.

Materials and Methods

Integration of networks and protein motion

The integration of networks with protein motions can be a complex task, as these data are derived from different levels of biological organization. Networks exist at the gene product level (each entity of the network represents a gene product), whereas most of the structures in the PDB correspond to only a part of this gene product set (a domain, for instance). Moreover, the mapping between these two forms of data is not one-to-one; multiple structures may be available for different parts (as when different domains are solved as part of different experiments) or for the same parts (multiple conformations of the same part of the protein). Consequently, we devised the following protocol to map structures and motions onto networks. For each protein in the SIN, we extracted all of its occurrences from the PDB using its UniProt ID (UniProt IDs were also used to build SIN v1.0). As mentioned, we chose UniProt ID annotation as it uniquely maps each chain in the PDB to a single protein. In most cases, only single occurrences were found, indicating that the corresponding protein currently has only one conformation in the PDB. In the remaining cases, the presence of proteins in multiple PDB structures suggests the adoption of alternate conformations (Table I). Lastly, for a given structure, we aligned its alternate conformation with the conformation in the complex to identify the extent of conformational changes at the interfaces to compile DynaSIN v1.0.
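The step of collecting every occurrence of a protein in the PDB and flagging those with multiple structures can be sketched as follows. The chain-to-UniProt mapping is assumed to be available as a plain dictionary; how that mapping is obtained, the identifiers used, and the function name are all assumptions of this sketch.

```python
from collections import defaultdict


def find_alternate_conformations(chain_to_uniprot):
    """Group PDB chains by protein and keep proteins seen in several entries.

    chain_to_uniprot: dict mapping (pdb_id, chain_id) -> uniprot_id.
    Returns a dict uniprot_id -> sorted list of (pdb_id, chain_id) for
    proteins that occur in more than one PDB entry, i.e. candidates for
    alternate conformations.
    """
    by_protein = defaultdict(set)
    for (pdb_id, chain_id), uniprot_id in chain_to_uniprot.items():
        by_protein[uniprot_id].add((pdb_id, chain_id))

    candidates = {}
    for uniprot_id, chains in by_protein.items():
        distinct_entries = {pdb_id for pdb_id, _chain in chains}
        if len(distinct_entries) > 1:
            candidates[uniprot_id] = sorted(chains)
    return candidates
```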

Identification of protein conformational changes

Characterizing protein motions can be nontrivial and depends upon the way motions are defined and the structural alignment method that is used to identify these motions. Different alignment methods can give different results and identify different moving parts. Heuristic alignment techniques that aim to minimize the RMSD between two structures may not capture the changes in certain regions of the structure (Fig. 2). Here, we identify the motions in an interface-centric manner. We aligned the alternate conformations of a protein with its structure in the interaction complex using rigid blocks by first superimposing the interfacial regions (Fig. 2). Rigid blocks are defined as those parts or blocks for which the distance between any pair of amino acids changes between conformations by less than a designated sensitivity cutoff. We used the previously published method, RigidFinder,29 to identify the rigid block that either contained the binding interface or, in those cases for which multiple rigid blocks overlapped with the interface, had the largest overlap with the interface (in terms of the number of residues). The interface was defined in the following way. We determined the set of heavy atoms from one structure that were within 2 Å of any heavy atom from the other structure, and vice versa. The parent residues of these heavy atoms constituted the interface in the two proteins. The two structures were then aligned by superimposing this rigid block while minimizing the RMSD of this superimposition. This approach ensured that real conformational changes in the interfacial regions were detected, and that the relative motion between the interface and other parts of the protein did not contribute to the identification of interfacial changes. This alignment allowed us to classify each motion as “conflicting” (if the interfacial change is disruptive to the interaction) or “compatible” (if it does not disrupt the interaction).
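Two steps of this procedure, the distance-based interface definition and the choice of the rigid block with the largest interface overlap, can be sketched as below. The atom representation, function names, and brute-force pairwise search are assumptions made for illustration; the rigid blocks themselves are taken as given (for example, as produced by RigidFinder), and the superposition step is not shown.

```python
import math


def interface_residues(atoms_a, atoms_b, cutoff=2.0):
    """Residues of each protein with a heavy atom near the other protein.

    atoms_a, atoms_b: lists of (residue_id, element, (x, y, z)) tuples.
    cutoff: distance threshold in Å (default taken from the text).
    Returns (interface residues of A, interface residues of B).
    """
    heavy_a = [a for a in atoms_a if a[1] != "H"]
    heavy_b = [b for b in atoms_b if b[1] != "H"]
    res_a, res_b = set(), set()
    for residue_a, _elem_a, pos_a in heavy_a:
        for residue_b, _elem_b, pos_b in heavy_b:
            if math.dist(pos_a, pos_b) <= cutoff:
                res_a.add(residue_a)
                res_b.add(residue_b)
    return res_a, res_b


def pick_interfacial_block(rigid_blocks, interface):
    """Return the rigid block sharing the most residues with the interface.

    rigid_blocks: non-empty list of sets of residue ids.
    interface: set of interface residue ids.
    """
    return max(rigid_blocks, key=lambda block: len(block & interface))
```

For large structures, a spatial index would replace the nested loop; the brute-force version is kept here because it mirrors the definition directly.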
