LEVELNET to visualize, explore, and compare protein–protein interaction networks

Physical interactions between proteins are central to all biological processes. Yet, the current knowledge of who interacts with whom in the cell and in what manner relies on partial, noisy, and highly heterogeneous data. Thus, there is a need for methods comprehensively describing and organizing such data. LEVELNET is a versatile and interactive tool for visualizing, exploring, and comparing protein–protein interaction (PPI) networks inferred from different types of evidence. LEVELNET helps to break down the complexity of PPI networks by representing them as multi‐layered graphs and by facilitating the direct comparison of their subnetworks toward biological interpretation. It focuses primarily on the protein chains whose 3D structures are available in the Protein Data Bank. We showcase some potential applications, such as investigating the structural evidence supporting PPIs associated with specific biological processes, assessing the co‐localization of interaction partners, comparing the PPI networks obtained through computational experiments versus homology transfer, and creating PPI benchmarks with desired properties.

Here, we report on LEVELNET, a versatile computational framework designed to integrate and explore PPI networks coming from multiple sources of evidence. Starting from a set of protein chains whose 3D structures are available in the PDB, LEVELNET builds a grid of networks for each source (Figure 1A,B) representing different "views" on the interactions. It allows for clustering groups of similar proteins (nodes in the network) by exploiting global sequence identities between proteins and inferring interactions (edges in the network) through homology transfer or confidence scores (Figure 1B).
Networks coming from different sources can be integrated in an aggregated graph (Figure 1C,D,G) in which an interaction between two chains is represented as a multi-edge between two nodes, where the multiplicity comes from the different sources of evidence. Also, each edge is assigned a weight reflecting either a property from the source or the reliability of the evidence. This resulting information-rich framework helps to reason about interactions and to extract various biological information.
LEVELNET helps to compare interactions and non-interactions coming from different sources of evidence, thereby facilitating the identification of potential inconsistencies between these sources.
More specifically, it exploits experimentally resolved physiologically relevant interfaces from the PDB, annotated interactions from HIPPIE and non-interactions from Negatome [43], and optionally user-defined interactions and/or non-interactions (Figure 1C). It extends the set of direct physical interactions observed in the PDB among the input proteins by transferring knowledge from complex structures involving the same or similar proteins. For closely related homologs, we expect functional interactions to be conserved [28].
Moreover, works by us and others showed the biological pertinence and usefulness of accounting for homology-transferred interactions when evaluating protein-protein/DNA/RNA interface prediction methods [12,55].
LEVELNET can be used to gain visual, interactive insight into a broad range of PPI-related questions. To what extent has a signalling pathway or biological process been structurally covered by experiments? Where in the cell do the interactions at play in a pathway take place, and which proteins establish connections between the different cellular compartments? Which of the PPIs predicted by an ab initio approach are supported by structural and homology-based evidence? We illustrate these applications on the photosynthesis process, the Wnt signalling pathway, and a couple of established protein docking benchmarks. We show a few more usage cases, such as identifying cross-interactions among a

Significance Statement
Almost all biological processes depend on physical interactions between proteins. Yet, the current partial, noisy, and highly heterogeneous data associated with these interactions lead to incomplete and fragmented PPI networks.
Computational approaches for investigating PPIs are fundamental to obtaining accurate assessments of the protein interaction map in the cell. LEVELNET is a versatile and interactive tool that helps experimental and computational biologists to achieve this goal. It provides a user-friendly web interface designed to visualize, explore and compare PPI networks deduced from different data sources. Using a user-controlled graph grid allows for analysing the transfer of interactions from one species to another and visualizing topology changes in the network. The possibility offered by LEVELNET to directly compare alternative networks and topologies makes the complexity of PPIs intuitive, thereby facilitating biological interpretation. LEVELNET provides the users with multiple functionalities (i) to investigate how a group of proteins establish interactions, (ii) to discover the community of interactors for a given protein, and (iii) to find the interacting patches on a protein surface. It also provides a way to integrate and compare multiple PPI networks, assess predicted PPIs, and compile benchmark sets for training and testing predictive methods on PPI-related tasks.

Overview of LEVELNET
LEVELNET offers three types of analysis. The first two operate in an interactive mode, allowing the users to explore and compare networks of interactions defined either based on a certain neighbouring depth around a single input protein chain (Figure S1) or among a set of input protein chains. The users can designate the input chain(s) with their PDB code(s) or their Uniprot identifier(s). When several chains are given as input, the users may optionally specify some relationships between them with a real-valued pairwise matrix (Figure 1A). In the third type of analysis, LEVELNET discovers the interacting patches on the surface of the input protein and returns the results for offline investigation.

FIGURE 1 Protein-protein interaction (PPI) network representation and analysis in LEVELNET. (A) The input is a list of proteins (or protein chains) optionally accompanied by some pre-defined relationships (user-defined annotations). (B) A grid of networks computed by LEVELNET from the PDB as source. The user has access to the layers by modulating the sequence identity on the "node reduction" and "edge inference" options and can visualize different networks of the grid. Similar grids are available for HIPPIE, Negatome, and user-defined sources (see Methods). (H) 𝒫, 𝒬 and 𝒯 are clusters at some level of sequence identity, containing chains P and P′, Q and Q′, and T, T′ and T″, respectively. Chains P from cluster 𝒫 and Q from cluster 𝒬 are in physical contact (blue edge). This interaction leads to inferring some interactions with and among their homologs (pink edges). When two chains from the same cluster are in direct contact, here T and T′ from cluster 𝒯, self-interactions are also induced by homology.

The grid computed from the PDB comprises 9 × 9 = 81 PPI networks corresponding to nine node resolutions by nine edge inference levels (Figure 1B). One network, referred to as "observed", comprises as many nodes as input chains (Figure 1B). In the network obtained with a node sequence identity threshold of X% and an edge sequence identity threshold of Y%:
• each (super-)node corresponds to a set of input protein chains sharing more than X% of their amino acids;
• each edge between two (super-)nodes, representing the sets S1 and S2 of input chains, indicates that there exists at least one observed interaction in a PDB biological assembly between some chain from S1 or some homolog at >Y% sequence identity, and some chain from S2 or some homolog at >Y% sequence identity.
Upon relaxing the node sequence identity threshold, the nodes representing similar chains will be progressively merged into super-nodes and thus the network will simplify. Upon relaxing the edge sequence identity or confidence threshold, new edges will appear and thus the network will become denser.
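The effect of the two thresholds can be illustrated with a minimal Python sketch. It is not LEVELNET's implementation: the function names `cluster_chains` and `infer_edges`, the dictionary of pairwise sequence identities, and the use of single-linkage clustering are assumptions made for illustration only.

```python
from itertools import combinations


def cluster_chains(chains, identity, x):
    """Node reduction (assumed single-linkage): merge chains sharing more than x% identity."""
    parent = {c: c for c in chains}

    def find(c):
        # Follow parent links up to the cluster representative.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(chains, 2):
        if identity.get(frozenset((a, b)), 0.0) > x:
            parent[find(a)] = find(b)

    clusters = {}
    for c in chains:
        clusters.setdefault(find(c), set()).add(c)
    return [frozenset(members) for members in clusters.values()]


def infer_edges(super_nodes, observed, identity, y):
    """Edge inference: link two super-nodes when an observed contact involves,
    on each side, a member of the super-node or a homolog above y% identity."""

    def matches(chain, node):
        return any(
            chain == m or identity.get(frozenset((chain, m)), 0.0) > y
            for m in node
        )

    edges = set()
    for s1 in super_nodes:
        for s2 in super_nodes:
            for p, q in observed:
                if matches(p, s1) and matches(q, s2):
                    edges.add(frozenset((s1, s2)))  # one element means a self-loop
    return edges
```

With three chains A, A2 and B, where A and A2 share 98% identity and only A and B are observed in contact, lowering the edge threshold from 100% to 95% adds an edge between A2 and B, while lowering the node threshold to 95% merges A and A2 into one super-node.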
The source of evidence for an interaction and the type of interaction are coded by the colours and directionality of the edges. More precisely:
• a double-arrowed blue edge (type 1) P ⟷ Q indicates that P and Q are in contact in a known biological assembly from the PDB,
• a directed pink edge (type 2) P ⟶ Q indicates that a chain homologous to P is in contact with Q in a biological assembly from the PDB,
• an undirected pink edge (type 3) P -- Q indicates that this interaction is not supported by a contact observed for P or Q, but only by contacts between homologs of P and Q,
• an undirected purple edge (type 4) P -- Q indicates that this interaction is supported by HIPPIE annotations,
• a blocked black edge (type 5) P ■-■ Q indicates that P and Q are not supposed to interact with one another based on the Negatome,
• an undirected green edge (type 6) P -- Q stands for a user-defined interaction between P and Q.
We emphasize that the directionality of the edges neither carries information regarding causality of the interactions nor corresponds to the potential hierarchy of signal flow between the interacting chains.
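As a compact summary of the six edge types, the following Python sketch encodes them as a lookup table. The enum member names and the `describe` helper are hypothetical and only mirror the colours and shapes listed above; they are not part of LEVELNET.

```python
from enum import Enum


class EdgeType(Enum):
    OBSERVED = 1           # contact in a PDB biological assembly
    HOMOLOG_OF_SOURCE = 2  # a homolog of the source chain contacts the target
    HOMOLOGS_ONLY = 3      # contact seen only between homologs of both chains
    HIPPIE = 4             # supported by HIPPIE annotations
    NEGATOME = 5           # non-interaction reported in the Negatome
    USER_DEFINED = 6       # supplied by the user


# (colour, shape) for each edge type, as described in the text.
EDGE_STYLE = {
    EdgeType.OBSERVED: ("blue", "double-arrowed"),
    EdgeType.HOMOLOG_OF_SOURCE: ("pink", "directed"),
    EdgeType.HOMOLOGS_ONLY: ("pink", "undirected"),
    EdgeType.HIPPIE: ("purple", "undirected"),
    EdgeType.NEGATOME: ("black", "blocked"),
    EdgeType.USER_DEFINED: ("green", "undirected"),
}


def describe(edge_type):
    """Return a short human-readable description of an edge type."""
    colour, shape = EDGE_STYLE[edge_type]
    return f"type {edge_type.value}: {shape} {colour} edge"
```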
For the PDB, biological assemblies made of several copies of the same chain (generated by utilizing crystallographic symmetry operators) will lead to a blue (type 1) self-connection. For HIPPIE and the user-defined matrix, each pair of input proteins is annotated with a real-valued confidence score (between 0 and 1) for their interaction. Hence, the grids derived from them are defined continuously. Note that HIPPIE comprises only interactions between human proteins, and that the scope of the Negatome is limited.
The users can navigate from one layer to another within the grid associated with each source and also across the grids. Within a grid, they can modulate both the number of nodes and the number of edges.

2.1.2 Comparing several layers
LEVELNET allows for comparing several layers coming from different sources (Figure 1C). The users can overlay several layers either for the entire set of input chains (Overlay switch button) or focusing on a selected subset of input proteins. The selection can be:
• node-centred, upon clicking on a chosen node. LEVELNET then highlights its outgoing and incoming edges in thick dashed lines and colours its homologs in green and the chains belonging to the same complex in yellow (Figure 1D). This functionality helps, for instance, to detect homo-oligomers.
• at the level of a connected component, to focus on a signalling pathway for instance (Figure 1E).
Once the users have selected a subset of nodes, they can create a multi-edge graph by superimposing several layers and directly compare the corresponding interactions (Figure 1G). This analysis also allows the users to discover inconsistencies among the various PPI resources.
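The superimposition of layers can be pictured with a small Python sketch. It assumes, for illustration only, that each layer is a list of weighted chain pairs keyed by its source name; the functions `overlay` and `conflicts` are hypothetical and do not reflect LEVELNET's internal data model.

```python
from collections import defaultdict


def overlay(layers):
    """Build a multi-edge graph: each chain pair maps to {source: weight}."""
    multi = defaultdict(dict)
    for source, edges in layers.items():
        for a, b, weight in edges:
            multi[frozenset((a, b))][source] = weight
    return dict(multi)


def conflicts(multi, negative_source="Negatome"):
    """Pairs reported as non-interacting by one source and as interacting by another."""
    return sorted(
        tuple(sorted(pair))
        for pair, evidence in multi.items()
        if negative_source in evidence and len(evidence) > 1
    )
```

For example, a pair annotated in HIPPIE but also listed in the Negatome would be returned by `conflicts`, which is the kind of inconsistency the overlay is meant to expose.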

2.1.3 Offline analysis of all interacting surfaces exhibited by a set of protein chains
Beyond allowing for the identification, visualization and comparison of PPIs, LEVELNET provides the users with a residue-level description of protein interaction surfaces. In this operational mode, the web server outputs, for each query chain and its homologs (at a certain level of sequence identity), the ensemble of interacting patches together with all chains in the PDB that are their physical partners in some complex.
The interacting patches corresponding to the physical interactions are mapped onto the input query chain and merged to provide a label, either interacting or non-interacting, to each surface residue of the query protein.
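The merging step amounts to taking the union of the mapped patches. The sketch below assumes that every patch has already been mapped onto the residue numbering of the query chain; the function name `label_surface` is hypothetical.

```python
def label_surface(surface_residues, patches):
    """Label each surface residue of the query as interacting or non-interacting.

    surface_residues: iterable of residue identifiers of the query chain.
    patches: iterable of residue collections, already mapped onto the query (assumption).
    """
    interacting = set().union(*patches)  # a residue in any patch is interacting
    return {
        residue: "interacting" if residue in interacting else "non-interacting"
        for residue in surface_residues
    }
```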

Source databases
PDB (June 2020 release) entries were downloaded from the FTP archive https://rsync.wwpdb.org/ftp/data/biounit. Entries with more than 100 chains or with a resolution worse than 5 Å were discarded.
Protein chains shorter than 20 residues or with more than 20% of unknown residues were also discarded.
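These filters can be written as two small predicates. The sketch below is illustrative only: it reads "resolution lower than 5 Å" as a numerical value above 5 Å, keeps entries without a resolution value (for instance NMR models), and encodes unknown residues as "X"; all three choices, as well as the function names, are assumptions.

```python
def keep_entry(n_chains, resolution=None, max_chains=100, max_resolution=5.0):
    """Entry-level filter. Resolution is in Å; None is kept (assumption, e.g. NMR)."""
    if n_chains > max_chains:
        return False
    return resolution is None or resolution <= max_resolution


def keep_chain(sequence, unknown="X", min_length=20, max_unknown=0.20):
    """Chain-level filter on length and on the fraction of unknown residues."""
    if len(sequence) < min_length:
        return False
    return sequence.count(unknown) / len(sequence) <= max_unknown
```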

Pre-computed databases derived from the PDB
To infer interactions from the PDB, LEVELNET relies on two precomputed databases. We describe here the computational procedure we implemented to build these databases.

Database of interfaces
We processed all physiologically relevant complex structures from the PDB, namely biological assemblies from X-ray crystallography and cryogenic electron microscopy, and NMR models, using the interface detection algorithm INTBuilder [11]. Two residues were considered to be in contact if the distance between any of their atoms was smaller than 5 Å. We call the resulting database of interfaces interfaceDB.
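The contact criterion itself is simple to state in code. The following brute-force Python sketch is not INTBuilder and makes no attempt at its efficiency; it only illustrates the 5 Å any-atom rule, assuming that a chain is given as a dictionary mapping residue identifiers to lists of atom coordinates.

```python
import math


def min_atom_distance(atoms_r, atoms_s):
    """Minimum distance between any atom of one residue and any atom of another."""
    return min(math.dist(a, b) for a in atoms_r for b in atoms_s)


def interface_pairs(chain_p, chain_q, cutoff=5.0):
    """Residue pairs (r, s), r in chain_p and s in chain_q, closer than the cutoff (Å)."""
    return [
        (r, s)
        for r, atoms_r in chain_p.items()
        for s, atoms_s in chain_q.items()
        if min_atom_distance(atoms_r, atoms_s) < cutoff
    ]
```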

Database of PPI networks
Here, we describe how we pre-computed the PDB-based grid of PPI networks. We assume 𝒫 and 𝒬 are two clusters of protein chains sharing a certain percentage of sequence identity, containing chains P, P′ and Q, Q′, respectively (Figure 1H). We considered that chains P from cluster 𝒫 and Q from cluster 𝒬 were in physical contact (observed interaction, type 1, blue edge) if we could find more than M = 5 pairs of residues (r, s), where r belongs to P and s belongs to Q, located less than D = 5 Å away from one another. More formally,

$$P \longleftrightarrow Q \iff \left|\{(r, s) : r \in P,\ s \in Q,\ d(r, s) < D\}\right| > M,$$

where d(r, s) is the minimum distance between any atom of residue r and any atom of residue s, D is the distance threshold, set to 5 Å by default, and M is the minimum number of interface residues, set to 5 for both proteins by default. The interaction observed between P and Q leads to inferring some interactions with and among their homologs.
Indeed, for all P′ ∈ 𝒫 and Q′ ∈ 𝒬 such that P ≠ P′ and Q ≠ Q′, we can infer that
• some homolog of Q′ (here, namely, Q) interacts directly with P (homology-transferred interaction, type 2, directed pink edge from Q′ to P),
• some homolog of P′ (here, namely, P) interacts directly with Q (homology-transferred interaction, type 2, directed pink edge from P′ to Q),
• some homolog of Q′ interacts directly with some homolog of P′ (homology-transferred interaction, type 3, undirected pink edge).
Note that this formalism is also suitable for inferring interactions within a given cluster (i.e., when 𝒫 = 𝒬, see Figure 1H, cluster 𝒯 in bottom panel).
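The typing rules above can be checked on a toy example with the following Python sketch. The function `classify_edge`, the cluster labels, and the representation of observed contacts as a set of chain pairs are assumptions made for illustration; the sketch does not describe how LEVELNET stores or queries its data.

```python
def classify_edge(a, b, cluster_of, observed):
    """Return (edge_type, arrows) for chains a and b.

    cluster_of: dict chain -> cluster label.
    observed: set of frozensets, each an observed contact between two chains.
    arrows: list of (source, target), where a homolog of source contacts target.
    """
    if frozenset((a, b)) in observed:
        return 1, []  # observed interaction
    wanted = {cluster_of[a], cluster_of[b]}
    # Observed contacts linking the cluster of a with the cluster of b.
    linking = [o for o in observed if {cluster_of[c] for c in o} == wanted]
    if not linking:
        return None, []  # no structural evidence at this identity level
    arrows = []
    if any(b in o for o in linking):
        arrows.append((a, b))  # a homolog of a is in contact with b
    if any(a in o for o in linking):
        arrows.append((b, a))  # a homolog of b is in contact with a
    return (2 if arrows else 3), arrows
```

The same function covers the within-cluster case: when both chains belong to the same cluster, only contacts observed inside that cluster are considered.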
In practice, our algorithm iteratively determines the existence of chain-chain, chain-cluster and cluster-cluster interactions (Figure 1H).

On lines 2 and 3, the function find_cluster determines to which clusters chains P and Q belong at a certain percentage of sequence identity. Lines 4 and 5 set the cluster identifiers as the chains' attributes. On lines 6 and 7, the function add_neighbour sets a cluster-cluster interaction. This interaction implies that all members of 𝒫 will be linked to all members of 𝒬 by interactions that are at least of type 3. On lines 8 and 9, the function add_target sets two chain-cluster interactions.
This operation implies that P (resp. Q) will be linked to all chains from 𝒬 (resp. 𝒫) by interactions of at least type 2. We call the resulting database of chain-chain, chain-cluster and cluster-cluster interactions PDBinteractionDB.
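The procedure described above can be approximated by the following Python sketch. The names find_cluster, add_neighbour and add_target come from the text, but their signatures, the dictionary-based storage, and the function `build_interaction_db` are assumptions; the sketch is a reading of the description, not the original pseudocode.

```python
from collections import defaultdict


def build_interaction_db(interfaces, find_cluster):
    """Record chain-chain, chain-cluster and cluster-cluster interactions.

    interfaces: iterable of (P, Q) chain pairs observed in contact.
    find_cluster: callable returning the cluster of a chain at a given
    sequence identity level (assumed signature).
    """
    chain_cluster = {}             # chain -> cluster identifier
    neighbours = defaultdict(set)  # cluster -> interacting clusters
    targets = defaultdict(set)     # chain -> clusters it is linked to
    contacts = set()               # observed chain-chain contacts

    for p, q in interfaces:
        cp, cq = find_cluster(p), find_cluster(q)
        chain_cluster[p], chain_cluster[q] = cp, cq
        # Role of add_neighbour: cluster-cluster interaction (at least type 3).
        neighbours[cp].add(cq)
        neighbours[cq].add(cp)
        # Role of add_target: chain-cluster interactions (at least type 2).
        targets[p].add(cq)
        targets[q].add(cp)
        contacts.add(frozenset((p, q)))  # type 1

    return {
        "chain_cluster": chain_cluster,
        "neighbours": dict(neighbours),
        "targets": dict(targets),
        "contacts": contacts,
    }
```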

Algorithm at inference time
Given an input set of protein chains, LEVELNET will interrogate PDBinteractionDB to create type 1, 2, and 3 edges. inferred from a single source and with a weight higher than a certain threshold (Figure 1B).

Implementation details
To generate the PDB interaction database, we developed a pipeline in

Benchmark databases used in the applications
We used three benchmark databases to showcase the applications of LEVELNET. These databases are the Docking Benchmark ZDock version 5.5 (ZDockv5) [54] and version 2.0 (ZDockv2) [29] and DockGround (DG4) [24]. We used all the single-chain proteins of these databases. For ZDockv2, we chose a subset of 88 single-chain proteins (out of 168 proteins) for our evaluation purposes (see Results) and called it ZDockv2_s88.

Exploring and discovering the interactions underlying photosynthesis
As a first case study, we considered photosynthesis in the green Setting the sequence identity threshold at 70% for node reduction and 95% for edge inference from the PDB leads to a network comprised of 96 pairwise interactions between 53 nodes ( Figure 2D).
Relaxing the threshold to 30% for edge inference densifies the PLHC-I subnetwork, and reveals a few new self-interactions as well as cross-interactions between PLHC-I and PLHC-II (Figure 2E). In particular, three interactions are inferred between the LHCII antenna (comprised of LHCII-1.3, LHCII-3 and LHCII-4 and represented by the node Q93WE0) from the PLHC-II subnetwork and three proteins from the PLHC-I subnetwork, namely the P700 chlorophyll a apoproteins PSAA and PSAB and the Chlorophyll a-b binding protein LHCA2. This inference is consistent with the fact that LHCII acts as an antenna for both photosystems I and II [9,19] and in perfect agreement with a series of recent PDB structures of the supercomplex PSI-LHCI-LHCII from Chlamydomonas reinhardtii, released after June 2020 and thus not included in the present analysis [36].

Localizing PPIs in the Wnt signalling pathway
Next, we focused on the highly evolutionarily conserved canonical Wnt signalling pathway that regulates gene transcription by passing signals from the cell surface receptors to the nucleus [1]. We used LEVELNET to visualize the structurally determined PPIs involved in this pathway in the context of their subcellular localisation. As input, we considered the list of 85 PDB entries reported in ref. [1]. We defined a custom adjacency matrix where we put a value of 1 for each pair of protein chains sharing the same cellular location, and gave it to

Comparing predicted versus homology-transferred PPIs
We used LEVELNET to compare the PPIs predicted by a complete cross-docking experiment [27] with the structurally characterized PPIs available in the PDB (Figure 4 and S2). We considered an ensemble of about 4000 putative protein pairs coming from the ZDockv2_s88

FIGURE 2 Structurally characterized PPI network for photosynthesis in Chlamydomonas reinhardtii. (A) Overview of the network computed using the PDB as the source database, with the three largest connected components explicitly delineated. Each node represents a protein chain. Blue and pink edges represent observed and homology-transferred interactions, respectively. We considered homologs at 95% sequence identity. Arrows indicate direct contacts. For instance, a pink edge directed from P to Q indicates that a homolog of P is in contact with Q in a biological assembly from the PDB. If the reciprocal is true, that is, a homolog of Q is also in contact with P in a biological assembly from the PDB, then the edge is bidirectional (double arrow). The connected components correspond to the RuBiSCo complex, and the photosystem light harvesting complexes of types I and II. This network also includes other protein chains forming small subnetworks. (B) Node reduction on RuBiSCo: merging the nodes with more than 95% sequence identity results in two super-nodes corresponding to the small and large subunits of the RuBiSCo complex 3D structure, in red (chains IKMO) and blue (chains ACEG), respectively. (C) Interactions observed in the PDB (blue) for the Cytochrome b6f enzyme. Nodes are highlighted according to the chain colours in the Cytochrome 3D structure. (D, E) Interaction networks after merging nodes sharing more than 70% sequence identity: edge inference at 95% (D) and 30% (E) sequence identity. The edges connecting two disjoint connected components are shown as dashed lines.

Decrypting and customising benchmarks
LEVELNET can be used to complement the information provided in a benchmark set of protein-protein complexes. Starting from the ZDockv5 benchmark [54], it inferred up to 200 interactions per protein by applying homology transfer based on >70% sequence identity ( Figure S3 and S4AC). The relevance of the inferred interactions depends on the protein functional class. The antigen-binding fragments (FAB) display the highest node degrees ( Figure S3A), but most of the residues defining the paratope are not conserved across the homologs ( Figure S3A, red sticks). In such cases, transferring interactions by homology may not be valid, even at 95% sequence identity. Leaving  Figure S4BD).
This type of analysis helps to get a broader view of a set of proteins than the annotations at hand permit, and also emphasizes the complexity underlying the behaviour of a protein within a community. The PPI network of this benchmark is shown in Figure S5.

Interacting patches on the glucocorticoid ligand-binding domain
We applied the offline analysis of all interacting surfaces exhibited by a set of protein chains to the human glucocorticoid receptor ligand-binding domain (Figure 5). Starting from the input query chain 1NHZ:A, LEVELNET retrieved 61 interacting patches at >90% sequence identity. One patch was observed for the input chain (from the 1NHZ PDB entry). It comprises 33 residues and corresponds to a cyclic C2 homodimer (Figure 5A). Sixty patches were inferred by homology transfer, and 57 of them correspond to chains that are in fact identical to the query (same Uniprot Id P04150). These patches, totalling 130 residues, unveil a wide diversity of alternative self-assemblies (Figure 5B). This result is in line with our previous work emphasizing the importance of accounting for the multiple usage of a protein's surface residues by several partners and for the variability of protein interfaces coming from molecular flexibility when assessing protein interface predictors [12].
The three remaining patches involve the homologous glucocorticoid ligand-binding domain from mouse. They do not add new information since all their residues are already comprised in the 58 human protein-associated patches.

FIGURE 5 Interacting surfaces of the glucocorticoid receptor ligand-binding domain. The query protein chain (PDB id: 1NHZ:A) is displayed as a grey cartoon. (A) Homodimer corresponding to the entry 1NHZ (second biological assembly). The interacting surface is highlighted in surface and the second copy of the protein is in marine cartoons. (B) Different binding modes for the protein self-assembly. The highlighted surface covers the residues engaged in at least one interaction involving the query chain or a homolog at >90% sequence identity in the PDB. The blue, magenta and green cartoons represent interacting copies of the protein in the PDB complexes 1NHZ, 3E7C, and 4LSJ, respectively.

CONCLUSION
LEVELNET is a valuable asset for the community to explore protein interactions. It is useful for the biologists interested in the physical contacts of a particular protein or a set of proteins as well as for those who develop and assess computational predictive approaches for interface, partner and complex predictions. It provides a convenient means to account for different types of relationships between proteins, for example, functional annotations, cell co-localization, spatiotemporal proteomics, or co-occurrences in publications, and to investigate how the latter are correlated with physical interactions. We performed all the analyses reported here based on the June 2020 release of the PDB. Nevertheless, we have been updating LEVELNET based on more recent releases, which include 3D models predicted by AlphaFold [20], and will continue to keep the pre-computed databases up to date.
Future developments will concern the integration of other databases, be they computational or experimental. In particular, we will upgrade LEVELNET with information coming from reliable predictions of protein complexes using AlphaFold, RoseTTAFold [3] or methods inspired by the latter. We expect the body of these predictions to massively increase in the coming years. Another direction will be to allow for merging and selecting nodes based on annotations from Uniprot [51], SCOP [8,13] or CATH [42]; LEVELNET already displays node-specific information of this type. Future versions of LEVELNET will investigate different ways of defining protein entities, for instance based on Uniprot segments and annotations coming from PDBe-KB [37].
Directions of improvement include a more refined homology-transfer procedure accounting for the differences in the plausibility of the inferred interactions depending on the protein functional classes. More generally, identifying physiologically relevant interfaces and discriminating them from crystal contacts or aggregation-prone interfaces is a challenging task. Recent efforts have been made to provide the community with reliable annotations [40]. LEVELNET could also be improved by implementing the possibility to deal with multi-chain proteins and protein-nucleic acid interactions. Finally, we plan to increase the richness of the information provided by LEVELNET, for example, by describing at the residue level the binding sites and the binding-associated conformational changes, and giving access to properties such as binding affinities and the effect of mutations over the network.