Constraint‐based models for dominating protein interaction networks

Abstract The minimum dominating set (MDSet) is the smallest set of graph nodes such that every other node is connected to at least one MDSet node. The MDSet has been successfully applied to extract proteins that control protein–protein interaction (PPI) networks and to reveal the correlation between structural analysis and biological functions. Although a PPI network contains many MDSets, identifying multiple MDSets is an NP‐complete problem, and it is difficult to determine the best MDSets, those enriched with biological functions. Therefore, the MDSet model needs to be further expanded and validated to find constrained solutions that differ from those generated by the traditional models. Moreover, identifying the critical set of the network, that is, the set of nodes common to all MDSets, can be time‐consuming. Herein, the authors adapted the minimisation of metabolic adjustment (MOMA) algorithm to develop a new framework, called maximisation of interaction adjustment (MOIA). In MOIA, they provide three models: the first generates two MDSets with a minimum number of shared proteins, the second generates constrained multiple MDSets (k‐MDSets), and the third generates user‐defined MDSets containing the maximum number of essential genes and/or other important genes of the PPI network. In practice, these models significantly reduce the cost of finding the critical set and classifying the graph nodes. The authors term the critical set the k‐critical set, where k is the number of MDSets generated by the proposed model. They then define a new set of proteins called the (k−1)‐critical set, in which each node belongs to (k−1) MDSets. This set is shown to be as important as the k‐critical set and, like the k‐critical set, contains many essential genes, transcription factors, and protein kinases. The (k−1)‐critical set can be used to extend the search for drug target proteins.
Based on the performance of the MOIA models, the authors believe the proposed methods contribute to answering key questions about the MDSets of PPI networks, and their results and analysis can be extended to other network types.

Wakai et al. [16] applied an MDSet model and found an MDSet enriched with essential, cancer-related, and disease genes, and identified drug-target proteins. Their model can identify only one MDSet, although PPI networks contain many MDSets [8,13,17]. The critical set, which contains the nodes common to all MDSets of the PPI network, occupies important positions in the network and can be enriched with biological functions [12,20]. Interactions with the critical nodes have marked effects on the topology of the targeted network [12]. Therefore, discovering and testing the featured MDSets and efficiently identifying the critical nodes are important for the analysis of PPI networks, as well as for ensuring model robustness [11].
Determining the MDSet is an NP-complete problem [10], and no algorithm is known that finds the MDSet in polynomial time [21]. Nacher and Akutsu [22] suggested an integer linear programming (ILP) model to determine an optimal solution for the MDSet problem. Wuchty [8] applied the ILP-based model to human and yeast PPI networks. Zhang et al. [13] developed the centrality-corrected MDSet model, which considers the degree and betweenness centralities of proteins. Their model found more functionally significant proteins among essential genes, disease-associated genes, ageing genes, and virus-targeted genes. Despite these results, they concluded that topological properties alone are not enough to predict the important proteins [17]. In this work, the authors hypothesised that the significance of enrichment analysis is affected by the algorithm used to determine the MDSet [23,24], as deciding on the best MDSet for dominating the whole network is difficult [13]. Grinstead and Slater [25] reported that finding two or more MDSets with minimum intersection is an NP-hard problem. The set of nodes shared among all MDSets of the PPI network is called the critical set [20]. Wuchty et al. [12] found that in PPI networks, the critical set of proteins plays an important role in phosphorylation and regulatory events in their interactions.
Herein, a new framework called maximisation of interaction adjustment (MOIA) is introduced to generate multiple MDSets for a given PPI network. The proposed MOIA is adapted from the minimisation of metabolic adjustment (MOMA) and linear MOMA algorithms used in metabolic networks [27,28]. In MOIA, the authors developed a new model that generates two MDSets with the maximum differences between their nodes. The nodes shared between these two MDSets can be seen as essential nodes that tightly enclose the critical set of the network. Therefore, by calling the optimisation algorithm only once, the proposed model encloses the critical set within the intersection of the generated MDSets. The model was then extended to generate k-MDSets with large differences between all of them, where k is the number of MDSets. Using these k-MDSets, all nodes in the PPI network can be classified and the critical set precisely defined, named here the k-critical set. In addition, a new set of proteins appearing in (k − 1) MDSets was extracted and identified as the (k − 1)-critical set. Experimentally, it was found that the (k − 1)-critical set is as important as the k-critical set and can be used to extend the search for drug target proteins. Finally, an additional model was introduced to identify a specified MDSet when the user selects certain nodes as the dominating nodes. The authors believe that the MOIA method could be used to analyse biological and other networks to find multiple and user-defined constrained minimum dominating sets. This approach can also contribute to ranking the nodes in the network under consideration.

| Basic model of the MDSet problem
The PPI network shown in Figure 1, drawn with the Cytoscape tool [26], can be described as an undirected graph $G(V, E)$, where proteins are represented as the nodes $V$ of the graph and the interactions between these proteins are represented as the edges $E$ of the graph under consideration. The adjacency matrix $A$ ($n \times n$) can be used to represent this graph, where $n$ is the number of proteins in the PPI network, $A_{ij} = 1$ if protein $i$ interacts with protein $j$ or $i = j$, and $A_{ij} = 0$ otherwise. A set $D \subset V$ of proteins is a dominating set if every node in $V$ is either an element of $D$ or adjacent to an element of $D$. The minimum dominating set is the smallest dominating set for the given network [8,13].
Nacher and Akutsu [20] classified the nodes of the considered graph, based on their membership in the MDSets, into three types: critical nodes (belong to every MDSet), intermittent nodes (belong to some but not all MDSets), and redundant nodes (never belong to any MDSet). For example, Figure 2 shows a toy graph in which each node is labelled with its category. This graph contains more than 10 MDSets, two of which are shown in Figures 2(b, c). In Figure 2(d), the red node-set $\{3, 6\}$ represents the critical set of the graph, the green node-set $\{1, 2, 5, 7, 8, 9, 10, 13\}$ is the intermittent set of the graph, and the remaining blue node-set $\{4, 11, 12, 14, 15\}$ forms the redundant set of the graph.

Figure 1: The yeast protein-protein interaction (PPI) network, where the set of red nodes represents a minimum dominating set (MDSet). The Cytoscape tool [26] was used to draw this figure.

Mathematically, the MDSet problem of PPI networks can be formulated as a binary integer-programming problem:

$$\min \sum_{j=1}^{n} x_j \quad \text{subject to} \quad \sum_{j=1}^{n} A_{ij} x_j \ge 1, \quad i = 1, 2, \ldots, n, \qquad x_j \in \{0, 1\}, \quad j = 1, 2, \ldots, n. \tag{1}$$

The last constraint, $x_j \in \{0, 1\}$, can be replaced by the relaxation constraint $0 \le x_j \le 1$, $j = 1, 2, \ldots, n$. The resulting integer-programming problem can be solved using a branch-and-bound algorithm [29] or the simplex algorithm [30]. ILP solvers can be used to solve the model in Equation (1). Herein, the authors use the MOSEK library (MOSEK ApS, Copenhagen, Denmark) under the MATLAB programming environment (MathWorks Inc.) as the main solver for the ILP problems [31]. The MOSEK solver uses the interior point method along with the branch-and-bound algorithm [32-34] as the default algorithm for the resulting integer optimisation problem. Several MOSEK subroutines are used to solve ILP problems in the form

$$\min C^{T} x \quad \text{subject to} \quad L_c \le A x \le U_c, \quad L_x \le x \le U_x,$$

where $A$ represents the adjacency matrix of the PPI network with $n$ nodes, $x$ represents the solution vector, and $C$, $L_c$, $U_c$, $L_x$, and $U_x$ are defined according to the proposed model. Therefore, to solve the ILP problem described in model (1), a solver subroutine is created that receives the adjacency matrix $A_{n \times n}$ and the remaining vectors $C$, $L_c$, $U_c$, $L_x$, and $U_x$. Then, the MOSEK subroutine "MOSEKOPT" [33] is used to solve the ILP problem and return the MDSet $M$ as

$$M = \mathrm{Solver}(A, C, L_c, U_c, L_x, U_x).$$

The output of Solver is a binary vector (0-1 elements) of length $n$, where the elements with value 1 form the resulting MDSet.
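As an illustration of the definitions above, the following minimal Python sketch enumerates every MDSet of a small graph by exhaustive search and classifies its nodes as critical, intermittent, or redundant. The nine-node graph, the function names, and the brute-force search are assumptions made for illustration only; the search merely stands in for an ILP solver and is practical only for toy graphs.

```python
from itertools import combinations

# Hypothetical toy graph (not from the paper): hub 0 with two leaves and two chains.
EDGES = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5), (0, 6), (6, 7), (7, 8)]
N = 9

def closed_neighbourhoods(n, edges):
    """Row i of A (with A_ii = 1) stored as the set {j : A_ij = 1}."""
    rows = [{i} for i in range(n)]
    for u, v in edges:
        rows[u].add(v)
        rows[v].add(u)
    return rows

def is_dominating(rows, chosen):
    """True if sum_j A_ij x_j >= 1 holds for every node i."""
    chosen = set(chosen)
    return all(row & chosen for row in rows)

def all_mdsets(n, edges):
    """All dominating sets of minimum size, found by exhaustive search."""
    rows = closed_neighbourhoods(n, edges)
    for size in range(1, n + 1):
        found = [set(c) for c in combinations(range(n), size)
                 if is_dominating(rows, c)]
        if found:
            return found
    return []

def classify(n, mdsets):
    """Split the nodes into critical, intermittent and redundant sets."""
    critical = set.intersection(*mdsets)
    intermittent = set.union(*mdsets) - critical
    redundant = set(range(n)) - critical - intermittent
    return critical, intermittent, redundant

mdsets = all_mdsets(N, EDGES)
critical, intermittent, redundant = classify(N, mdsets)
print(len(mdsets), critical, intermittent, redundant)
```

For this toy graph the search finds four MDSets of size three, all of which contain node 0, so node 0 is the only critical node.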

| Multiple MDSets of PPI networks
Several MDSets can be found for a given PPI network, and each ILP solver can return a different solution according to its algorithm [11,35]. Despite the presence of many MDSets in the PPI network, finding them all and defining their constraints is very difficult [13]. Moreover, finding two or more of these MDSets with a minimum intersection is an NP-hard problem [25]. In addition, extracting important and critical proteins from PPI networks and classifying their nodes is another challenging and time-consuming problem [12,13,15,35]. Herein, the authors developed new MDSet models that can be used to:
1. Reduce the computational cost of finding the critical, intermittent, and redundant sets; the traditional methods find these sets only after calling the solver n times, where n is the size of the PPI network.
2. Find new sets of proteins that have different degrees of criticalness.
3. Allow the user to find a special MDSet that contains the maximum number of user-defined proteins.
4. Validate the concept of the MDSet being enriched with essential genes and biological functional categories.

| PROPOSED METHODS
The MOMA and linear MOMA algorithms [27,28], originally developed for metabolic networks, were adapted to extend the ILP model in Equation (1). By analogy with the constraint-based reconstruction and analysis (COBRA) of metabolic models [36], the authors called their developed models constraint-based models for dominating PPI networks (https://github.com/Alofairi1976/MOIA). The main aim was to generate MDSets with the largest number of differences among them. These different MDSets can be used to identify critical nodes, which reflect the effective proteins, and to gain important information about the PPI network. For example, the network in Figure 2 has 10 MDSets; however, the two MDSets shown in Figures 2(b, c) are sufficient to find the critical set, shown in red in Figure 2(d).
The proposed method comprises three main stages, as shown in Figure 3. The first stage refines the given data set using suitable data preprocessing techniques, which involve PPI data collection, protein selection, and graph implementation (the adjacency matrix). The second stage employs one of three developed models: the two most different MDSets (2MD-MDSets) model, the iterative MDSets (ITR-MDSets) model, and the user-defined MDSet (URD-MDSet) model. The 2MD-MDSets model generates two MDSets simultaneously with the maximum number of different nodes between them. The ITR-MDSets model can be used to generate many different MDSets. The URD-MDSet model can generate an MDSet containing specific nodes determined by the user. In the third stage of the proposed MOIA method, the obtained results are discussed and interpreted. These results include several MDSets generated under different criteria, which are used to determine the k-critical, intermittent, and redundant sets of proteins. In this research, the authors highlight the importance of what they call the (k − 1)-critical set in the PPI network. In the following subsection, the Basic-MDSet model in [8] is discussed. Then, the proposed models are introduced in the remaining subsections. To express the algorithms proposed herein, the following notations are defined:
• $I_{n \times n}$: the n-by-n identity matrix, with ones on the main diagonal and zeros elsewhere.
• $J_{n \times m}$: the matrix of ones, where all n-by-m entries are ones.
• $O_{n \times m}$: the matrix of zeros, where all n-by-m entries are zeros.
• $A_{n \times n}$: the adjacency matrix, where all n-by-n entries are binary (0 or 1).
• $X \setminus Y$: the set of all elements that belong to vector $X$ but not to vector $Y$.
• $X \cup Y$: the union of vectors $X$ and $Y$.
• $|X|$: the size (number of ones) of the vector $X$.

Wuchty [8] applied the ILP-based model [22] to find an optimal solution for the MDSet problem of PPI networks, as formulated in Equation (1). The solution of the problem in Equation (1) is a binary vector $x$, where $x_i = 1$ if protein $i$ belongs to the generated MDSet and $x_i = 0$ otherwise. Algorithm 1 gives pseudocode that describes the steps used to implement the model in Equation (1).
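To make the notation concrete, the following minimal Python sketch shows one way the data of the basic model could be assembled in the solver form "minimise C'x subject to L_c <= Ax <= U_c and L_x <= x <= U_x". It is not the authors' Algorithm 1: plain Python lists replace MATLAB matrices, no ILP solver is called, the function names are hypothetical, and the choice U_c = n is an assumption (a row sum can never exceed n).

```python
def adjacency_with_diagonal(n, edges):
    """A (n x n): A_ij = 1 if i and j interact or i == j, else 0."""
    A = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

def build_basic_model(A):
    """Assemble C, L_c, U_c, L_x, U_x for the basic MDSet model."""
    n = len(A)
    C = [1] * n      # J (n x 1): every selected protein costs one unit
    L_c = [1] * n    # each node needs at least one dominator
    U_c = [n] * n    # assumption: a row sum can never exceed n
    L_x = [0] * n    # O (n x 1)
    U_x = [1] * n    # J (n x 1); x is additionally required to be integer
    return C, L_c, U_c, L_x, U_x

def is_feasible(A, x, L_c, U_c, L_x, U_x):
    """Check a candidate 0-1 vector against all bounds of the model."""
    rows = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    in_rows = all(lo <= r <= hi for lo, r, hi in zip(L_c, rows, U_c))
    in_box = all(lo <= xj <= hi for lo, xj, hi in zip(L_x, x, U_x))
    return in_rows and in_box

# Hypothetical path graph 0-1-2-3-4
A = adjacency_with_diagonal(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
C, L_c, U_c, L_x, U_x = build_basic_model(A)
x = [0, 1, 0, 1, 0]   # candidate MDSet {1, 3}
objective = sum(c * xj for c, xj in zip(C, x))
print(is_feasible(A, x, L_c, U_c, L_x, U_x), objective)
```

In practice these vectors would be handed to an ILP solver together with an integrality requirement on x; the feasibility check here only verifies a candidate solution by hand.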

| 2MD-MDSets model
Segre et al. [27] introduced the MOMA method, which minimises the distance between the flux distributions of the mutant and the wild type. Moreover, Zhang et al. [13] reduced the difference (i.e., increased the overlap) between the generated MDSets using different optimisation solvers. In contrast, the MOMA method is adapted here to design an ILP model that generates two MDSets simultaneously with the maximum number of different nodes between them. Specifically, two binary vectors, $x$ and $y$, represent the two MDSets in the new system. The objective maximises the number of nodes at which $x$ and $y$ differ, and both vectors must satisfy the domination constraints

$$\sum_{j=1}^{n} A_{ij} x_j \ge 1, \quad i = 1, 2, \ldots, n, \qquad \sum_{j=1}^{n} A_{ij} y_j \ge 1, \quad i = 1, 2, \ldots, n,$$

where $n$ represents the number of nodes or proteins in the targeted network and $|MDSet|$, the size of the MDSet generated using the model in Equation (1), fixes the size of both solutions. The intersection between the two MDSets $x$ and $y$ may represent the critical nodes in the graph. Therefore, the proposed model can produce the critical set more quickly than the traditional method [37]. To adjust the model for linear programming techniques, $z_i$ is introduced as a new binary variable that satisfies two linear constraints relating it to $x_i$ and $y_i$. It is clear that $z_i = 1$ is the best value if $x_i$ and $y_i$ are different, and $z_i = 0$ is the best value if $x_i$ and $y_i$ are equal. Equation (3) describes the final developed model, which a suitable ILP solver can use to generate the two most different MDSets; it retains the domination constraints $\sum_{j} A_{ij} x_j \ge 1$ and $\sum_{j} A_{ij} y_j \ge 1$ for $i = 1, 2, \ldots, n$. Algorithm 2 gives pseudocode that describes the steps used to implement the proposed model in Equation (3).
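The linearisation can be illustrated with a short Python sketch. It assumes that the two constraints on $z_i$ take the common form $z_i \le x_i + y_i$ and $z_i \le 2 - x_i - y_i$; this form is an assumption, not a quotation of the authors' equations. The sketch first checks that the largest admissible $z_i$ equals 1 exactly when $x_i$ and $y_i$ differ, and then finds the two most different MDSets of a hypothetical toy graph by exhaustive search in place of an ILP solver.

```python
from itertools import combinations, product

def largest_z(x_i, y_i):
    """Largest binary z allowed by z <= x + y and z <= 2 - x - y (assumed form)."""
    return min(1, x_i + y_i, 2 - x_i - y_i)

# Under the assumed constraints, the best z is the exclusive-or of x and y.
assert all(largest_z(a, b) == (a ^ b) for a, b in product((0, 1), repeat=2))

def all_mdsets(n, edges):
    """All dominating sets of minimum size, found by exhaustive search."""
    rows = [{i} for i in range(n)]
    for u, v in edges:
        rows[u].add(v)
        rows[v].add(u)
    for size in range(1, n + 1):
        found = [set(c) for c in combinations(range(n), size)
                 if all(row & set(c) for row in rows)]
        if found:
            return found
    return []

def two_most_different(mdsets):
    """Pair (x, y) maximising sum_i z_i, i.e. the size of the symmetric difference."""
    return max(combinations(mdsets, 2), key=lambda pair: len(pair[0] ^ pair[1]))

# Hypothetical toy graph: hub 0 with two leaves and two chains.
EDGES = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5), (0, 6), (6, 7), (7, 8)]
x, y = two_most_different(all_mdsets(9, EDGES))
overlap = x & y
print(x, y, overlap)
```

On this toy graph the two selected MDSets share only node 0, which is also the only node common to every MDSet, so a single "call" already encloses the critical set.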

| ITR-MDSets model
The proposed model was then extended to generate multiple MDSets that cover all intermittent nodes in the PPI network. The variable $x$ in Equation (3) is treated as an input vector of binary values in which $x_i = 1$ if node $i$ belongs to any MDSet generated so far and $x_i = 0$ otherwise. The resulting model is expressed in Equation (4). The model is solved repeatedly, each run generating a new MDSet, and the value of $x$ is updated in every iteration. The loop stops when the vector $x$ no longer changes. As a result, the algorithm generates multiple MDSets with maximum differences between all of them.
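The iteration described above can be sketched in Python as follows. Because Equation (4) is not reproduced here, the sketch assumes that each iteration selects the MDSet containing the most nodes not yet in $x$; the exhaustive enumeration again stands in for the ILP solver, and the seven-node graph and function names are hypothetical.

```python
from itertools import combinations

def all_mdsets(n, edges):
    """All dominating sets of minimum size, found by exhaustive search."""
    rows = [{i} for i in range(n)]
    for u, v in edges:
        rows[u].add(v)
        rows[v].add(u)
    for size in range(1, n + 1):
        found = [set(c) for c in combinations(range(n), size)
                 if all(row & set(c) for row in rows)]
        if found:
            return found
    return []

def iterate_mdsets(mdsets):
    """Start from the two most different MDSets, then keep adding the MDSet
    that brings the most nodes not yet in x, until x stops growing."""
    first, second = max(combinations(mdsets, 2),
                        key=lambda pair: len(pair[0] ^ pair[1]))
    generated = [first, second]
    x = first | second
    while True:
        y = max(mdsets, key=lambda s: len(s - x))   # assumed objective
        if not (y - x):                             # no change in x: stop
            return generated, x
        generated.append(y)
        x = x | y

# Hypothetical toy graph: hub 0 with two leaves, joined through node 3 to a triangle 4-5-6.
EDGES = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
generated, covered = iterate_mdsets(all_mdsets(7, EDGES))
k = len(generated)
k_critical = set.intersection(*generated)
print(k, covered, k_critical)
```

Here the loop adds one MDSet after the initial pair and then stops, giving k = 3 MDSets whose union covers every node that appears in any MDSet of the toy graph.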

| URD-MDSet model
Algorithm 4 describes the steps needed to generate the targeted MDSet by avoiding some specific nodes.

| DATA SETS
This section presents the PPI networks used in the numerical experiments to assess the efficiency of the proposed models. Six data sets are taken from the High-quality Interactomes (HINT) database (version 3/10/2018), which collects data from several interactome resources [38]. In addition, two data sets from the BioPlex (biophysical interactions of ORFeome-based complexes) network [39] were used.

| Human protein data sets
For human PPI networks, three different data sets obtained from H. sapiens in the HINT database (version 3/10/2018) [38] were considered. The first data set contains 63,684 high-quality binary protein (HHQBP) interactions between 12,815 human proteins. The second data set contains 116,456 high-quality co-complex protein (HHQCP) interactions between 12,352 human proteins. The third data set is a network of 180,140 combined protein (HCP) interactions between 15,744 human proteins.

| Yeast protein data sets
Three different data sets of the yeast interacting [40] protein networks were considered. These data sets were obtained from S. cerevisiae in the HINT database (version 3/10/2018) [38].
The first data set under consideration contains 23,202 high-quality binary protein (YHQBP) interactions between 5313 yeast proteins. The second data set contains 68,779 high-quality co-complex protein (YHQCP) interactions between 5246 yeast proteins. The last data set consists of 91,981 combined protein (YCP) interactions between 5959 yeast proteins.

| Bioplex protein interaction network
Two versions of the protein interaction data set of the BioPlex network [39] were used. The first version, BIOPLEX1, has 23,744 protein interactions between 7637 proteins, and the second, BIOPLEX2, has 56,553 protein interactions between 10,883 proteins. These two data sets were also combined into a single data set (BIOPLEX12) with 80,297 protein interactions between 11,540 proteins.

| Liver proteins data set
The 28,553 protein interactions between 7148 liver tissue proteins (LTP) collected in [11] were used.

| Enrichment analysis data sets
The following data sets were used for the biological functional enrichment analysis:
• Essential genes (EGs) data sets: 1110 yeast essential genes and 2032 human essential genes from the DEG database, which collects data about essential genes from the literature [41].
• Kinase genes (KGs) data sets: 538 human kinases reported by Cheng et al. [40] and 127 yeast kinases from the Yeast Kinase Interaction Database [42].
• Transcription factors (TFs) data sets: 1214 human transcription factors reported by Vaquerizas et al. [43] and 268 yeast transcription factors from the YeastTract database [44].
• Drug-target genes (DGs) and pharmaceutics genes (PHGs) data sets: 1214 drug-target genes and 568 pharmaceutics genes obtained from the DrugBank database [45].
• Housekeeping genes (HKGs) data set: 3804 housekeeping genes in the human network, obtained from the Human Protein Atlas database (available at http://www.proteinatlas.org) [9].

| RESULTS AND DISCUSSION
In this section, the implementation and performance of the proposed algorithms on the data sets under consideration are discussed. All numerical experiments were run on a system with an Intel(R) Core(TM) i5 processor at 2.53 GHz and 4.0 GB of RAM.

| Results of the Basic-MDSet model
The Basic-MDSet model in Equation (1) was applied to the human PPI networks from the HINT database (version 3/10/2018) [38]: HHQBP, HHQCP, and HCP. Table 1 shows the results of this experiment compared with the results of Wuchty [8] for an older version of the HINT database. Although the current networks are about 40% larger than the previous ones, the ratio of the MDSet's size to the number of proteins in each network (% MDSet) was less than 20%. In the HHQCP and HCP data sets, % MDSet fell to around 13% for the new version [35]. This result may be due to the increased number of interactions. Table 2 shows the same results and analysis for the yeast, BioPlex, and liver data sets, which were described in Section 4. The results in Tables 1 and 2 indicate that the size of the MDSet is less than 20% of the number of proteins for all data sets, even with the increase in the number of proteins and interactions.

| Importance of the 2MD-MDSets model
To show the efficiency of the proposed 2MD-MDSets model in Equation (3), its results were compared with those of the Basic-MDSet model in Equation (1) solved with the MOSEK and Gurobi solvers. Figure 4, drawn with the tool in [46], shows the overlap between the MDSets generated by the two solvers compared with the overlap between the two MDSets generated by the 2MD-MDSets model in Equation (3) for the same data sets. With the proposed model, the number of overlapping proteins fell from 2144 to 1316 in the HHQBP data set and from 788 to 371 in the YHQBP data set. The 2MD-MDSets model was applied to several PPI networks, as given in Table 3. The proposed model minimises the number of overlapping proteins, which may represent the critical set of each network.
To validate the results of the 2MD-MDSets model, the exact critical set of each network was evaluated with the traditional method [20], which calls the solver once for every node of a computed MDSet.

Figure 4: The overlaps between the MDSets generated by the MOSEK and Gurobi solvers compared with the overlaps between the MDSets generated by the 2MD-MDSets model for the HHQBP and YHQBP data sets. The tool in [46] was used to draw this figure.

From Table 3, it can be concluded that the overlap of the MDSets produced by the 2MD-MDSets model in Algorithm 2, which was called only once, is almost equal to the size of the critical set evaluated by calling the Basic-MDSet model hundreds or thousands of times for each data set. Moreover, the extra proteins in the overlap between the resulting MDSets are expected to be important and may represent another important set in the PPI networks.
Ishitsuka et al. [47] used pre-processing steps, before calling the algorithm, to identify some of the critical nodes based on the topological structure of the PPI network. Identifying this set of nodes and marking them as critical helps reduce the number of solver calls. They stated that their algorithm reduces the computational time by a factor of about 180 compared with the traditional method of finding the critical set of PPI networks. In Table 3, the basic model takes ∼100 seconds to find an MDSet of 2398 proteins in the HHQBP network. Traditional methods [20] therefore call the solver 2398 times to find the critical set, which takes ∼2398 × 100 = 239,800 seconds. In contrast, one call of Algorithm 2 with the 2MD-MDSets model takes only ∼1020 seconds. Therefore, the proposed 2MD-MDSets model determines the critical set up to 235 times faster than the traditional methods, even without any pre-processing steps.
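The cost comparison can be made concrete with a minimal Python sketch. It assumes that the traditional method re-solves the problem once for each node of a base MDSet with that node forbidden, and marks the node as critical if the optimum grows; this is a common way to test criticality and is an assumption about the details of [20]. Exhaustive search replaces the ILP solver, the toy graph is hypothetical, and the last lines reproduce the timing estimate quoted in the text.

```python
from itertools import combinations

def min_dominating_set(n, edges, forbidden=frozenset()):
    """One 'solver call': a smallest dominating set that avoids the forbidden nodes."""
    rows = [{i} for i in range(n)]
    for u, v in edges:
        rows[u].add(v)
        rows[v].add(u)
    allowed = [v for v in range(n) if v not in forbidden]
    for size in range(1, len(allowed) + 1):
        for c in combinations(allowed, size):
            chosen = set(c)
            if all(row & chosen for row in rows):
                return chosen
    return None

def critical_set_traditional(n, edges):
    """Re-solve once per node of a base MDSet; a node is critical if banning it
    makes the optimum larger (or the problem infeasible)."""
    base = min_dominating_set(n, edges)
    calls, critical = 1, set()
    for v in sorted(base):
        alt = min_dominating_set(n, edges, forbidden={v})
        calls += 1
        if alt is None or len(alt) > len(base):
            critical.add(v)
    return critical, calls

# Hypothetical toy graph: hub 0 with two leaves and two chains.
EDGES = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5), (0, 6), (6, 7), (7, 8)]
critical, calls = critical_set_traditional(9, EDGES)

# Timing estimate quoted in the text for the HHQBP network.
traditional_seconds = 2398 * 100          # one ~100 s solve per MDSet node
speed_up = traditional_seconds / 1020     # versus one ~1020 s 2MD-MDSets call
print(critical, calls, round(speed_up))
```

The number of solver calls grows with the size of the MDSet, which is why a single call that encloses the critical set is attractive for networks with thousands of MDSet proteins.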

| Interpretation of ITR-MDSets results
This subsection focuses on the importance of the proposed ITR-MDSets model and the interpretation of its output, specifically finding the critical, intermittent, and redundant sets of the PPI network much faster than the traditional methods discussed in the previous subsection. The ITR-MDSets model starts by combining the two solutions, $x$ and $y$, obtained from the 2MD-MDSets model as $x = x \cup y$. Then, the algorithm generates a new MDSet, $y$, using the model in Equation (4), such that the differences between $x$ (the input) and $y$ (the output) are maximal. These two steps are iterated until no new nodes can be added to $x$. The algorithm then returns k MDSets, which are used to find the critical, intermittent, and redundant sets according to the steps explained in Section 3. Table 4 summarises the results of the ITR-MDSets model for the data sets under consideration. Table 4 shows that the critical set was evaluated much faster than with the traditional method [20]. For example, the critical set of the HHQBP network is evaluated using only 13 iterations, compared with 2398 iterations with the traditional method [20], as explained in the previous subsection.

| Usage of the URD-MDSet model
The URD-MDSet model in Equation (4) is designed to generate MDSets with the maximum or minimum number of specific nodes selected by the user. For example, this model can be used to maximise or minimise the number of essential genes in the resulting MDSet. Li et al. [35] discussed the need for computational models to predict essential genes from the biological network. Wuchty [8] and Zhang et al. [13] used different techniques to evaluate the MDSet and concluded that their solutions were enriched with essential genes. However, the number of essential genes in these MDSets is unpredictable and varies according to the algorithm used. In this experiment, the URD-MDSet model is used to increase the number of essential genes, and other important genes, in the resulting MDSets. Additionally, the proposed model can be used to answer the well-known question "Is each MDSet enriched with essential genes?" In the literature, researchers answered this question by randomly removing such proteins from the network and searching for the MDSet of the modified network [8]. In contrast, the proposed model can directly find the MDSet with the minimum number of essential genes in the network, and can therefore answer this question efficiently and precisely. The URD-MDSet model was applied to find the MDSet with the minimum number of EGs in the HHQBP and YHQBP PPI networks.
The authors obtained MDSets containing 325 and 129 of the 2032 and 1110 EGs in the HHQBP and YHQBP data sets, respectively. These MDSets are not enriched with EGs, as explained in the next subsection. The cell needs all the essential genes [41,48], kinase genes [40,42], and transcription factor proteins [43] in signal transduction pathways. In this experiment, the URD-MDSet model was constrained to maximise the number of EGs, KGs, TFs, DGs, and/or PHGs [45]. The results of the experiment are shown in Table 5. These results show that the generated MDSet can be constrained as desired, rather than only maximising the number of nodes with specific topological features, such as node degree [13]. Table 5 also shows the time cost of increasing the number of essential genes in the MDSet. This time consists of the execution time of the algorithm plus the time required to manually define the vector $x$. The results showed that the proposed model significantly increased the number of EGs in the resulting MDSet compared with the number of EGs in MDSets generated by the basic model in Equation (1).
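The idea of a user-defined MDSet can be illustrated with the following minimal Python sketch. The URD-MDSet ILP itself is not reproduced in this section, so the sketch assumes that the model keeps the MDSet size minimal and maximises (or minimises) the number of user-selected nodes it contains; exhaustive search stands in for the solver, and the graph, the set `essential`, and the function names are hypothetical.

```python
from itertools import combinations

def all_mdsets(n, edges):
    """All dominating sets of minimum size, found by exhaustive search."""
    rows = [{i} for i in range(n)]
    for u, v in edges:
        rows[u].add(v)
        rows[v].add(u)
    for size in range(1, n + 1):
        found = [set(c) for c in combinations(range(n), size)
                 if all(row & set(c) for row in rows)]
        if found:
            return found
    return []

def user_defined_mdset(mdsets, selected, maximise=True):
    """MDSet with the largest (or smallest) number of user-selected nodes."""
    pick = max if maximise else min
    return pick(mdsets, key=lambda s: len(s & selected))

# Hypothetical toy graph and hypothetical 'essential genes'.
EDGES = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5), (0, 6), (6, 7), (7, 8)]
essential = {4, 8}
mdsets = all_mdsets(9, EDGES)
most = user_defined_mdset(mdsets, essential, maximise=True)
least = user_defined_mdset(mdsets, essential, maximise=False)
print(most, len(most & essential), least, len(least & essential))
```

Both returned sets have the same minimum size; they differ only in how many of the selected nodes they contain, which is the property used to test whether every MDSet is enriched with essential genes.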

| Functional enrichment analysis
For the enrichment analysis of the resulting MDSets, Fisher's exact test in the R language was used [49]. The inputs of this test are the size of the PPI network, the size of the resulting MDSet, the number of important genes under consideration (EGs, KGs, etc.), and the number of these genes in the resulting MDSet. The output of the test is a p-value, where a p-value < 0.05 means that the MDSet is enriched with the important genes. Table 6 shows the results of the EG enrichment analysis for the first five MDSets generated by the ITR-MDSets model for the HHQBP and YHQBP data sets. Moreover, for each data set, the URD-MDSet model is used to generate one MDSet with the minimum number of EGs and one MDSet with the maximum number of EGs. Table 7 shows the enrichment analysis for all MDSets generated by the URD-MDSet model in Table 5. Although most of the MDSets are enriched with EGs, the PPI network may contain MDSets that are not enriched.
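As a rough illustration of how the four inputs lead to a p-value, the following Python sketch computes the upper tail of the hypergeometric distribution, which coincides with a one-sided Fisher's exact test for over-representation. Whether the authors used a one-sided or two-sided test in R is not stated, so the one-sided choice is an assumption, as are the function name and the small example numbers.

```python
from math import comb

def enrichment_p_value(network_size, mdset_size, important_total, important_in_mdset):
    """P(X >= k) when mdset_size proteins are drawn without replacement from a
    network that contains important_total important genes."""
    N, n, K, k = network_size, mdset_size, important_total, important_in_mdset
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Hypothetical numbers: 10 proteins, MDSet of 5, 4 important genes, 3 of them in the MDSet.
p = enrichment_p_value(10, 5, 4, 3)
print(p, p < 0.05)
```

With these small numbers the MDSet would not be called enriched, since the p-value is well above 0.05; the same function can be applied to network-scale counts because Python integers do not overflow.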
To verify that each MDSet has different biological functions, the DAVID tool [50] was used to annotate four MDSets: "Mosek", the MDSet of the basic model; "Min EGS", the MDSet with the minimum number of essential genes (in Table 6); and "MDSet1" and "MDSet2", generated by the 2MD-MDSets model (in Figure 4). Only functional categories with UP_KEYWORDS and p-value < 0.05 (EASE score < 0.05) were used. The majority of biological function categories are shared among these MDSets, as shown in Figure 5(a). Additionally, three MDSets each have some unique functional categories in metabolism, RNA processing, and translational regulation [Figure 5(b)]. For example, MDSet2 is enriched with diabetes mellitus with p-value < 0.0063. The critical set has attracted great interest in controllability studies [47]. Wuchty et al. [12] found that in PPI networks, the k-critical proteins (kinases and transcription factors) play an important role in phosphorylation and regulatory events in their interactions. However, important genes that are not present in the k-critical set may be neglected, so the concept of criticalness was extended to other degrees of criticalness. Figures 6(a, b) show that there is a set of proteins that appears in (k − 1) MDSets for each data set. Owing to the large number of EGs in this set compared with the k-critical set, this set is called the (k − 1)-critical set. Table 6 shows some MDSets generated by the ITR-MDSets and URD-MDSet models for the HHQBP and YHQBP data sets, along with the number of EGs in the k- and (k − 1)-critical sets. To show the biological function of the (k − 1)-critical proteins, the numbers of EG, KG, and TF proteins in the (k − 1)-critical set were counted (Table 8). It was found that the (k − 1)-critical set is as important as the k-critical set.
Thus, the authors believe that the (k − 1)-critical set analysis is as important as the critical set analysis and can be used for other networks or graph types.
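The grouping of nodes by degree of criticalness can be sketched in a few lines of Python. The sketch simply counts, for each node, how many of the k generated MDSets contain it; the three MDSets below are hypothetical stand-ins for the output of the ITR-MDSets model, and the function name is an assumption.

```python
from collections import Counter

def criticalness_groups(n, mdsets):
    """Group nodes by the number of generated MDSets that contain them."""
    counts = Counter(node for s in mdsets for node in s)
    groups = {}
    for node in range(n):
        groups.setdefault(counts[node], set()).add(node)
    return groups

# Hypothetical output of an iterative run on a nine-node graph (k = 3).
mdsets = [{0, 4, 7}, {0, 4, 8}, {0, 5, 7}]
k = len(mdsets)
groups = criticalness_groups(9, mdsets)
k_critical = groups.get(k, set())
k_minus_1_critical = groups.get(k - 1, set())
redundant = groups.get(0, set())
print(k_critical, k_minus_1_critical, redundant)
```

Nodes found in all k sets form the k-critical set, nodes found in exactly k − 1 sets form the (k − 1)-critical set, and nodes found in none of the generated sets are treated as redundant.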

| Comparison with community detection methods
To validate the types of proteins in the generated critical sets, they were compared with the subnetworks extracted by the HotNet2 algorithm [51]. HotNet2 integrated the PPI network with mutation information for 11,500 proteins in 12 cancer types from the TCGA project; 15 significantly mutated subnetworks were identified and annotated (Supplementary Table 5 in HotNet2 [51]). Figure 7 compares the reported sets with the proteins of the HotNet2 subnetworks and, in the last column, with the solution of the basic model in Equation (1) obtained using the Mosek solver. Notably, each protein present in the Mosek solution exists in one of the authors' critical sets, whereas several proteins appear in the authors' critical sets but not in the Mosek solution. Additionally, some proteins in the subnetworks are absent from the HHQBP PPI network.
Each subnetwork in HotNet2 should contain one or more proteins that can dominate the other proteins in the subnetwork. The TP53 subnetwork has the highest covering score in HotNet2 (68%) and contains 45 subunits (Supplementary Table 8 in HotNet2 [51]). Of these, 18 proteins are in the authors' critical sets and 23 proteins are in the redundant set (the first column, in red). Moreover, the authors found five proteins in the (k − 1)-critical set and four proteins in the k-critical set. The PTEN protein (the second mutated protein in the TP53 subnetwork) was reported in the (k − 12)-critical set, which means that this protein is present in a small number of MDSets. Moreover, the PTEN protein was not found in the MDSet generated by Mosek. The second important subnetwork is the PI3K/RAS subnetwork (with a covering score of 20%), where the PIK3CA and KRAS proteins were present in the k- and (k − 1)-critical sets. The third subnetwork is the NOTCH1 subnetwork (with a covering score of 33%), where three proteins were present in the (k − 1)-critical set and two proteins were present in the (k − 12)-critical set. Additionally, the cohesin complex subnetwork has many proteins in the (k − 12)- and (k − 13)-critical sets. Finally, the BAP1, condensin, and MHC class I subnetworks are dominated by proteins in the k-critical set.
Figure 6: The multiple MDSets in the HHQBP and YHQBP PPI networks using the ITR-MDSets model. (a, b) Venn diagrams that show the overlap between the first five MDSets of each data set. (c) Bathtub curve that represents the trend (size and criticalness degree) of the grouped critical sets extracted from the generated MDSets using MOIA. The tool in [46] was used to draw figures (a) and (b).

By the covering concept of MDSets, proteins in the redundant set do not belong to any of the generated MDSets. Figure 7 shows that the SWI/SNF and ASCOM subnetworks have proteins only in the redundant set, and no proteins in the authors' MDSets can dominate these complexes. Therefore, the authors extended their search to discover which proteins can dominate these complexes. They found that the protein SMARCD1, which was reported as a subunit of the SWI/SNF subnetwork, dominates the SWI/SNF complex. Additionally, they found