Uncovering domain motif interactions using high‐throughput protein–protein interaction detection methods

Protein–protein interactions (PPIs) are often mediated by short linear motifs (SLiMs) in one protein and a domain in another, known as domain–motif interactions (DMIs). During the past decade, SLiMs have been studied to establish their roles in cellular functions such as post‐translational modification, regulatory processes, protein scaffolding, cell cycle progression, cell adhesion, cell signalling and substrate selection for proteasomal degradation. This review provides a comprehensive overview of current PPI detection techniques and resources, focusing on their relevance to capturing interactions mediated by SLiMs. We also address the challenges associated with capturing DMIs. Moreover, a case study analysing the BioGrid database as a source for DMI prediction revealed significant enrichment of known DMIs across different PPI detection methods. Overall, current high‐throughput PPI detection methods can be a reliable source for predicting DMIs.

Proteins interact with their partners through different modules mediating different cell functions [1,2]. The two main modules are short linear motifs (SLiMs), which mediate domain-motif interactions (DMIs), and globular domains, which are engaged in domain-domain interactions (DDIs) and DMIs. In addition to these modules, other critical features include coiled-coil domains involved in protein-protein interactions (PPIs), transmembrane domains facilitating membrane associations, and proteins that fold upon binding. These modules collectively contribute to the diverse repertoire of protein interaction mechanisms, which is essential in controlling a wide range of cellular mechanisms within organisms [3,4]. While many proteins require a well-defined structure for their functionality, a significant portion of an organism's proteome comprises polypeptide segments that are unlikely to adopt a defined three-dimensional structure yet remain functional. These segments are termed intrinsically disordered regions (IDRs) [5]. Because IDRs typically lack bulky hydrophobic amino acids, they are unable to form the well-organised hydrophobic core characteristic of structured domains. Consequently, their functionality differs from the classical structure-function paradigm associated with globular, structured proteins [6,7]. Moreover, proteins, thriving in a cellular milieu with high concentrations, manifest diverse states extending beyond the commonly acknowledged native and amyloid configurations [8]. The number of proteins known to undergo liquid-liquid phase separation is growing rapidly. Given the likelihood that, at the high concentrations within cells, most proteins would adopt a liquid condensed state, this state should be recognised as a fundamental protein state, alongside the native and amyloid states [9]. For example, RNA-binding proteins like TDP43 and FUS undergo liquid-like phase separation, forming droplets in the nucleus, where RNA concentration is high. In the cytoplasm, characterised by low RNA concentration, these proteins give rise to solid-like pathological condensates [10].
SLiMs are recurrent linear peptide microdomains of 2-15 consecutive amino acid residues, primarily found within IDRs [3,11,12,13]. By one estimate, approximately 100 000 motifs are involved in protein binding, most of them located in IDRs. The flexibility of SLiMs allows them to function in disordered and structured regions, contributing to the diverse mechanisms by which proteins interact and carry out specific functions within cells [14-16]. Alternative exons exhibit a notable enrichment of IDRs, showing their importance in functional diversity [17,18]. Interactions facilitated by SLiMs are transient and exhibit low affinity, typically within 1-150 μM [19,20]. It is important to note that SLiMs usually comprise only 2-5 precisely defined positions, posing a challenge for their identification using experimental and computational approaches [21]. SLiMs mediate various cellular processes through PPIs, specifically DMIs, where a rapid response is necessary for signal transmission [22,23]. SLiMs serve as molecular switches within proteins, orchestrating rapid and reversible transitions between different functional states. These compact and evolutionarily conserved peptide sequences act like regulatory modules, enabling proteins to respond dynamically to cellular signals or environmental changes. With the introduction of a single mutation, a SLiM can induce substantial alterations in the protein's behaviour, allowing it to toggle between distinct functionalities. This inherent flexibility underscores the importance of SLiMs as critical elements in cellular control mechanisms, contributing to the dynamic regulation of diverse biological processes. However, the effectiveness of SLiMs as molecular switches can vary depending on the specific context, the nature of the protein, and the type of mutation introduced. Some SLiMs may be highly sensitive to single mutations, while others might require more complex alterations. Understanding the nuanced roles of SLiMs as molecular switches provides valuable insights into the intricacies of protein regulation and cellular signalling pathways [24].
Because motif residues interact directly with the domain, these residue positions are likely to be evolutionarily conserved. Notably, a substantial portion of SLiMs includes two or more conserved hydrophobic residues, as exemplified by the nuclear export sequence (NES), which contains four residues [25], and a single mutation of TQG to TQT can result in synaptic transport in neuronal cells [26]. The small size and evolutionary plasticity of SLiMs suggest that these linear motifs are inclined to arise independently, facilitating the discovery of new motifs that share interaction partners [3,21]. The impact of SLiMs on protein functionality was first suggested in the 1970s and confirmed in 1990 when the KDEL motif was studied in conjunction with the ERD2 receptor. The identification and experimental validation of these motifs remain challenging. Additionally, the proportion of validated motifs compared to the total number is likely to be relatively small [27]. New computational tools and methods have been developed to ease the process of SLiM prediction from protein sequence data. The main repositories maintaining motif data include the eukaryotic linear motif database (ELM database) [28], PROSITE [29], the Linear Motif mediated Protein Interaction Database (LMPID) [30], Minimotif-Miner [31], PepCyber [32] and Scansite [33]. In recent years, SLiMs have gained attention because their key residues have been shown to be involved in subcellular localisation, post-translational modifications (PTMs), regulatory functions, protein trafficking, signal transduction, cell cycle control, and stabilisation of scaffolding processes [34,35].
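Because a SLiM has only a few defined positions, motif resources commonly describe it as a regular-expression-like pattern that can be scanned against a protein sequence. The following is a minimal sketch of such a scan for a C-terminal KDEL-like signal. The pattern, the helper name `find_motifs` and the toy sequences are illustrative assumptions, not entries taken from any of the databases listed above.

```python
import re

# Illustrative pattern for a C-terminal KDEL-like signal (an assumption
# for demonstration): two loosely defined positions followed by "EL",
# anchored to the end of the sequence with "$".
KDEL_LIKE = re.compile(r"[KRHQSAP][DENQT]EL$")


def find_motifs(sequence, pattern):
    """Return (start, end, matched_text) for each match, 1-based inclusive."""
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(sequence)]


# Hypothetical toy sequences, not real proteins.
c_terminal_hit = find_motifs("MSTLLAGVVKDEL", KDEL_LIKE)
internal_only = find_motifs("MKDELAAA", KDEL_LIKE)

print(c_terminal_hit)  # the motif at the C-terminus is reported
print(internal_only)   # an internal KDEL is ignored because of the "$" anchor
```

With so few constrained positions, an unanchored pattern of this kind would match many sequences by chance, which is one reason purely sequence-based SLiM prediction tends to produce false positives.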

SLiM discovery/prediction tools
The potential for false-positive outcomes has always made the development of new bioinformatics techniques for SLiM prediction difficult [36]. Significant advancements have been made in motif discovery in recent years, and various new computational methods have been devised to discover SLiMs, but most SLiM prediction tools rely on sequence information and known data (Table 1). Structure-based methods are also gaining attention, as they can play a crucial role in predicting SLiM-based interactions by leveraging information about the three-dimensional arrangement of proteins. These tools utilise experimental or predicted structures of protein complexes to analyse the spatial orientation and interactions of SLiMs within the binding interfaces. By considering the structural context, these tools can provide insights into the specificity and affinity of SLiM interactions, aiding in the identification of potential binding partners. Additionally, structure-based approaches contribute to understanding the dynamic nature of SLiM-mediated interactions, capturing conformational changes and flexibility in the binding sites. To date, several methods, including the iELM server [37] and the MoRFchibi SYSTEM [38], have been developed to utilise structural information to predict DMIs accurately. Moreover, Artificial Intelligence (AI)-driven prediction tools, utilising machine learning (ML) algorithms, analyse PPI datasets to decode complex patterns associated with SLiMs. These tools can minimise false positives, a common limitation in traditional bioinformatic approaches, thereby enhancing prediction specificity.
The integration of AI can contribute to the identification of novel SLiMs and refine our understanding of their functional relevance. One recent advancement, the MotSASi method, enhanced the prediction of authentic functional SLiMs. MotSASi integrates sequence variant information and structural analysis of the energetic impact of single amino acid substitutions (SAS) in SLiM-receptor complex structures. This involves constructing a SAS tolerance matrix, indicating the tolerance of each position to each of the 19 possible SAS. Focusing on three SLiMs related to intracellular protein trafficking (phospho…), MotSASi demonstrated that the inclusion of variant and structure information improves the prediction of authentic SLiMs, diminishes false positives, and enhances the categorisation of variants within SLiMs. Therefore, it is important to develop new computational methods that can utilise sequence, structure, and variant information [39].
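The idea of a SAS tolerance matrix can be illustrated with a small sketch: for each motif position, every one of the 19 possible substitutions is marked as tolerated or not. This is not MotSASi's actual procedure. The energy values, the 1.0 threshold, the rule that missing values count as not tolerated, and the function names are all assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tolerance_matrix(motif, ddg, threshold=1.0):
    """Build one {substitute: tolerated?} row per motif position.

    ddg[(position, substitute)] is a hypothetical binding-energy change;
    a substitution is tolerated if its value is <= threshold. Missing
    values are treated as not tolerated (an assumption of this sketch).
    """
    matrix = []
    for pos, wild_type in enumerate(motif):
        row = {}
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # keep only the 19 substitutions
            row[aa] = ddg.get((pos, aa), float("inf")) <= threshold
        matrix.append(row)
    return matrix


def is_candidate(peptide, motif, matrix):
    """True if every difference from the motif is a tolerated substitution."""
    if len(peptide) != len(motif):
        return False
    return all(
        p == m or matrix[i].get(p, False)
        for i, (p, m) in enumerate(zip(peptide, motif))
    )


# Hypothetical energy changes for a four-residue motif.
motif = "KDEL"
ddg = {(0, "R"): 0.3, (0, "A"): 2.5, (1, "E"): 0.8}
matrix = tolerance_matrix(motif, ddg)

print(is_candidate("RDEL", motif, matrix))  # tolerated change at position 0
print(is_candidate("KDEV", motif, matrix))  # no data for L->V, so rejected
```

Treating unmeasured substitutions as not tolerated is the conservative choice here; a real pipeline would also need to decide how sequence variant evidence is combined with the structural estimate.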

SLiM-mediated protein-protein interactions
Protein interactions commonly involve globular domains in DDIs, where one protein's domain interacts with that of another. These interactions frequently entail substantial surface contacts between the interacting protein domains [19,81]. Another type of interaction involves DMIs, facilitated by SLiMs, which constitute a distinct subset of PPIs. Due to their short length, SLiMs are well-suited for mediating processes that demand quick reactions [19]. SLiMs are particularly pivotal for PTMs, a specialised subset of transient PPIs, by serving as specific target sites. Research by Neduva et al. suggests that a significant portion of protein interactions, ranging from 15% to 40%, is based on SLiMs. However, only a fraction of such interactions have been identified to date [82]. The ELM database categorises SLiMs into six main classes, each serving distinct functions: cleavage sites (CLV), ligand binding sites (LIG), subcellular targeting sites (TRG), sites susceptible to post-translational modifications (MOD), degradation sites (DEG), and docking sites (DOC) [83,84]. Only a small number of SLiM residues (2-3 amino acids) are found to be involved in binding interactions or cellular functioning [13]. SLiMs have been reported to mediate protein binding; for instance, the C-terminal phosphorylated peptide of the plant plasma membrane H+-ATPase interacts with the 14-3-3 regulatory complex [85].

Public protein-protein interaction repositories
The correct functioning of cells depends on interactions between proteins (PPIs). The precise number of PPIs is still unknown, although estimates place it between 130 000 and 650 000 [86]. A more thorough understanding of the protein network is made possible by collecting these PPIs in specialised databases. The Database of Interacting Proteins (DIP) was the first database designed to preserve PPI data [87]. Public repositories of PPIs are now expanding quickly; besides storing PPI data, they assist in finding new SLiMs. Several PPI repositories are already available to support the study of protein networks, as shown in Table 2.

High-throughput PPI detection methods and DMIs
Protein interactions are the dynamic elements of cellular organisation and activity [110-112]. In recent years, there has been a collective effort within the SLiM field to devise large-scale approaches tailored for the comprehensive characterisation of SLiMs (Table 3) [109]. In a recent study, researchers compared the efficacy of biotinylated peptide pulldown and the protein interaction screen on a peptide matrix (PRISMA), each coupled with mass spectrometry. Through testing eight distinct peptide sequences with varying affinities for three specific protein domains (KEAP1 Kelch, MDM2 SWIB, and TSG101 UEV), the study revealed that biotin-peptide pulldown outperformed PRISMA in validating SLiMs. Notably, tandem peptide repeats enhanced interaction capture, underscoring the need to consider method development parameters for effective affinity capture MS-based validation of SLiM-based interactions from cell lysates [20]. Moreover, a new method, thermal proximity coaggregation (TPCA), has been introduced to monitor the dynamics of native protein complexes in living cells. TPCA leverages the coaggregation of proteins within a complex during heat denaturation, employing a cellular thermal shift assay to produce melting curves for numerous proteins. Validated through the detection of known protein complexes, TPCA unveiled cell-specific interactions across diverse cell lines. This high-throughput and system-wide approach to studying protein complex dynamics shows promise in identifying complexes influenced by diseases [113]. The availability of high-throughput PPI data has made it possible to construct cutting-edge computational methods for SLiM prediction. Although these tools are a major advance, it is essential to recognise their inherent limitations because they can yield false positives. One potential solution is to incorporate gene ontology (GO) annotations, gene expression profiles, and high-throughput data into the computational algorithms to improve their accuracy and consistency. This comprehensive approach can improve our understanding of SLiMs and their function in forming the complex web of cellular interactions [111].
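The coaggregation principle behind TPCA can be sketched numerically: proteins in the same complex are expected to have similar melting curves, so the distance between two curves can serve as a rough proxy for co-membership. The sketch below uses plain Euclidean distance on hypothetical soluble-fraction values. The distance measure, the protein labels and the numbers are assumptions for illustration and are not taken from the TPCA publication.

```python
import math


def curve_distance(curve_a, curve_b):
    """Euclidean distance between two melting curves on the same temperature grid."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must be sampled at the same temperatures")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(curve_a, curve_b)))


# Hypothetical soluble fractions measured at increasing temperatures.
curves = {
    "protein_A": [1.00, 0.95, 0.70, 0.30, 0.05],
    "protein_B": [1.00, 0.93, 0.68, 0.33, 0.06],  # melts like A
    "protein_C": [1.00, 0.60, 0.20, 0.05, 0.01],  # melts much earlier
}

d_ab = curve_distance(curves["protein_A"], curves["protein_B"])
d_ac = curve_distance(curves["protein_A"], curves["protein_C"])

# A smaller distance is read here as weak evidence for coaggregation.
print(d_ab < d_ac)
```

Similar melting behaviour alone does not establish an interaction, so in practice such distances would be compared against a background of unrelated protein pairs.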
In this review, we will discuss some common high-throughput methods to identify PPIs and SLiM-mediated interactions.

Yeast two-hybrid
The yeast two-hybrid (Y2H) technique stands out as a robust method for identifying PPIs within yeast cells. Designating interacting proteins as "bait" and "prey," this approach activates reporter genes upon interaction, leading to distinct colour reactions or growth on specific media. Y2H has proven effective for exploring genome-wide interactions in diverse organisms, including Saccharomyces cerevisiae, bacteriophage T7, humans, and Caenorhabditis elegans [114,115]. Serving as a powerful tool in systems biology, it facilitates the investigation of large interactomes and enhances our understanding of disease aetiology by examining protein interactions within a system [116]. Y2H offers two primary screening methods: the array method and the library method. The library approach involves searching for pairwise interactions between proteins of interest (bait and prey) within cDNA libraries. However, it is susceptible to false positives, where non-interacting proteins are incorrectly reported as interactors. Moreover, identifying the interaction partner requires colony PCR and sequencing, rendering this approach expensive and time-consuming.
Conversely, the array strategy, also known as the matrix approach, entails the direct mating of a pool of baits and a pool of prey in yeast of different mating types to identify interactions. Although this method is automated, facilitating the exploration of genome-wide interactomes, it may overlook certain interactions, termed false negatives [117,118] (Fig. 1A). ... its effectiveness in unbiased and efficient domain mapping for diverse protein interactions [119].

Table 3 entry. Hold-up assay: Hold-up assays involve immobilising peptides on beads, distributing them in a filter plate, allowing proteins to bind at equilibrium, and collecting unbound proteins through filtration. Characteristics: determines affinities; peptide synthesis is required when using a peptide bait, and production of bait protein is needed when using purified proteins; numerous detection options; versatile, accommodating both peptides and proteins as bait; constrained by the costs associated with peptide synthesis and detection.

Affinity purification coupled mass spectrometry
Affinity purification coupled mass spectrometry (AP-MS) is a powerful technology in systems biology that has transformed PPI discovery. This method uses mass spectrometry (MS) as the readout for biochemical techniques, such as affinity purification and chemical cross-linking, providing insights into proteome-wide interactions across diverse biological systems [120]. In the AP-MS workflow, the protein of interest is genetically fused to a specific tag, allowing for its capture through an affinity column designed for tag association or a specialised antibody [94]. The AP-MS methodology accommodates different approaches, ranging from a single-step purification using tags like the Flag tag to the more intricate two-step purification method, often preferred for its efficacy. This method involves dual tagging of proteins at the C- or N-terminal ends with tags such as 6xHis- and Strep-tags, or the use of tandem affinity purification (TAP), featuring a cleavage site between tags, which facilitates the isolation of multiprotein complexes containing the tagged protein.
Subsequent MS analysis is employed to identify the constituents of these complexes. The two-step purification approach is renowned for its sensitivity and specificity, making it an excellent choice for in-depth PPI investigations [121,122]. However, this method is not without its challenges. The two-step purification approach, while effective, introduces additional complexity to the experimental workflow, requiring careful planning and execution. The resource-intensive nature of AP-MS, demanding specialised equipment and expertise for optimal results, may limit its accessibility in some research settings. Furthermore, like any technique, AP-MS may introduce artefacts or biases, necessitating rigorous validation steps to ensure the reliability of the results. AP-MS is proficient at isolating and studying stable protein complexes, rendering it suitable for investigating enduring interactions within tissues. Nonetheless, its effectiveness in probing dynamic assemblies is contingent upon the stability of the interactions involved. Where interactions are transient or context-dependent, AP-MS may fail to capture them efficiently. Therefore, researchers should carefully design experiments to address the dynamics of the protein interactions they aim to study [114] (Fig. 1B).

Co-fractionation coupled mass spectrometry
Co-fractionation coupled mass spectrometry (CoFrac-MS) has emerged as a potent strategy for in-depth exploration of intricate protein interactions, encompassing both direct and indirect relationships [71]. In this methodology, protein extracts undergo extensive fractionation, often employing biochemical techniques such as size exclusion chromatography, followed by evaluation of the resulting fractions using mass spectrometry. Like AP-MS, CoFrac-MS has the capacity to unveil proteome-wide connections, providing a comprehensive view of cellular relationships [123]. The strength of CoFrac-MS lies in its ability to elucidate complex protein associations. It enables the differentiation between direct and indirect interactions among protein pairs, shedding light on true physical connections and those facilitated by intermediary partners. However, navigating this complexity presents a substantial challenge, particularly in distinguishing between the various types of interactions within intricate mixtures. While CoFrac-MS serves as a robust tool for identifying complex protein networks, interpreting the nature of these associations necessitates meticulous data analysis and integration with additional contextual information. Just like AP-MS, the effectiveness of CoFrac-MS in capturing dynamic events is influenced by interaction stability. In cases of transient or context-dependent interactions, its ability to capture dynamic assemblies may be limited. Therefore, careful consideration of interaction characteristics is crucial when utilising CoFrac-MS for studying tissue-specific or dynamic protein assemblies. On the positive side, CoFrac-MS offers a holistic view of protein interactions, providing valuable insights into the intricacies of cellular relationships. However, the interpretative challenge in distinguishing between direct and indirect interactions requires sophisticated computational tools and careful consideration of experimental conditions. Additionally, like other mass spectrometry-based techniques, CoFrac-MS demands specialised equipment and expertise, potentially limiting its accessibility in certain research settings [124] (Fig. 1C). Five primary types of high-throughput experiments are now available to identify SLiM-mediated interactions (Table 3).
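The data analysis step in CoFrac-MS can be illustrated with a simple co-elution score: proteins that belong to the same complex are expected to show similar abundance profiles across fractions. The sketch below computes a Pearson correlation between hypothetical elution profiles. The choice of Pearson correlation, the protein labels and the intensity values are assumptions for illustration, not a description of any specific published pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same fractions")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


# Hypothetical MS intensities across seven chromatography fractions.
profiles = {
    "protein_1": [0, 1, 5, 9, 4, 1, 0],
    "protein_2": [0, 2, 6, 8, 3, 1, 0],  # peaks in the same fractions as protein_1
    "protein_3": [7, 4, 1, 0, 0, 1, 3],  # elutes elsewhere
}

r_12 = pearson(profiles["protein_1"], profiles["protein_2"])
r_13 = pearson(profiles["protein_1"], profiles["protein_3"])
print(round(r_12, 2), round(r_13, 2))
```

A high correlation indicates co-elution only; as noted above, it cannot by itself separate direct contacts from associations bridged by intermediary partners.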

A case study to assess enrichment of DMIs in PPIs available in BioGrid database
To determine the viability of utilising data from public repositories for predicting DMIs, we evaluated their potential to capture interactions mediated by SLiMs effectively. Here, we analysed the human interactome retrieved from the BioGrid database, version 4.4.2 (retrieved 2023-07-01) [89], known to be the most comprehensive PPI database. Most PPI repositories curate the same information from experimental evidence or literature, yet the quality of the interaction data has always been questioned. One study found that, despite curating the same data, different databases differed significantly, raising questions about the quality and reliability of the interaction data [95]. Therefore, to see whether the PPI data stored in the BioGrid database are useful, we mapped proteins involved in PPIs to their protein sequences, including splice variants. The total number of human PPIs analysed was 302 854, involving 12 053 unique proteins. The protein names retrieved from the BioGrid database were mapped to their sequence data using UniProt, and analysis was restricted to reviewed proteins only. Of the 12 053 proteins, 11 942 (99.1%) were successfully mapped to their protein sequences, which means that sequence data are available for most of the PPIs stored in BioGrid. The PPI data were then used to identify known SLiM-mediated interactions using the ELMi method of SLiMEnrich v1.5.1 [36], which works based on known interactions in the ELM database. Because BioGrid stores data generated by different techniques, we divided the dataset into four broad categories based on the Molecular Interactions (MI) ontologies [125]. These subsets of data were then assessed for enrichment of DMIs. Protein complementation assay and structure-based methods showed promising results with the highest enrichment scores [False Discovery Rate (FDR) < 0.05]. All PPI techniques, taken together, captured DMIs and demonstrated a noteworthy enrichment of DMIs as compared to random protein pairs. This was also observed in our recent study, where different high-throughput methods showed significant enrichment [126] (Table 4).
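The comparison against random protein pairs can be sketched as a small randomisation test: count how many observed PPIs are known DMIs, repeat the count on randomly paired proteins, and report the ratio together with an empirical p-value. This is a simplified illustration under stated assumptions, not the SLiMEnrich implementation. The randomisation scheme, the function names and the toy data are all hypothetical.

```python
import random


def count_known(ppis, known_dmis):
    """Number of pairs (order-independent) that are known DMIs."""
    return sum(1 for pair in ppis if frozenset(pair) in known_dmis)


def enrichment(ppis, known_dmis, n_random=1000, seed=0):
    """Return (observed, mean_random, enrichment_score, empirical_p)."""
    rng = random.Random(seed)
    proteins = sorted({p for pair in ppis for p in pair})
    observed = count_known(ppis, known_dmis)
    random_counts = []
    for _ in range(n_random):
        # Assumption: a random network is drawn by pairing proteins uniformly.
        randomised = [tuple(rng.sample(proteins, 2)) for _ in ppis]
        random_counts.append(count_known(randomised, known_dmis))
    mean_random = sum(random_counts) / n_random
    score = observed / mean_random if mean_random else float("inf")
    # Add-one correction keeps the empirical p-value above zero.
    p_value = (1 + sum(c >= observed for c in random_counts)) / (n_random + 1)
    return observed, mean_random, score, p_value


# Toy data: 10 PPIs among 20 hypothetical proteins, 4 of them known DMIs.
ppis = [(f"P{i}", f"P{i + 1}") for i in range(0, 20, 2)]
known_dmis = {frozenset(pair) for pair in ppis[:4]}

observed, mean_random, score, p_value = enrichment(ppis, known_dmis)
print(observed, round(mean_random, 2), round(score, 1), round(p_value, 3))
```

When several method categories are tested, as in Table 4, the resulting p-values would additionally need a multiple-testing correction to obtain FDR values.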

Current challenges and outstanding questions
Protein-protein interaction detection techniques are powerful sources for identifying DMIs, but researchers often face several challenges and limitations. In this review, we discussed several challenges associated with capturing DMIs using PPI databases and detection methods. Several PPI repositories store data related to different species. The main challenge is the quality of the PPIs in these repositories, as they are typically inconsistent and may include interactions with high false-positive rates. This, in turn, can lead to false and inaccurate DMI predictions. Therefore, it is essential to assess the quality of the PPIs in available repositories to ensure their reliability for DMI analysis [127]. PPI databases often lack 3D structural information on the interactions, which limits the understanding of DMIs and insights into biological mechanisms [128]. Furthermore, understudied proteins, whose biological functions remain unknown, pose a significant obstacle to effective research. The prevailing 'streetlight effect' in scientific inquiry tends to concentrate efforts on well-explored proteins, thereby restricting the exploration of potentially pivotal yet neglected ones. For instance, extensively studied proteins like the tumour suppressor p53 exemplify this bias. To overcome this challenge, community engagement is imperative, for example through a survey aimed at identifying understudied proteins and determining the essential information required for in-depth mechanistic investigations. Such community-driven initiatives hold the promise of mitigating biases, fostering collaborative endeavours, and methodically advancing the molecular characterisation of understudied proteins [129,130]. Another major challenge is the possibility of significant noise and false positives in PPIs detected by high-throughput methods, making it difficult to distinguish true interactions from artefacts [131]. More importantly, integrating PPIs from different resources can be quite challenging due to differences in data formats, identifiers and curation methods [126,131,132]. A few databases contain PPIs that are not experimentally validated, leading to uncertainty about their biological relevance. Relying on such PPIs can lead to invalid results and false-positive DMIs [133]. Moreover, PPI data sometimes do not account for PTMs, which can significantly impact the identification of DMIs. Therefore, it is important to account for PTMs while uncovering DMIs [99]. Developing more comprehensive protein networks requires better strategies to integrate data from different PPI resources efficiently. Cross-species analysis also needs to focus on finding highly conserved DMIs involved in different diseases. AI (e.g., machine learning) can offer a comprehensive assessment of SLiM interactions by leveraging sequence variant and structural information, providing an efficient approach to exploring this critical aspect of molecular biology. This emphasises the need to develop new machine learning-based computational methods that accurately predict DMIs from PPIs in combination with structural information. In short, a critical approach to PPI data analysis, including cautious data selection, integration, validation, and knowledge of the context and study objectives, is necessary to overcome these challenges.
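The integration problem described above (different identifiers, and the same pair reported in either order) can be made concrete with a short sketch that maps source identifiers to one canonical accession, stores each interaction as an order-independent pair, and records which sources support it. The function name, the source labels and the small identifier map are illustrative assumptions; a real mapping would come from a dedicated identifier-mapping resource.

```python
def merge_ppis(sources, id_map):
    """Merge PPI lists into non-redundant, order-independent pairs.

    sources: {source_name: [(id_a, id_b), ...]}
    id_map:  {source_identifier: canonical_accession}
    Returns (merged, unmapped), where merged maps each canonical pair
    to the set of sources reporting it.
    """
    merged = {}
    unmapped = set()
    for name, pairs in sources.items():
        for a, b in pairs:
            missing = [x for x in (a, b) if x not in id_map]
            if missing:
                unmapped.update(missing)  # keep track instead of guessing
                continue
            key = tuple(sorted((id_map[a], id_map[b])))
            merged.setdefault(key, set()).add(name)
    return merged, unmapped


# Illustrative identifier map and two hypothetical source databases.
id_map = {"TP53": "P04637", "p53": "P04637", "MDM2": "Q00987", "KEAP1": "Q14145"}
sources = {
    "db_a": [("TP53", "MDM2")],
    "db_b": [("MDM2", "p53"), ("KEAP1", "NRF2")],
}

merged, unmapped = merge_ppis(sources, id_map)
print(merged)    # one pair, supported by both sources
print(unmapped)  # identifiers that could not be mapped
```

Keeping the supporting sources per pair makes it possible to filter later on the number of independent reports, and the unmapped set makes identifier loss visible rather than silent.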

Conclusion
The creation of large PPI datasets has become routine in the era of high-throughput experimental techniques. However, because high-throughput methods are inherently error-prone, the quality of detected PPIs must be carefully evaluated before they are used as a source of DMIs. The primary aim of this review is to serve as a resource for PPI detection methods, repositories, and SLiM prediction tools. In addition, we evaluated PPIs obtained from several high-throughput methods, which are indexed in BioGrid, to determine DMIs. Our analysis found a significant enrichment for all the approaches we examined, showing their promise in identifying DMIs.
Table 3 (continued). Each entry gives the method, its description, and the characteristics listed for it.
mRNA display: mRNA display involves screening peptide-RNA fusions in a cell-free format, wherein in vitro translated peptides are covalently linked to their encoding mRNA via a puromycin linkage, facilitating selections against an immobilised bait followed by reverse transcription and sequencing. Characteristics: enables the creation of extensive peptide libraries (10^13); the size of the library is constrained by the expense of the designed oligonucleotide library; enriching binders necessitates multiple rounds of selections.
In cell binding assays
Pheromone signalling competitive growth assay: A yeast pheromone signalling-based assay explores the specificity of LP motif docking interactions with yeast G1 cyclin Cln2, utilising deep mutational scanning. Characteristics: limited by the transduction of cells; generates a list of peptides ranked by affinity; next-generation sequencing (NGS) incurs associated costs.
MAPK competitive growth assay: A yeast growth assay utilising exogenous mitogen-activated protein kinase (MAPK) signalling components to investigate MAPK docking motifs. Characteristics: occurs in physiological conditions; intricate biases may be introduced; generates a list of peptides ranked by affinity; constrained to the studied MAPK kinase; restricted by cell transduction during library construction.
Functional assays
Degradation assays: Large-scale functional assays can provide novel insights into degradation motifs, or degrons, a class of SLiMs that promote protein degradation. Characteristics: provides functional insights; lacks details on the binding partner; offers a semi-quantitative readout; constrained by cell transduction during library construction and/or FACS sorting.
Transactivation assays: Functional studies on transactivation domains (TADs), a type of SLiM interacting with transcriptional coregulators to regulate transcription, often employ these assays, measuring cell viability or fluorescent reporter production as indicators of transcriptional activity. Characteristics: provides functional insights; lacks details on the binding partner; offers a semi-quantitative readout; constrained by cell transduction during library construction and/or FACS sorting.
Quantitative binding assays
MRBLE-pep: MRBLE-pep assays use spectrally encoded lanthanide beads with synthesised peptides for medium-throughput quantitative measurement of protein-peptide interactions. Characteristics: simultaneous determination of multiple affinities; constrained by the synthesis of peptide-linked beads; necessitates specialised equipment.

Fig. 1. High-throughput PPI detection methods. (A) The yeast two-hybrid (Y2H) system needs yeast's transcription factor activation, DNA binding domains (DBD), and activation domains (AD). A prey (possible interacting protein) is fused with the AD, and a bait (protein of interest) is fused with the DBD. The bait protein binds with a binding site in the reporter gene's promoter region after fusing with the DBD. When the prey protein fuses with the AD, it then binds to the bait protein to initiate the expression of the gene. (B) In affinity purification coupled mass spectrometry (AP-MS), a tag protein is attached to the protein of interest (bait). Co-purified proteins are those that bind to the tagged protein. Mass spectrometry (MS) is then used to identify these proteins. (C) In co-fractionation followed by mass spectrometry (CoFrac-MS), protein extract undergoes thorough fractionation to separate protein complexes, subsequently identified by MS (created using BioRender.com).

Table 3. A comparative summary of high-throughput methods used to capture SLiM-mediated interactions.
FEBS Letters 598 (2024) 725-742 © 2024 The Authors. FEBS Letters published by John Wiley & Sons Ltd on behalf of Federation of European Biochemical Societies.

Table 4. PPI detection methods retrieved from the BioGrid database. a Number of symmetric and non-redundant PPIs with protein pairs examined by UniProt; b Known SLiM-protein interactions from the ELM database; c Observed enrichment of known DMIs captured from PPIs; d False Discovery Rate (FDR) < 0.01; e FDR < 0.05.