Large‐scale survey and database of high affinity ligands for peptide recognition modules

Abstract Many proteins involved in signal transduction contain peptide recognition modules (PRMs) that recognize short linear motifs (SLiMs) within their interaction partners. Here, we used large‐scale peptide‐phage display methods to derive optimal ligands for 163 unique PRMs representing 79 distinct structural families. We combined the new data with previous data that we collected for the large SH3, PDZ, and WW domain families to assemble a database containing 7,984 unique peptide ligands for 500 PRMs representing 82 structural families. For 74 PRMs, we acquired enough new data to map the specificity profiles in detail and derived position weight matrices and binding specificity logos based on multiple peptide ligands. These analyses showed that optimal peptide ligands resembled peptides observed in existing structures of PRM‐ligand complexes, indicating that a large majority of the phage‐derived peptides are likely to target natural peptide‐binding sites and could thus act as inhibitors of natural protein–protein interactions. The complete dataset has been assembled in an online database (http://www.prm‐db.org) that will enable many structural, functional, and biological studies of PRMs and SLiMs.
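The position weight matrices mentioned in the abstract can be illustrated with a minimal sketch. It assumes pre-aligned peptides of equal length, a uniform background over the 20 amino acids, and a pseudocount of 1; these choices, the function names, and the example peptides are hypothetical and are not taken from the authors' pipeline.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(peptides, pseudocount=1.0):
    """Build a log2-odds position weight matrix from aligned peptides.

    Assumption: uniform background frequency (1/20) and additive
    pseudocounts; a real analysis may use a different background.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be pre-aligned to equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        column = {
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pwm.append(column)
    return pwm


def score(pwm, peptide):
    """Sum the per-position log-odds scores of a peptide as long as the PWM."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))


def best_window(pwm, sequence):
    """Return (score, start) of the highest-scoring window in a longer sequence."""
    width = len(pwm)
    windows = [
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(windows)
```

Calling `build_pwm` on a handful of aligned ligands and then `best_window` on a protein segment returns the start index of the best match. Such a match is only a sequence-level hit and does not by itself indicate an interaction in a cellular context.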


21st Nov 2019 1st Editorial Decision
Thank you again for submitting your work to Molecular Systems Biology. We have now heard back from the three referees who agreed to evaluate your study. As you will see below, the reviewers acknowledge that the study presents a relevant resource. They raise however a series of concerns, which we would ask you to address in a major revision.
Without repeating all the points listed below, one of the more fundamental issues is raised by Reviewers #1 and #2, who point out that despite its potential resource value the study remains somewhat thin in terms of providing new biological insights. We think that expanding the study by following up on one of the suggestions of reviewer #1 and the recommendations of reviewer #2 would significantly enhance the impact of the study.
All other issues raised by the reviewers would need to be convincingly addressed. Please feel free to contact me in case you would like to discuss in further detail any of the issues raised. On a more editorial level, we would ask you to address the following issues.
Reviewer #1: In this manuscript Teyra, Kelil and colleagues have used an established phage-display approach to determine peptide binders for 163 globular domains (peptide recognition modules, PRMs). In contrast to previous related studies that have focused on profiling different instances of specific domain families, this work tried to survey a diversity of different families, having determined at least one binding peptide for 79 different domain families. Out of the 163 domains, 74 had at least 5 peptides from which a specificity model could be derived. The authors briefly compared some of the properties (length, constrained positions) of these specificity profiles with those found for specific peptide binding domain families (SH3, PDZ, WW). They then matched the profiled domains with appropriate structural models, showing that the constrained positions were often in line with the peptide bound in the domain-ligand complex. The collected information has been made available in an easy to use database. Studying linear-motif interactions remains a very difficult challenge and I think this work and the accompanying database serve as a fantastic resource for a number of potential future studies. However, the authors themselves have not really derived new knowledge from this resource, besides the determination of the binding peptides and specificity models. Providing some example application(s) would strengthen this manuscript considerably.
Major concerns
My single biggest concern is that there is really little in this work in terms of novel biological findings. Previous related works have tried to study, for example, the evolution of peptide binding interactions, or the extent by which the same domains may have more than one binding mode, or trying to make concrete predictions for in-vivo targets of peptide binding domains. I understand that a lot of work has gone into obtaining the domain binding peptides but this manuscript would be considerably stronger if the authors then used the data for some application akin to those prior studies. To be constructive I provide here some suggestions of potential applications. However, I don't mean that the authors should do all of these things or even any of these things, just that I suggest the authors should showcase how their new data can be used to derive new biological findings. Suggested possible applications could include:
-One of the most striking aspects of this resource is that the authors have covered many domain families and have structural models for a very large fraction of them. This aspect is not really explored. Looking at the beautiful Figure 4, it seems clear that there is a large diversity of folds and no immediate pattern relating the fold and the motifs appears, but this relation between fold, pocket and binding peptides could be quantified. Is there a relation between the properties of the fold and the types of sequences it binds? Any relation between the residues near the peptide domains and the amino-acids in the binding peptides? Are the domain residues in contact with target peptide residues more likely to be conserved? Are the domain-peptide specific residue-residue contacts for constrained peptide positions more likely to be important for binding?
-In Figure 4, some of the specificity models for some domains (e.g. HSPA9, SND1, WDR74, NXF1, others) extend beyond the size of the structural ligands. Some of these extensions still contain positions that are apparently constrained. Can the authors use the structural models to rationalize these extended specificities? For some of these examples, the extended binding mode may already be described in the literature but any new examples could be interesting.
-Perhaps a low hanging fruit application would be to suggest likely in-vivo protein binding partners and the binding sites for these domains. The authors have already started to do some of this work by predicting likely binding sites within disordered regions. These predictions could be overlaid on an up-to-date compilation of human interaction data, indicating potential binding site regions that could contribute to the protein-protein interactions. Providing some examples from this could help others understand how to make use of this resource. A larger extension of this could be to map and study human genetic variation on to these potential binding sites.
Minor concerns
-In Figure 4, the authors note that there are residues that are modified by phosphorylation and other PTMs. However, this is not really discussed in the results. I found it interesting that these positions provided clear mismatches between the phage-display selected amino-acids and the amino-acid in the structural ligand. For phosphorylation these are the expected S/T to phosphomimetic D/E mismatches but I am not familiar with the idea that tryptophan can mimic methylated residues. This is most apparent for the TUDOR domain of SND1 with a strong selection for tryptophan at the position selective for methylated arginine. Is this well known? It would be worth having a short section describing in more detail the results for the PTM binding domains, ideally looking in detail at the structural reasons for some of the differences. Even within the phosphorylation examples there are interesting differences that are worth going into more detail. For example, the two first 14-3-3 domains in Figure 4 have a phospho-threonine and phospho-serine in the structural peptide and they select different mimetic residues and the two phospho-tyrosine bound domains (DOK4 and SHC1) don't really appear to select for phospho-mimetic residues at those positions.
- Figure 4 is beautiful but maybe too big for a paper. The authors could consider splitting into more than one figure. If they were to discuss the PTM bound structures they could move those onto a separate figure.
-From 215 domains that the authors could express they could confidently determine a binding peptide for 163 and 74 had 5 or more peptides bound. Given the diversity of domains selected it could be that some of the domains are clearly peptide binding domains and some may bind peptides weakly at an interface that is most often binding another type of molecule. I wonder if this would be apparent when the drop-off rate is studied in aggregate. Is there an enrichment for domains that are also known to bind other types of molecules in those that did not bind any or many peptides ?
Reviewer #2: Teyra et al. performed phage display screening to identify peptide binders of a large number of protein recognition modules (PRMs). They screened a 16-mer amino acid library against 163 PRMs and validated many peptides from rounds 4 and 5 of screening using phage ELISA. For many PRMs they validated <5 peptide binders this way. For 74 PRMs with >5 validated peptide binders that showed residue preferences at specific sites, they used the hits to define PRM specificity profiles and build position weight matrices (PWMs). The authors also identified possible binding models for their hits by finding the PDB structure with closest sequence similarity between the domain and ligand pair with a hit from the phage screening. The PWMs were also used to evaluate the human proteome and identify high-scoring sequences. The data generated are freely available for download, and the authors built a web site for easy access. The authors state that clones have been deposited at AddGene (though I don't see them listed there yet). This large-scale study presents a tremendous investment of resources that is out of the reach of most academic-scale labs; the results should certainly be published and made widely available. The paper does not really provide any key new insights into PRM specificity. The results consist of reporting what was found, and summarizing the data using plots that compare the length, composition and specificity of the motifs defined here and those defined previously for SH3, SH2 and WW domains using a similar approach. A major emphasis is on comparing phage hits to examples of peptide-binding structures in the PDB, arguing that sequence similarities support binding of phage-derived peptide in the same mode (or at the same site, such that they would function as competitive inhibitors), though this is a subjective claim without much support.
To put this work in an appropriate context, these points should be addressed in the manuscript: -A major point is that screening for the tightest-binding (or most slowly dissociating) peptides for a PRM by phage display gives hits that differ from what that PRM binds in a biological context. The authors choose to emphasize similarities rather than differences between their hits and native ligands, but this is misleading and could easily lead to mis-use of the reported data. This is particularly true because the authors present matches to their phage-derived PWM as "Predicted Human Interactions" in their on-line database, which (I think) cannot be what they actually mean, in the absence of filtering to suggest that interaction in a cellular context is plausible. These screening data are what they are, and should be presented as such. If the authors want to argue that these data have value for identifying biological interaction motif instances, they need to argue that more convincingly rather than implying it without evidence.
-The most striking result from screening is the extremely high prevalence of Trp (20% of residues at "specific" positions). The authors suggest this is similar to what has been shown for human SLiMs, but I think that is an overstatement. A realistic estimate of the extent to which Trp is enriched in this screen vs. in other determinations of PRM-binding peptides should be included in this paper.
-For many of these PRMs, a good deal is known about their specificities from prior work. For some domains, analysis of native partner interactions has provided a different impression of the specificity profile than the phage results. This needs to be acknowledged, with reference to ELM or other databases that compile what we know so far about PRM binding specificity data. The authors should present at least one example of a case where the phage results give a different picture than what was already reported. In this vein, it would be helpful if the authors could indicate for which domains there was vs. wasn't prior information about the PRM specificity. Ideally, comparisons to the disorderome would also include comparisons to the "SLiM-ome" (or the union of all reported SLiMs) as defined by ELM. Addressing these things would help clarify the novelty of the authors' findings.
-In many cases it is not convincing that the phage-derived peptides from the screen are binding in the same mode as many of the structures the authors compare to. Figure 4 is presented as evidence, but similarities highlighted in this figure are subjective and are sometimes not very high even by this loose standard (examples: ARIID4A, TDRD1, VCL, etc.). Calling out the similarities and downplaying the differences is misleading. Much more could be done to bolster confidence in a subset of the interactions (e.g. using comparisons to motifs already defined in the literature), though we acknowledge that this may be beyond the scope of this report.
-The experimental methods are described predominantly by referencing earlier work. It is difficult to be confident that these earlier papers provide a complete description of what was done, e.g. of the phage screening and ELISA protocols. This is particularly true because different papers are cited in the text vs. method sections (e.g. Teyra 2017 vs. Huang & Sidhu 2011). Were experimental variables such as concentrations, buffers, and wash stringencies exactly as in these earlier works? It would be good to associate this large data set unambiguously with the protocol(s) that generated it.
-The organization of the article could be improved. The text is redundant in places. Also, material suitable for the Introduction or Discussion appears in the main text, and aspects of the methods appear in the main text but not the methods section. Some of the supplementary materials have extra sheets in the excel file that don't appear intended for release. The authors should go through all supplementary tables and ensure that the terms used in the headings are precisely defined, with an equation if appropriate (all scores or ratios that are reported should be rigorously defined). The paper reads as if it were rather hastily written.
Reviewer #3: This manuscript reports a large scale survey of peptide ligands by phage display with many different protein domains, together with a derived database where users can explore the data. Now that it is clear that a huge amount of cell regulation involves short linear motifs (SLiMs), medium and high throughput methods are essential to screen for candidate motifs that can be subjected to low throughput validation. Phage display is an essential tool for this endeavour. The resource prm-db built upon the experimental work presented here will be explored by (increasing numbers of) experimentalists seeking motif interactors in their systems. It might also prove useful for annotated motif resources such as ELM to help in motif refinement, especially when only a couple of true instances are known. Overall, this is valuable work for the SLiM field.
One issue with phage display is that there can be a hydrophobic/Tryptophan bias in the recovered peptides. Importantly, this is addressed nicely in the discussion. There certainly are plenty of W-containing SLiMs such as DPW and WRPW and it is true that W is present more often than would be expected by chance. And many of the phage display motifs in the resource don't have W enrichment so then there is no issue. It is good, though, that researchers are properly informed.
I have no revisions to request.

16th Oct 2020 1st Authors' Response to Reviewers

Major concerns: My single biggest concern is that there is really little in this work in terms of novel biological findings. Previous related works have tried to study, for example, the evolution of peptide binding interactions, or the extent by which the same domains may have more than one binding mode, or trying to make concrete predictions for in-vivo targets of peptide binding domains.
I understand that a lot of work has gone into obtaining the domain binding peptides but this manuscript would be considerably stronger if the authors then used the data for some application akin to those prior studies. To be constructive I provide here some suggestions of potential applications. However, I don't mean that the authors should do all of these things or even any of these things, just that I suggest the authors should showcase how their new data can be used to derive new biological findings. Suggested possible applications could include:
-One of the most striking aspects of this resource is that the authors have covered many domain families and have structural models for a very large fraction of them. This aspect is not really explored. Looking at the beautiful Figure 4, it seems clear that there is a large diversity of folds and no immediate pattern relating the fold and the motifs appears, but this relation between fold, pocket and binding peptides could be quantified. Is there a relation between the properties of the fold and the types of sequences it binds? Any relation between the residues near the peptide domains and the amino-acids in the binding peptides? Are the domain residues in contact with target peptide residues more likely to be conserved? Are the domain-peptide specific residue-residue contacts for constrained peptide positions more likely to be important for binding?
-In Figure 4, some of the specificity models for some domains (e.g. HSPA9, SND1, WDR74, NXF1, others) extend beyond the size of the structural ligands. Some of these extensions still contain positions that are apparently constrained. Can the authors use the structural models to rationalize these extended specificities? For some of these examples, the extended binding mode may already be described in the literature but any new examples could be interesting.
-Perhaps a low hanging fruit application would be to suggest likely in-vivo protein binding partners and the binding sites for these domains. The authors have already started to do some of this work by predicting likely binding sites within disordered regions. These predictions could be overlaid on an up-to-date compilation of human interaction data, indicating potential binding site regions that could contribute to the protein-protein interactions. Providing some examples from this could help others understand how to make use of this resource. A larger extension of this could be to map and study human genetic variation on to these potential binding sites.

Response:
Thank you for the positive comments and constructive criticisms. Our priority has been to try to get our data out to the community as fast as possible so that many different groups can start to derive biological insight from these data, and we submitted it as a Research Resource rather than an Article. However, we agree that it would be useful to delve more into the details of our data. Figure 4 is a panel of the known PRM/peptide structures that are most similar to our PRM specificity profiles. Any effort to analyze residue-residue interactions or to understand the functional role of the extensions in some of the specificity profiles would require massive structural modelling and molecular dynamics efforts. In addition, we believe the field is not yet accurate enough to provide structural models reliable enough for analysis at atomic detail, and any conclusion obtained from modelling would still require structural biology efforts for validation. We may pursue some of these studies in the future with collaborators, but most importantly, making this complete database available online will enable many other structural biologists to engage in studies of this kind, and importantly, researchers with existing expertise and interest in particular PRM families will be best equipped and motivated to use our database to enable studies of key
We believe that these extensive additions of detailed analyses address the reviewer's request for additional studies, and they position our phage-derived data in comparison with natural interactions in a manner that is most sensible, specifically, a comparison of our short peptides to short natural peptides and motifs in existing databases. We are encouraged that this analysis revealed good agreement between the phage-derived and natural ligands, along with some differences which are explicable by the different contexts of the datasets.

Minor concerns
-In Figure 4, the authors note that there are residues that are modified by phosphorylation and other PTMs. However, this is not really discussed in the results. I found it interesting that these positions provided clear mismatches between the phage-display selected amino-acids and the amino-acid in the structural ligand. For phosphorylation these are the expected S/T to phosphomimetic D/E mismatches but I am not familiar with the idea that tryptophan can mimic methylated residues. This is most apparent for the TUDOR domain of SND1 with a strong selection for tryptophan at the position selective for methylated arginine. Is this well known? It would be worth having a short section describing in more detail the results for the PTM binding domains, ideally looking in detail at the structural reasons for some of the differences. Even within the phosphorylation examples there are interesting differences that are worth going into more detail. For example, the two first 14-3-3 domains in Figure 4 have a phospho-threonine and phospho-serine in the structural peptide and they select different mimetic residues and the two phospho-tyrosine bound domains (DOK4 and SHC1) don't really appear to select for phospho-mimetic residues at those positions.

Response:
We agree with the reviewer that the specific cases of PRMs that naturally recognize PTMs are worthy of more detailed discussion. Consequently, an extensive new section entitled "Phage-derived mimics of peptides containing PTMs" and a new Figure 5 have been included in the results to discuss the 17 cases where a structural peptide contains a PTM amino acid and the different mimetic residues observed in our data. This new section reads as follows: "The 17 structure peptides with PTMs were divided into three groups of eight, four or five peptides containing phosphorylated serine/threonine (pSer/pThr, Fig 5A), phosphorylated tyrosine (pTyr, Fig 5B) or methylated Arg/Lys (meArg/meLys, Fig 5C), respectively. Six of the eight PRMs that recognized pSer/pThr belonged to the 14-3-3 domain family and the other two belonged to the CKS or NIF domain family. In five of these, the aligned phage-derived peptide contained a negatively-charged Asp/Glu residue in place of the pSer/pThr residue in the structure peptide, consistent with other reports that have shown that Asp/Glu can effectively mimic the shape and charge of pSer/pThr (Sundell et al, 2018). For two of the other PRMs, the 14-3-3 domains of YWHAE and YWHAZ, the phage-derived peptide contained an aromatic Tyr/Trp residue in place of pSer/pThr. The structure peptide for the remaining PRM, the NIF domain of CTDSP2, was unusual in that it contained two pSer residues and exhibited only minimal homology with the phage-derived peptide, thus making it unclear whether the phage-derived peptide bound to the same site as the structure peptide. Three of the four PRMs that recognized pTyr belonged to the IRS domain family and the fourth belonged to the PID family, and in each case, the alignment showed that the phage-derived peptide contained a hydrophobic residue in place of the pTyr in the structure peptide.
Finally, the five PRMs that recognized meArg/meLys included two TUDOR domains, a PHD domain, a TUDOR-knot domain, and a WD40 domain. Except for the WD40 domain, the alignments revealed that each phage-derived peptide substituted a hydrophobic residue for the meArg/meLys residue in the structure peptide. In the case of the WD40 domain, meLys in the structure peptide was substituted by His in the phage-derived peptide, but in this case, the structure peptide showed low similarity with the phage-derived peptide and specificity logo, making it uncertain whether the two peptides recognize the same site in the same manner. Taken together, these results showed that phage-derived peptides without PTMs can mimic peptide ligands that contain PTMs in many cases, either by using Asp/Glu residues that mimic pSer/pThr residues or by using hydrophobic residues that likely act as partial mimics of PTMs. Thus, our results could be useful for designing PTM mimics, but further biophysical and structural studies will be necessary to reveal the molecular basis for PTM mimicry." We believe that this aspect of our work is worthy of additional structural and functional studies which will likely be forthcoming soon. Again, release of our online database to the community will enable other researchers with expertise in these areas to pursue these studies also.
Comment:
Figure 4 is beautiful but maybe too big for a paper. The authors could consider splitting into more than one figure. If they were to discuss the PTM bound structures they could move those onto a separate figure.

Response:
We agree, and following the reviewer's suggestion, we have moved the PTM-containing peptide structures to a new Figure 5 that includes cases with and without specificity profiles.

Comment:
From 215 domains that the authors could express they could confidently determine a binding peptide for 163 and 74 had 5 or more peptides bound. Given the diversity of domains selected it could be that some of the domains are clearly peptide binding domains and some may bind peptides weakly at an interface that is most often binding another type of molecule. I wonder if this would be apparent when the drop-off rate is studied in aggregate. Is there an enrichment for domains that are also known to bind other types of molecules in those that did not bind any or many peptides?

Response:
We agree with the reviewer that peptide-phage display may not have worked for some PRMs Thus, the answer to the reviewer's question is that there is no enrichment for domains that are also known to bind other types of molecules in those that did not bind any or many peptides.
We did not include this analysis in the paper because the results were as expected, and this analysis focuses on a minor unsuccessful aspect of our work (i.e. potential reasons for failed selections). However, we hope that by including it here and agreeing to the open access publication of the review response, it will be available for those interested.

Reviewer #2: Teyra et al. performed phage display screening to identify peptide binders of a large number of protein recognition modules (PRMs). They screened a 16-mer amino acid library against 163 PRMs and validated many peptides from rounds 4 and 5 of screening using phage ELISA. For many PRMs they validated <5 peptide binders this way. For 74 PRMs with >5 validated peptide binders that showed residue preferences at specific sites, they used the hits to define PRM specificity profiles and build position weight matrices (PWMs). The authors also identified possible binding models for their hits by finding the PDB structure with closest sequence similarity between the domain and ligand pair with a hit from the phage screening. The PWMs were also used to evaluate the human proteome and identify high-scoring sequences. The data generated are freely available for download, and the authors built a web site for easy access.
The authors state that clones have been deposited at AddGene (though I don't see them listed there yet). This large-scale study presents a tremendous investment of resources that is out of the reach of most academic-scale labs; the results should certainly be published and made widely available.

Response:
We thank the reviewer for the careful review of our work and the positive comments. We are grateful that the reviewer recognizes the value of our large-scale study and freely accessible online database and recognizes the need to publish and make the data widely available. With regards to the protein expression plasmids themselves, we also intend to make these all freely available as another valuable resource for the research community. AddGene laboratory is in the process of changing its physical location, and unfortunately, it is producing major delays in the processing time for all submitted samples. The plasmids were sent a year ago, they have already been stored in the AddGene repository, and the deposit agreement has been completed, as shown in the attached file provided by AddGene. The link provided in the paper will be the place to look for the information once the plasmids are fully processed by AddGene.

Comments:
The paper does not really provide any key new insights into PRM specificity. The results consist of reporting what was found, and summarizing the data using plots that compare the length, composition and specificity of the motifs defined here and those defined previously for SH3, SH2 and WW domains using a similar approach. A major emphasis is on comparing phage hits to examples of peptide-binding structures in the PDB, arguing that sequence similarities support binding of phage-derived peptide in the same mode (or at the same site, such that they would function as competitive inhibitors), though this is a subjective claim without much support. To put this work in an appropriate context, these points should be addressed in the manuscript: A major point is that screening for the tightest-binding (or most slowly dissociating) peptides for a PRM by phage display gives hits that differ from what that PRM binds in a biological context.
The authors choose to emphasize similarities rather than differences between their hits and native ligands, but this is misleading and could easily lead to mis-use of the reported data.
This is particularly true because the authors present matches to their phage-derived PWM as "Predicted Human Interactions" in their on-line database, which (I think) cannot be what they actually mean, in the absence of filtering to suggest that interaction in a cellular context is plausible. These screening data are what they are, and should be presented as such. If the authors want to argue that these data have value for identifying biological interaction motif instances, they need to argue that more convincingly rather than implying it without evidence.

Response:
As the reviewer points out, we present protein matches to our phage-derived PWM as We would like to point out that we are fully aware that phage display gives optimal ligands and this is rarely exactly the same as biological ligands which are sub-optimal for affinity. This is now noted at several points in the manuscript, as follows: Last paragraph of the "General features of PRM specificity profiles": "Overall, phage-derived peptides reflect the hydrophobic characteristics of functional SLiMs at specific positions, which are critical for PRM recognition".

Last paragraph of the "Structural rationalization of PRM-ligand interactions": "these results suggest that the phage-derived peptides for most of the PRMs in our database likely represent ligands that bind to functional sites identified previously in related PRM structures in the PDB. Differences between optimal phage-derived peptides and natural structure peptides may also be of interest to understand the types of substitutions that can enhance the affinities of natural ligands and could thus be useful for inhibitor design. Consequently, our phage-derived peptides can provide molecular insights into natural protein function and can be used as inhibitors of natural protein-protein interactions."

Two paragraphs of the Discussion: "Comparative analysis revealed that our phage-derived ligands for PRMs often resemble peptide ligands bound to similar PRMs in known PRM/peptide complex structures, suggesting that the natural and optimal peptides likely use similar molecular interactions to bind PRMs. Notably, differences between the optimal peptides in our database and the predominantly natural peptides in the structural database can provide valuable insights to better understand the structural basis of PRM/peptide recognition, which in turn could aid the design of peptide-based inhibitors to target PRMs in cells." "Phage-derived ligands rarely match natural ligands exactly, mostly because of differences between in vitro and natural evolutionary processes. Whereas in vitro evolution is driven to maximize affinity, natural evolution is driven by the need for high specificity to reduce cross-reactivity with the thousands of non-partner proteins in the cell."

Comment:
The most striking result from screening is the extremely high prevalence of Trp (20% of residues at "specific" positions). The authors suggest this is similar to what has been shown for human SLiMs, but I think that is an overstatement. A realistic estimate of the extent to which Trp is enriched in this screen vs. in other determinations of PRM-binding peptides should be included in this paper.

Response:
This section shows that the characteristics of phage-derived peptides partially reflect those of functional SLiMs, and it also explains the differences:

"A similar profile for hydrophobic residues was observed for specific positions in SLiMs, although the prevalence of Leu and Phe was 1.9- or 1.5-fold higher, respectively, compared with phage-derived ligands."

"The second most frequent amino acid at specific positions was Pro (11%), and Pro residues were also most abundant in specific positions of SLiMs (19%) and were highly prevalent in the disorderome (13%) (Fig 2C)."

"SLiMs also showed lower abundance of hydrophilic residues with a preference for charged residues in the specific positions, although the preference for negatively-charged residues over positively-charged residues was only observed in the non-specific positions (Fig 2C)."

"Moreover, our analysis showed that, overall, SLiMs are much more hydrophilic than phage-derived peptides (mean RHI = -0.65 and 0.03, respectively) and slightly less hydrophilic than the disorderome (mean RHI = -0.84). However, specific positions of SLiMs are more hydrophobic than those of phage-derived ligands (mean RHI = 0.78 and 0.21, respectively), [...]

Comment:
[...] Fig 1). This high frequency is even more striking, considering the very low abundance of Trp in the disorderome (0.4%) (Fig 2C). Ideally, comparisons to the disorderome would also include comparisons to the "SLiM-ome" (or the union of all reported SLiMs) as defined by ELM. Addressing these things would help clarify the novelty of the authors' findings.
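
As an illustration only (not the authors' analysis code), the kind of enrichment estimate under discussion can be sketched in a few lines of Python. The two percentages are the ones quoted in this correspondence (Trp at about 20% of specific positions and 0.4% in the disorderome); the function names and the toy peptides are assumptions introduced for the example.

```python
from collections import Counter


def residue_frequencies(peptides):
    """Return the fraction of each amino acid over all residues in `peptides`."""
    counts = Counter("".join(peptides))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def fold_enrichment(observed, background):
    """Ratio of an observed frequency to a background frequency."""
    if background <= 0:
        raise ValueError("background frequency must be positive")
    return observed / background


# Frequencies quoted in the correspondence: Trp at ~20% of specific
# positions versus 0.4% in the disorderome.
trp_fold = fold_enrichment(0.20, 0.004)

# Hypothetical toy peptides (not taken from the database), used only to
# show how per-residue frequencies could be tabulated.
toy_freqs = residue_frequencies(["WPPL", "AWPP"])
```

With these inputs the Trp frequency at specific positions is roughly 50-fold higher than in the disorderome. The same ratio could be computed against a SLiM-derived background, which is the comparison the reviewer asks for.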

Response:
We followed the reviewer's suggestion and strengthened our manuscript by comparing our phage-derived peptides and specificity profiles to the functional SLiMs and motifs from the ELM database in a new section, "Comparison of phage-derived ligands with functional ligands".

First, this section indicates which domains contain SLiM instances with PRM binding information in ELM and selects the high-resolution cases for further analysis, as follows: "Of the 500 PRMs in our database and the 163 PRMs in this study, we found that only 44 or 20, respectively, had at least one SLiM ligand, showing low coverage of our PRMs in the ELM repository (Dataset EV6). In order to compare the highest resolution examples of ELM and phage-derived results, we focused our analysis on PRMs for which our database contained enough phage-derived peptides to generate specificity profiles and for which the ELM class associated with the SLiM ligands did not contain any PTMs. For these eight out of 20 PRMs, we compared the ELM motif with the most similar phage-derived peptide and with the specificity profile, and we rationalized the comparisons using PRM structures with bound peptides (Fig 6)."

Finally, we carry out a structure/function analysis summarized in Figure 6: "[...] shows strong conservation for a PPG sequence followed by an aromatic residue, and notably, a previous study with phage display and peptide arrays defined a very similar tetrapeptide specificity profile that was validated by proteomic experiments (Kofler et al, 2005). Consequently, the short phage-derived specificity profile closely matches the core of the structure peptide and ELM motif, suggesting that this central region is most important for binding."
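
For readers who want to try this kind of motif comparison themselves, the following is a minimal sketch, assuming the motif is written as a regular expression (the form ELM uses for its motif classes). The pattern below is a hypothetical rendering of "PPG followed by an aromatic residue", with the aromatic set taken to be F, W, and Y; it is not the actual ELM class definition, and the sequences are made up for the example.

```python
import re

# Hypothetical regular expression for "PPG followed by an aromatic residue".
# Illustration only: this is not the actual ELM class definition.
PPG_AROMATIC = re.compile(r"PPG[FWY]")


def find_motif(pattern, sequence):
    """Return (start, matched_text) for every non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# Made-up sequences for demonstration only.
hits = find_motif(PPG_AROMATIC, "AEPPGFSKPPGWT")
misses = find_motif(PPG_AROMATIC, "AEPPGASK")
```

Here `hits` contains the two matches at 0-based offsets 2 and 8, and `misses` is empty because the residue after PPG is not aromatic.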

Comment:
In many cases it is not convincing that the phage-derived peptides from the screen are binding in the same mode as many of the structures the authors compare to.

Response:
[...] (Fig. 3B). We believe that 40% is a meaningful cut-off for similarity, taking into account that only a subset of residues in the peptide are directly involved in PRM recognition, and that phage-derived ligands rarely match natural ligands exactly for the reasons pointed out in the Discussion section, as follows: "Whereas in vitro evolution is driven to maximize affinity, natural evolution is also driven by the need for high specificity to avoid interactions with the thousands of proteins that exist in a cell". In addition, we identified higher similarity between phage-derived and structure peptides in positions that aligned with specific positions in the logo than in those aligned with non-specific positions (73% vs 50%, respectively), and in positions that aligned with interacting positions than in those aligned with non-interacting positions (75% vs 29%, respectively).
Overall, we believe that these results from Figure 4 strongly suggest that the majority of phage-derived peptides bind in the functional binding site shown in the structure.
We agree with the reviewer that dissimilarities between phage and structure peptides are of great importance, but without experimental biophysical or structural information on the phage peptides, we cannot assess the relevance of the amino acid substitutions at those positions. We included this point in the last paragraph of the "Structural rationalization of PRM-ligand interactions" section, as follows: "Differences between optimal phage-derived peptides and natural structure peptides may also be of interest to understand the types of substitutions that can enhance the affinities of natural ligands and could thus be useful for inhibitor design". Overall, we find good agreement between our database and the structure database. However, we again stress that we are submitting the entire database as a resource, so all of our analysis and conclusions will be available to the many researchers with interest in particular domains.
We believe that our database will inform and enable many detailed studies among other researchers, and that is the primary aim of our work.
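
Researchers using the database may want to run a position-wise comparison of this kind on their own domains. The sketch below is one possible way to do it, under stated assumptions: this letter does not define how similarity was scored, so simple percent identity over pre-aligned, equal-length peptides is used as a stand-in. The peptides and the split into specific and non-specific positions are hypothetical.

```python
def percent_identity(peptide_a, peptide_b, positions=None):
    """Percent of compared positions with identical residues.

    Both peptides must already be aligned and of equal length.
    `positions` optionally restricts the comparison to a subset of
    0-based indices (for example, the specific positions of a logo).
    """
    if len(peptide_a) != len(peptide_b):
        raise ValueError("peptides must be pre-aligned to equal length")
    if positions is None:
        positions = range(len(peptide_a))
    positions = list(positions)
    if not positions:
        raise ValueError("no positions to compare")
    matches = sum(peptide_a[i] == peptide_b[i] for i in positions)
    return 100.0 * matches / len(positions)


# Hypothetical aligned peptides and position classes (not from the database).
phage = "RPLPPLPW"
structure = "KPLPALPA"
specific = [1, 2, 3, 5, 6]   # assumed logo-specific positions
non_specific = [0, 4, 7]     # the remaining positions

overall = percent_identity(phage, structure)
at_specific = percent_identity(phage, structure, specific)
at_non_specific = percent_identity(phage, structure, non_specific)
```

In this toy case the overall identity is 62.5%, with all of the agreement concentrated at the assumed specific positions. A similarity score based on a substitution matrix could replace the identity test without changing the structure of the comparison.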

Comment:
The

Response:
We appreciate the reviewer's concern for detailing the methods as accurately as possible.

Comment:
The organization of the article could be improved. The text is redundant in places. Also, material suitable for the Introduction or Discussion appears in the main text, and aspects of the methods appear in the main text but not the methods section.

Response:
We have tried to reduce redundancy in the paper.

Comment:
The resource prm-db, built upon the experimental work presented here, will be explored by (increasing numbers of) experimentalists seeking motif interactors in their systems. It might also prove useful for annotated motif resources such as ELM to help in motif refinement, especially when only a couple of true instances are known. Overall, this is valuable work for the SLiM field.
One issue with phage display is that there can be a hydrophobic/Tryptophan bias in the recovered peptides. Importantly, this is addressed nicely in the discussion. There certainly are plenty of W-containing SLiMs such as DPW and WRPW and it is true that W is present more often than would be expected by chance. And many of the phage display motifs in the resource don't have W enrichment so then there is no issue. It is good, though, that researchers are properly informed.
I have no revisions to request.

Response:
We thank the reviewer for their comments and for acknowledging the quality of our work and its importance to the research community. With the revisions outlined above, we hope that we can now publish the manuscript and release the associated online database for use by the general research community. As noted by all three reviewers, the work will be of value to a broad range of researchers. As with any large-scale study, there are certainly many other analyses that can and should be done, and we look forward to seeing what use other researchers will make of our compiled database.

29th Oct 2020 2nd Editorial Decision
Thank you for sending us your revised manuscript. We have now heard back from the reviewer who was asked to evaluate your revised study. As you will see below, the reviewer is satisfied with the modifications made and is supportive of publication. As such, I am glad to inform you that we can soon accept your manuscript for publication, pending some minor editorial issues listed below.

The authors have satisfactorily addressed the points raised and also added new analyses that make the paper more interesting. This work is a valuable contribution as a research resource and I strongly support publication at this time.

3rd Nov 2020 2nd Authors' Response to Reviewers
The Authors have made the requested editorial changes.

Reporting Checklist For Life Sciences Articles (Rev. June 2017)
This checklist is used to ensure good reporting standards and to improve the reproducibility of published results. These guidelines are consistent with the Principles and Guidelines for Reporting Preclinical Research issued by the NIH in 2014. Please follow the journal's authorship guidelines in preparing your manuscript.

Journal Submitted to: Molecular Systems Biology
Corresponding Author Name: Sachdev Sidhu