Structural diversity in de novo cyclic peptide ligands from genetically encoded library technologies

Cyclic peptides discovered by genetically encoded library technologies have emerged as a class of promising molecules in chemical biology and drug discovery. Here we review the cyclic peptides identified through these techniques reported in the period 2015 to 2019, with a particular focus on the three‐dimensional structures that peptides adopt when binding to their targets. A range of different structures have been revealed through co‐crystal structures, highlighting how versatile and adaptable these molecules are in binding to diverse protein targets, such as enzymes and receptors, or challenging shallow surfaces involved in protein‐protein interfaces. Analysis of the properties of the peptides reported shows some interesting trends, with further insight for those with structural information suggestive that larger peptides are more likely to adopt secondary structure. We highlight examples where co‐crystal structures have informed the key interactions that promote high affinity and selectivity of cyclic peptides against their targets, identified novel inhibitor binding sites, and provided new insights into the biology of their targets. The structure‐guided modifications have also aided the design of cyclic peptides with improved activity and physicochemical properties. These examples highlight the importance of crystallography in future cyclic peptide drug discovery initiatives.

There are many examples of bioactive cyclic peptides (CPs) found in nature; the fungal natural product cyclosporine is used as an immunosuppressant while phalloidin and α-amanitin are highly toxic to humans. [4] Defined biological activities for these peptides have been elucidated, but rational de novo design of such molecules with desirable properties against targets of interest is highly challenging. It is thus not surprising that the richest source of CPs (aside from those occurring naturally) is genetically encoded library technologies, where diverse starting pools of combinatorial peptides that cover vast chemical space can rapidly be generated and screened to identify CPs that efficiently bind to a target of interest.
This review will survey CPs that have been identified through genetically encoded peptide selection techniques, with particular focus on the CPs for which three-dimensional structures in complex with their targets have been determined. We provide an update on previous reviews of de novo CP structures, [5,6] focusing on the last 5 years. We will not cover peptides that are designed to be α-helical, such as stapled peptides, since, by definition, these will form an α-helix and thus have minimal variation in their structures. […] were not included unless either a selection process was used or the parent sequence was also first reported within the same period, that is, 2015 to 2019. Further articles were subsequently included, identified from works cited within articles returned by the search terms or by serendipitous discovery.
Calculating Cyclic Peptide Properties: Peptide sequences listed in Table 1 were extracted from published articles, redrawn in ChemDraw (v19) and used as the input for DataWarrior [7] to calculate cLogP, the number of H-bond acceptors and donors, total surface area and polar surface area. The natural product(-like) peptide drugs used in this analysis were: cyclosporin, octreotide, pasireotide, linaclotide, romidepsin, and lanreotide.
Binding constants (K_D) were converted to pK_D (−log10(K_D), with K_D in molar units) where data were available. The binding efficiency index (BEI) was calculated by dividing the pK_D of a CP by its molecular weight in kDa. Graphs were generated using GraphPad Prism v8.
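The two conversions described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used for the figures: the function names are our own, and the example peptide (K_D = 7 nM, MW = 2000 Da) is hypothetical.

```python
import math


def pkd(kd_molar: float) -> float:
    """Return pK_D = -log10(K_D), with K_D given in mol/L."""
    if kd_molar <= 0:
        raise ValueError("K_D must be a positive concentration in mol/L")
    return -math.log10(kd_molar)


def bei(kd_molar: float, mw_da: float) -> float:
    """Binding efficiency index: pK_D divided by molecular weight in kDa."""
    return pkd(kd_molar) / (mw_da / 1000.0)


if __name__ == "__main__":
    # Hypothetical peptide, illustrative values only.
    kd = 7e-9      # 7 nM expressed in mol/L
    mw = 2000.0    # molecular weight in Da
    print(f"pK_D = {pkd(kd):.2f}, BEI = {bei(kd, mw):.2f}")
```

Keeping K_D in mol/L before taking the logarithm matters: entering a value in nM without conversion shifts the pK_D by nine units.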
Data extracted from PDB files: To determine the interacting area of the CPs with their respective proteins from crystal structures, the solvent-accessible surface area (SASA) was first calculated for the CP:protein complex in PyMOL using the get_area command with dot_solvent set to 1 and all 'ignore' flags removed (using the command 'flag ignore, all, clear'). SASA was then calculated separately for each of the interacting species, that is, the CP and the protein, and the interacting surface area was calculated using Equation (1). The CP and protein interfaces were assumed to be complementary, so the total buried area is shared equally between the two partners.
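As a rough sketch of this calculation, the arithmetic of Equation (1) can be separated from the PyMOL calls that supply the three SASA values. The function names, the temporary object names (cp_only, prot_only, cplx) and the numerical values in the example are assumptions made for illustration; only get_area, dot_solvent and 'flag ignore, all, clear' come from the text. The PyMOL helper needs a PyMOL installation and is not exercised by the example.

```python
def interacting_area(sasa_cp: float, sasa_prot: float, sasa_complex: float) -> float:
    """Equation (1): half of the surface buried on complex formation (in Å^2)."""
    return (sasa_cp + sasa_prot - sasa_complex) / 2.0


def percent_cp_interacting(sasa_cp: float, sasa_prot: float, sasa_complex: float) -> float:
    """Interacting area as a percentage of the SASA of the isolated CP."""
    return 100.0 * interacting_area(sasa_cp, sasa_prot, sasa_complex) / sasa_cp


def sasa_from_pymol(complex_obj: str, cp_sel: str, prot_sel: str):
    """Return (SASA_CP, SASA_Prot, SASA_complex) for an already loaded structure.

    Assumed workflow: each species is copied into its own object so that its
    SASA is computed in isolation rather than in the context of its partner.
    """
    from pymol import cmd  # imported lazily; requires PyMOL

    cmd.set("dot_solvent", 1)            # solvent-accessible rather than molecular surface
    cmd.flag("ignore", "all", "clear")   # equivalent to 'flag ignore, all, clear'
    cmd.create("cp_only", f"{complex_obj} and ({cp_sel})")
    cmd.create("prot_only", f"{complex_obj} and ({prot_sel})")
    cmd.create("cplx", f"{complex_obj} and (({cp_sel}) or ({prot_sel}))")
    return cmd.get_area("cp_only"), cmd.get_area("prot_only"), cmd.get_area("cplx")


if __name__ == "__main__":
    # Hypothetical SASA values in Å^2, for illustration only.
    area = interacting_area(1600.0, 20000.0, 20100.0)
    pct = percent_cp_interacting(1600.0, 20000.0, 20100.0)
    print(f"interacting area = {area:.0f} Å^2 ({pct:.1f}% of CP SASA)")
```

How waters, ions and alternate conformations are handled before the area calculation is not specified here and would need to be decided for each structure.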
$$\text{Interacting area} = \frac{\mathrm{SASA}_{\mathrm{CP}} + \mathrm{SASA}_{\mathrm{Prot}} - \mathrm{SASA}_{\mathrm{complex}}}{2} \qquad (1)$$

The number of hydrogen bonds was counted manually from the find polar contacts function in PyMOL. The percentage of peptide containing secondary structure was determined by the flags associated to the atoms in the PyMOL object based on the following commands [8] :

CYCLIC PEPTIDE SELECTION: GENETICALLY ENCODED LIBRARY TECHNOLOGIES
The three most common selection methods for discovering de novo CPs against protein targets of interest are: (a) phage display, [9] (b) mRNA display, [10] and (c) split-intein circular ligation of peptides and proteins (SICLOPPS). [11] In all cases, a diverse library of combinatorial peptides is ribosomally synthesised from a highly degenerate DNA/RNA template, post-translationally cyclised and screened against a target protein (or protein-protein interaction, PPI). After rounds of selection, enriched peptide sequences are decoded via sequencing of their associated genetic tags. CP hits are then chemically synthesised, and their affinity for their target is confirmed via biochemical and biophysical assays.
Here we provide a very brief overview of each CP selection method (see [12] for a detailed review).
In phage display, a library of DNA sequences is ligated into a phagemid and transformed into bacteria, which produce phage that each display a single member of the library on their surface as a fusion to a coat protein. [9] Peptides are (almost always) cyclised by cysteine disulfides [13] or chemical modification of cysteine residues with linkers.
The pool of phage is then incubated with an immobilised target protein, and binding phage are recovered and used as input for a further round of enrichment. Starting library sizes are typically around 10^8 sequences, a limitation imposed by the efficiency of transformation.
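To put the library sizes quoted in this section (around 10^8 for phage display and >10^12 for mRNA display) in context, the sketch below compares them with the theoretical sequence space of fully randomised peptides. It assumes 20 possible amino acids at every randomised position and that every library member is unique, so the coverage values are upper bounds; the function names are illustrative.

```python
def sequence_space(length: int, alphabet: int = 20) -> int:
    """Number of distinct peptides with `length` fully randomised positions."""
    return alphabet ** length


def max_fully_sampled_length(library_size: int, alphabet: int = 20) -> int:
    """Longest randomised region whose whole sequence space fits in the library."""
    n = 0
    while sequence_space(n + 1, alphabet) <= library_size:
        n += 1
    return n


def coverage(library_size: int, length: int, alphabet: int = 20) -> float:
    """Upper bound on the fraction of sequence space sampled (all clones unique)."""
    return min(1.0, library_size / sequence_space(length, alphabet))


if __name__ == "__main__":
    for name, size in (("phage display", 10**8), ("mRNA display", 10**12)):
        n = max_fully_sampled_length(size)
        print(f"{name}: {size:.0e} members, full coverage up to {n} randomised residues, "
              f"at most {coverage(size, n + 1):.1%} coverage at {n + 1}")
```

Under these assumptions a 10^8 library can in principle contain every sequence of up to 6 randomised residues, and a 10^12 library up to 9; longer libraries sample only a fraction of the available space.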
In mRNA display, a puromycin-linked mRNA library is translated in vitro to form mRNA-peptide fusions with covalent linkage of the puromycin to the nascent peptide. [10] The mixture is then applied to an immobilised target protein to recover binding sequences. The mRNA is reverse transcribed into DNA for PCR amplification. The recovered DNA is transcribed to yield an enriched mRNA library, which is used as input for the next round of enrichment. As this method is in vitro and cell-free, there is no upper limit on the starting library size, with initial diversities >10^12 routinely used (the only potential limitation being the number of unique molecules of library DNA that can be practically used). mRNA display was further advanced by codon reprogramming and incorporation of noncanonical amino acids (nCAAs) into peptides. [14] This has been achieved through engineered aminoacyl-tRNA synthetases [15] or by using a set of ribozymes (termed flexizymes). [16] An integrated system using flexizymes coupled to mRNA display is referred to as the Random nonstandard Peptide Integrated Discovery (RaPID) system, [17] which has been widely used to incorporate nCAAs for peptide cyclisation, as well as for active-site targeting warheads, backbone modifications and exotic side-chain modifications. [18]

In SICLOPPS, a DNA library is transformed into a host cell (either prokaryotic or eukaryotic) where the peptide is initially synthesised as part of a split-intein fusion protein, which is further processed when the intein fragments form the active intein, releasing a head-to-tail cyclised peptide. [19] In contrast to the other two techniques, the […] (Figure 1B,C), whilst the calculated lipophilicities of the peptides cover a wide range (Figure 1D).

It should also be noted that these parameters may not tell the whole story, as CPs can show unexpected behaviours, as with so-called chameleonic peptides, [20] and the perceived drive to increase lipophilicity for better passive membrane diffusion must perhaps be balanced against lipophilic liabilities. [21] The calcu- […]. [22] Additionally, the affinities of the surveyed CPs (where data are reported) range from pK_D 3.8 to 9.6 (Figure 1E) and, whilst low affinities (pK_D < 5) are observed for low molecular weight CPs (MW < 1000), no correlation is obvious between higher MWs (1000-2500) and affinity (Figure 1F). The binding efficiency index (BEI, given as a binding affinity relative to MW) [23] is however relatively similar across the selection methods in this dataset, suggesting comparable binding efficiencies can be obtained using these methodologies (Figure 1G). No clear trends were observed for cLogP as a function of molecular weight, with most values clustered in the +5 to −10 region (and outliers at >2500 Da either FLAG-tagged [24] or modified with additional arginine residues [25]; Figure 1J), nor for cPSA and cLogP as a function of pK_D (Figure 1K,L).

[Table 1, row fragment: [26] Amyloid-β peptide Aβ42; SICLOPPS; AβC5-34, AβC5-116, AβC7-1, AβC7-14]
[…] those that have any) 53.8 ± 21.3% for helices and 41.8 ± 17.5% for sheets. As highlighted in the previous section, while peptides have been grouped by selection method, other factors may be responsible for the differences observed, for example library size, cyclisation method or target type.
Extracting information from their PDB files, both the solvent accessible surface area (SASA) and the interaction area between CP and protein are, overall, very consistent across the two selection methods.
The average SASAs are 1611 ± 318 Å² and 1603 ± 310 Å² for phage and mRNA display respectively, with corresponding interaction areas of 758 ± 91 Å² (49 ± 10% of CP SASA) and 750 ± 136 Å² (49 ± 13%) (Figure 2A […] have pK_D > 7 (Figure 1E). This suggests high target affinity may aid CP:protein crystallization.

FIGURE 1 […] (Table 1), alongside exemplar peptide drugs, were calculated using DataWarrior. E, pK_D values plotted for CPs derived from different selection methods, where binding affinity data was available. F, Plot of pK_D and molecular weights. G, Binding efficiency index (BEI). H,I, cPSA, total number of HBD and HBA, cLogP plotted as a function of MW. K,L, cPSA and cLogP plotted as a function of known pK_D. See section 2 for calculations and a list of natural-product peptide drugs used for data analysis. Open symbols indicate CPs where crystal structures in complex with proteins are available (Table 2)

[…] for mRNA CPs it is 9 out of 11 (82%; 4 helix, 5 sheet). Additionally, when considering the relationship between some of the calculated properties relating to 'size' (total surface area and polar surface) and the observed SASA from the crystal structures, there is a strong linear relationship for the phage (R² = 0.94 and 0.89; Figure […] [42] in particular for CP libraries composed of longer peptides given the greater propensity for adopting secondary structure. Determining the structure of these CPs in solution would aid in this analysis.

STRUCTURES OF CYCLIC PEPTIDES BOUND TO PROTEIN TARGETS
In the following section, we will focus on the insights gained from the CP:protein co-crystal structures published between 2015 and 2019.
Throughout this review, protein residues are referred to with 3-letter codes while CP residues are referred to with 1-letter codes.

Cyclic peptides targeting enzymes
Enzymes represent the most common target class for CPs, and indeed over half of the targets reported in the last 5 years are enzymes ( […]

Note: Top 10 PDB entries are for the period 2015 to 2019 while the bottom 9 are pre-2015. SASA, solvent accessible surface area. Areas, H-bonds and secondary structure elements were calculated from the PDB files as described in section 2. The '% Peptide area interacting' was calculated from the protein-interacting area and the total surface area of the peptide; for peptides interacting with more than one monomer in the structure (6) the larger of the interacting areas is shown on the assumption that the other interactions are induced by crystal packing. The only exception to this is hTNFα, for which the target is known to be a protein dimer so the total interacting area with both protein molecules is shown. The secondary structural elements have been calculated based on atoms flagged as such in the PDB file (see section 2). Hydrogen bonds were categorised as being from the CP to either the CP itself (intra), to water molecules (water) or to the protein (inter).
[…] to be made with the protein of interest. With their larger size and the greater number of interactions that they make with their target, CPs can typically achieve higher potency and selectivity within the target enzyme family than small molecules. CPs can interact with active-site residues and thus inactivate the catalytic activity of enzymes (e.g., HPA [41]), or act as substrate competitive inhibitors, directly displacing the target substrate (e.g., KDM4A [37]); in some cases they bind to allosteric sites to induce a conformational change that inactivates enzyme function (e.g., iPGM [30]). Another recently developed type of ligand is the 'silent' allosteric CP, where tight-binding CPs that do not affect enzyme activity were used as capture probes for an enzyme-substrate complex (HIF1:PHD2 complex). [55] The recent availability of co-crystal […] (Table 2). [41] Inspired by the selective inhibitor montbretin A, [74,75] which contains multiple phenolic groups, phenol-containing nCAAs (L-DOPA and resorcinol) were incorporated into the library in an attempt to bias the discovery of active site binding peptides. The ᴰY library led to enrichment of the consensus sequence ᴰYPYSCWxRH, containing two tyrosines in a small five-residue macrocycle, with a four-residue tail. piHA-Dm, a truncated peptide containing the consensus sequence (Ac-ᴰYPYSCWVRH-NH2), was shown to be a substrate competitive inhibitor (K_i = 7 nM), with only modest improvements in potency […]

FIGURE 2 Properties of cyclic peptides co-crystallised with their target. Peptide interacting areas were extracted using PyMOL from their respective PDB files. Co-crystallised peptides generated using either mRNA display (blue circles) or phage display (red diamonds) show remarkable consistency in, A, the total peptide interacting area and B, the percentage of the total peptide area that interacts with the target protein, while, C, the molecular weights of CPs identified through phage display are generally lower than peptides from mRNA display.

[…] the natural amino acids, may have contributed to the initial lack of enrichment of L-DOPA containing CPs. [43] The ᴸY initiated library showed little conservation amongst the enriched sequences, [41] but identified piHA-L5(d10Y) (Ac-YGHSHIRFGYSYHVSYCG-NH2), a distinct sequence from piHA-Dm, [42] with K_i = 14 nM for HPA (note that position 10 was L-DOPA, abbreviated to 'd' in the original manuscript. The d10Y mutant was only 5-fold less potent and was used in the majority of subsequent experiments). In the crystal structure (PDB: 5VA9) the majority of the peptide (F8 to G18) forms an α-helix and the remaining residues adopt an extended loop conformation at one end of the helix (Figure 3C,D).
In this structure, the catalytic residues Asp197 and Glu233 both form hydrogen bonds with the guanidinium group of R7, while Asp300 forms hydrogen bonds with the backbone nitrogens on F8 and G9 via a bridging water molecule. [42] piHA-Dm is half the size of piHA-L5(d10Y) (9 vs 18 residues) but is a more potent inhibitor. Both CPs contain helices which partially occupy the same space adjacent to the catalytic site of HPA, but the orientation of the helices differs between the two ( Figure 3E).

Substrate competitive CP inhibitors of histone lysine demethylase (KDM4A)
The histone lysine demethylases (KDMs) remove methyl groups from the sidechains of lysine residues in the N-terminal 'tails' of histone H3. [76,77] These methyl groups are part of a complex set of post-translational modifications (PTMs) that are found on histone proteins to control eukaryotic gene expression. Mis-regulation of PTMs is often the hallmark of diseases such as cancer. In particular, the KDM4 subfamily, which demethylates tri-/di-methyllysines (Kme3/Kme2) on histone H3 at K9 and K36, has been identified as a potential therapeutic target for multiple cancers. [78,79] KDM4s belong to a large family of Fe(II) and 2-oxoglutarate (2-OG) dependent oxygenases (>60 enzymes) with highly conserved active site architecture, making the development of potent and selective small molecule inhibitors challenging. [80] To identify novel and selective scaffolds for KDM4 inhibition, a RaPID mRNA-display selection was carried out against KDM4A. [37] Several hits containing an 'RSG' motif were identified, including CP2 (Ac-ᴰYVYNTRSGWRWYTC-NH2), which was found to have high potency (IC50 < 50 nM) for KDM4A/B/C and selectivity (>150-fold) over other KDM subfamilies and 2-OG oxygenases.

FIGURE 3 X-ray crystal structures of HPA in complex with CPs. Structure of piHA-Dm (PDB: 5KEZ) (orange) bound to HPA (teal) with the catalytic residues Asp197, Glu233, and Asp300 highlighted in pink. A, CP as cartoon with the sidechains shown as sticks and HPA as a surface; B, sticks with selected polar interactions shown as black dotted lines and water molecules as grey spheres. piHA-L5(d10Y) (5VA9) is shown in the same representations with the same colouring in C,D. E, Overlay of piHA-Dm (purple) and piHA-L5(d10Y) (orange) shown as cartoons, highlighting the differences in size of the peptides and the differing orientations of the helices.
FIGURE 4 CPs binding to KDM4A. X-ray crystal structure of CP2 complexed with KDM4A. A, Crystal structure (PDB: 5LY1) of CP2 (orange) bound to KDM4A (teal) represented as secondary structure cartoon on the protein surface and B, overlay with H3K9me3 (pale green, PDB: 2OQ6) and H3K36me3 (purple, PDB: 2P5B) highlighting that R6 of CP2 occupies the same region of space near the active site: catalytic residues shown as pink sticks, metal(s) as coloured spheres. C, Structural representation of CP2 and D, CP2 derivatives: CP2.3 designed from structure guided modifications [37]; CP2f-3 identified from a focused library containing the key 'RSG' motif [38]; rCP2 identified through deep mutational scanning of CP2. [39]

An X-ray crystal structure of KDM4A in complex with CP2 (PDB: 5LY1) revealed that CP2 inserts into the substrate binding pocket by adopting a distorted anti-parallel β-sheet conformation (Figure 4A,B). R6 forms part of a type-I β-turn near the active site metal to occupy the same sub-pocket as the trimethylated lysine residue in histone substrates, and the guanidinium group forms hydrogen bonds with Tyr177, Ser288 and Asn290. SAR analysis revealed that the positive charge on R6 is crucial for potency, and replacing R6 with Kme3 (CP2R6[Kme3]) or Rme2 converted the peptides to substrates, despite their sharing no sequence homology with histone substrates, confirming novel arginine demethylase activity. [81] The co-crystal structure of KDM4A and CP2R6(Kme3) (PDB: 5LY2) confirmed its productive binding mode, orienting the Kme3 residue in the same manner as the H3K9me3 and H3K36me3 substrates (Figure 4B). Structure guided design to improve the cellular stability and permeability of CP2 (Figure 4C) […] levels. [37] To identify CPs with improved cell permeability, a focused library selection was run with the 'RSG' motif centrally fixed and the remaining variable region randomised using NKN codons to bias the selection towards positively charged and hydrophobic amino acids. [38] The selection yielded CPs such as CP2f-3 (ᴸYIRIRRSGWLWC) and CP2f-7 (ᴰYTRFRRSGIVFYYC) with improved in vitro potency (IC50 = 6 nM) (Figure 4D). [38]

CP2 was further investigated by deep mutational scanning (DMS), an approach where an mRNA-display library of CP2 derivatives with each residue varied was used for a single round of enrichment and enrichment factors for each peptide were calculated. [39] At each position 40 different amino acids were used: 19 proteinogenic amino acids (excluding methionine) and 21 nonproteinogenic amino acids including N-methylated, aliphatic, aromatic and D-amino acids. A CP2 derivative, rCP2, was synthesised incorporating four identified mutations to reduce the polarity or steric bulk without reducing affinity: N4meA, T5A, W9Bzt and R10Nva (Figure 4D). rCP2 retained the same affinity for KDM4A as CP2 (K_D [SPR] = 7 nM).
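Returning to the focused library design described above, the effect of NKN randomisation can be illustrated by expanding the degenerate codon and translating each member. This sketch assumes the standard genetic code and the IUPAC meanings N = A/C/G/T and K = G/T; in a reprogrammed translation system some codons may be reassigned, so the residues actually available in the selection may differ. All names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguity codes needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def expand(degenerate_codon: str) -> list:
    """List every concrete codon matched by a degenerate codon such as 'NKN'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]


def encoded_residues(degenerate_codon: str) -> Counter:
    """Count how many codons encode each residue ('*' marks a stop codon)."""
    return Counter(CODON_TABLE[c] for c in expand(degenerate_codon))


if __name__ == "__main__":
    for residue, n in sorted(encoded_residues("NKN").items(), key=lambda kv: -kv[1]):
        print(residue, n)
```

Restricting the second codon position to G or T removes all acidic and most polar residues while retaining arginine and the aliphatic and aromatic residues F, L, I, M, V and W, which is consistent with the stated aim of biasing the library towards positively charged and hydrophobic sequences.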
The results for G8 were further investigated, as substitution to any residue other than ᴰA was detrimental to binding. Inspection of the ψ and φ angles for G8 in the co-crystal structure revealed that G8 is in a region disallowed for L- but permitted for D-amino acids, so it is likely that substitution to an L-amino acid disrupts the β-turn and may pre-

CP targeting the substrate binding pocket of Urokinase-type plasminogen activator
Urokinase-type plasminogen activator (uPA) is a trypsin-like serine protease, and the main enzyme responsible for plasminogen activation in the extracellular space. [61] Serine proteases have highly conserved active sites and as such present challenging targets for the development of selective small molecule inhibitors. A disulfide-linked macrocyclic peptide, mupain-1 (CPAYSRYLDC), was initially identified from a phage display selection (K_D = 400 nM). [82] Recent optimisation work, using a selection based on a back-flip library of peptide-protease fusions derived from mupain-1, yielded IG1 and IG2 (differing only in the N- and C-termini, which were amide/amide and amine/acid respectively). IG2 showed particularly high specificity and 100-fold higher affinity for uPA than mupain-1. [83] The co-crystal structure of IG2 (CPAYSRYIGC) with uPA (PDB: 6A8N) shows that the CP does not contain any specific secondary structural elements when bound to the protein, instead adopting an extended conformation that covers a large patch of the protein surface and makes multiple interactions away from the active site binding pocket, furnishing the peptide with excellent selectivity (Figure 5A). Interestingly, IG2 does not interact with any of the residues in the catalytic triad, Asp102-His57-Ser195, yet does form extensive contacts within the S1 substrate recognition pocket, a highly conserved region within serine proteases. [61] R6, the only positively charged residue in IG2, is positioned at the P1 residue site of uPA, yet is not hydrolysed, presumably because the amide bond is held in an orientation that does not allow nucleophilic attack by Ser195.
This is a common feature with many other CP inhibitors of uPA ( Figure 5B,C), which all have an arginine residue in the S1 pocket and occupy varying portions of the surrounding binding surface.

Allosteric CP inhibitors for isomerase: conformational lock for cofactor independent phosphoglycerate mutase
Cofactor-independent phosphoglycerate mutase (iPGM) is the sole enzyme responsible for the interconversion of the key metabolic […] attractive target for anthelmintics. [84] A RaPID mRNA display selection was carried out using iPGM from C. elegans, yielding peptides with small macrocycles and C-terminal 'tails' with very high affinity, in particular Ce-2 (Ac-ᴰYDYPGDYCYLYGTCG-NH2), with a K_D of 73 pM for its target. [30] The free cysteine thiol was shown to be important for activity (the C14S mutant had 100-fold lower inhibition) and linearised peptides showed little/no inhibition. Peptides corresponding to both the macrocycle only (residues 1-8) and tail only (9-14) […] only D2 and Y3 have side chains that interact with iPGM through several water-mediated hydrogen bonds, and the rest of the amino acids form intramolecular hydrogen bonds, which fold Ce-2d into a precise 3D orientation to allow the backbone atoms to interact with the protein. Residue D6 likely acts to stabilise the macrocycle and helix as it interacts with ᴰY1, D2, Y3, C8, and Y9 (Figure 6B); a comprehensive analysis of intermolecular interactions is detailed in Malde et al. [6] The authors note that the C-terminus of Y11 points towards protein-bound Zn2+ and Mn2+ ions, which would be within range for chelation by the thiol of C14 in the longer peptide(s), such as Ce-2. [30]

FIGURE 5 X-ray crystal structure of uPA in complex with IG2. A, An extended conformation of IG2 (orange) across the surface of uPA (teal) with catalytic residues highlighted in pink (PDB: 6A8N); B, overlay with other existing CP crystal structures of inhibitors bound to uPA (2NWN, [72] 3QN7, [71] 4GLY, [70] 6A8G [61]) and C, in isolation, showing backbone cartoons and macrocycle linkages. All peptides contain an R residue (shown as sticks), which occupies the S1 pocket.

[…] In addition to enzymes as described above, CPs have been successfully used to target a wide range of proteins in the last 5 years, including receptors (HGFr, [35] IDOL, [45] and hEGFR [40]), transcription factors (BCL6) [29] and small signalling proteins (hTNF-α [44] and HGF [34]) ( […]

TNFα: disassembly and inhibition
Tumour necrosis factor-alpha (TNFα) is a proinflammatory cytokine that is critical for mediation of the normal inflammatory response, but whose overproduction leads to tissue damage associated with various diseases, including rheumatoid arthritis, psoriasis and ankylosing spondylitis. [85] A bicyclic peptide phage display screen was carried out to identify binders of TNFα, with an initial library of CX_nCX_mC (n, m = 2-6) cyclised through the cysteines by an aromatic core to generate bicyclic peptides. [44] Initial enrichment identified two similar peptides (both with a small CPPC motif in the first loop) and a subsequent affinity maturation procedure, starting from the consensus sequence, returned M21 (ACPPCLWQVLCG), cyclised with a 2,4,6-tris(bromomethyl)mesitylene (TBMB-methyl) core (Figure 7A). This was shown to bind to TNFα with a K_D (FP) of around 30 nM for a fluorescently labelled derivative. In cellular assays, a strong time-dependent effect was observed, with much more potent inhibition occurring after prolonged incubation of M21 with TNFα; this was shown to be due to disassembly of the TNFα trimers into a mixture of dimers and monomers, with M21 binding to the dimers. The mode of inhibition was analysed using a combination of biophysical techniques (mass spectrometry, analytical ultracentrifugation and multi-angle light-scattering) and confirmed by a co-crystal structure of the M21:TNFα complex (PDB: 4TWT). M21 interacts with both TNFα monomers (Figure 7B), one through a largely hydrophobic interaction involving the TBMB-methyl core and first loop (A1-C5) with an interaction area of 419 Å², and the other through an α-helix (W7-G12) in the second, larger loop, which has a smaller interaction area of 238 Å² (Figure 7C).
In this selection, a range of cyclising 'cores' were used and M21 was shown to be inactive with alternative cores, demonstrating that while often considered to be almost irrelevant (on the grounds that it will be consistent between all the peptides in the pool), the cyclisation method can have profound effects on the CPs discovered and even be involved in binding to the target (a unique feature for this CP in comparison to the others covered here).
KRas signalling pathway: CP that selectively disrupts K-Ras(G12D) PPI
V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (K-Ras) is a eukaryotic signalling protein involved in the pathways for cell growth, proliferation and differentiation. [86] K-Ras binds GTP in its active state and has weak GTPase activity, which can be increased by binding […] Mutations at Gly12 in K-Ras prevent efficient binding of GAPs, thereby fixing K-Ras in the GTP-bound, active form, leading to aberrant cell growth. K-Ras Gly12 mutations are observed in many different cancers [87] and are well validated targets, with significant drug discovery effort invested over the last >30 years (see [88] for a recent review on RAS targeting peptides).
A phage display approach was used to target immobilised GDP-[…] (Figure 8C). L7 and I9 binding sites are formed predominantly by residues in Switch II and are likely key regions, as an earlier alanine scan highlighted L7, I9, and D12 as key residues in KR-pep2d for inhibition. [63] D12 sits at the base of the α-helix and the carboxylate sidechain forms hydrogen bonds to the sidechains of Gln99 and Arg102.
KRpep-2 can allosterically block KRas(G12D) interaction with GNE, a key interaction that activates KRas; thus this novel allosteric site that selectively disrupts key PPI for downstream signalling could be targeted by future drug discovery efforts to gain selectivity for the G12D over the wild-type K-Ras (and other mutants, for example, G12C).

Inhibition of semaphorin signalling: Plexin B1
Semaphorins are a class of 20 proteins that act as signalling molecules in a range of processes, typically constituting a short-range inhibitory signal. Plexins are a family of 9 transmembrane proteins that function as receptors for semaphorins, with extracellular sema, plexin-semaphorin-integrin (PSI) and immunoglobulin-like-plexin-transcription factor (IPT) domains and a GAP domain. [89] The Semaphorin 4D-plexin B1 (Sema4D-PlxnB1) interaction regulates osteoblast differentiation, so disruption of this PPI would constitute a treatment for osteoporosis; however, the large, relatively featureless interface makes targeting with traditional small molecule approaches challenging. [90] An mRNA-display selection was carried out targeting part of the N-terminal extracellular region of PlxnB1 containing the Sema4D-binding region (hPlxnB1 SP, residues 20-535). [54] After 6 rounds of selection, one sequence (and derivatives thereof) was highly […] [91] There are also interactions with blade 4 of another molecule of hPlxnB1 within the crystal structure, although this is likely induced by crystal packing rather than a dimeric interaction. Thus, it is important to consider how this affects the peptide conformation in crystallo, as it may not be representative of the structure in solution.
The maximal level of disruption of the hPlxnB1-Sema4D interaction achievable with PB1m6 was 65%, which was suggested to be due to its lower affinity for full-length hPlxnB1 than for hPlxnB1 SP. Dimeric versions of PB1m6 were synthesised with two copies of the CP attached via linker regions of various lengths to gain avidity through multivalency. All the linkers tested led to molecules with at least 80-fold tighter binding than PB1m6 as measured by SPR (K_D = 30 pM), and near total inhibition of Sema4D-mediated cell collapse. [53] This exemplifies a conceptually simple but effective bivalent approach, analogous to antibodies, to enhance the binding of CPs against their targets of interest.

DISCUSSION AND FUTURE PERSPECTIVE
Since the first CP phage display was described over 25 years ago, [13] CP genetically encoded library technologies have transformed the way ligands are generated for protein targets of interest. The […]plex systems can also be targeted by CPs (e.g., lipids, cells), [92-94] demonstrating the power and versatility of CP genetically encoded library technologies.
FIGURE 9 Structure of PB1m6 bound to hPlxnB1. A, Two molecules of CP PB1m6 (labelled I (orange) and II (purple)) bind at the dimerisation interface of two hPlxnB1 proteins (labelled I (teal) and II (light green)). The Sema4D binding interface is highlighted on both protomers in pink. B, Cartoon representation of PB1m6 I on the surfaces of hPlxnB1 I and II. C, Peptide only in an alternative orientation. D, Stick representation of hPlxnB1 I with the fifth and sixth blades of the 7-bladed β propeller highlighted in purple and green (hPlxnB1 II is omitted for clarity). Protein residues making hydrogen bonds are shown as sticks with hydrogen bonds as black dashed lines. PDB: 5B4W

The structural information that has become available for de novo CPs over the past decade provides significant insight into their unique properties. Surveying the co-crystal structures of de novo CPs published in the last 5 years alone, it is apparent how adaptably CPs can bind to a variety of protein topographies, from shallow surface grooves to deep enzyme active site pockets. Secondary structure motifs, such as α-helices, β-sheets, 3₁₀-helices, and β-turns, allow the CPs to adopt compact, globular structures that bind through precisely oriented backbone hydrogen bonds, or more extended backbone conformations that scaffold sidechains to make key non-polar and electrostatic interactions. We have surveyed calculated physicochemical properties and affinities between CPs identified through different selection methods in the last 5 years (section 3). While some trends were observed within this limited dataset, deconvolution from other factors that may be responsible for the differences is challenging. As discussed in section 3, both phage- and mRNA-display rely on binding, enriching for higher affinity binders through retention of species with slow k_off rates (N.B. K_D = k_off/k_on), whereas the SICLOPPS technique, for example, employs a functional read-out that does not explicitly enrich for high-affinity target binding. One confounding factor is the library size (i.e., length of peptides), which typically differs between methods; we are not aware of any examples using the same library design with different selection techniques.
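The relation K_D = k_off/k_on noted above, and the way retention-based selections favour slow off-rates, can be illustrated with a minimal sketch. All rate constants, the wash time, and the function names below are hypothetical values chosen for illustration, not data from any selection discussed in this review. The retention estimate assumes simple first-order dissociation with no rebinding during the wash.

```python
import math


def dissociation_constant(k_off: float, k_on: float) -> float:
    """Return K_D (M) from k_off (s^-1) and k_on (M^-1 s^-1)."""
    if k_off < 0 or k_on <= 0:
        raise ValueError("k_off must be >= 0 and k_on must be > 0")
    return k_off / k_on


def fraction_retained(k_off: float, wash_time_s: float) -> float:
    """Fraction of target-bound peptide surviving a wash step.

    Assumption: first-order dissociation, no rebinding.
    """
    return math.exp(-k_off * wash_time_s)


# Hypothetical binders sharing one on-rate but differing in off-rate.
K_ON = 1e5        # M^-1 s^-1 (assumed)
WASH_S = 600.0    # wash duration in seconds (assumed)
binders = {"fast_off": 1e-2, "slow_off": 1e-4}  # k_off in s^-1 (assumed)

for name, k_off in binders.items():
    kd = dissociation_constant(k_off, K_ON)
    kept = fraction_retained(k_off, WASH_S)
    print(f"{name}: K_D = {kd:.1e} M, retained after wash = {kept:.3f}")
```

Under these assumptions, the binder with the slower k_off has both the lower K_D and the larger retained fraction, which is the sense in which binding-based selections enrich for affinity while a functional read-out need not.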
Cyclisation methods are also different: phage selections typically employ sidechain-to-sidechain cyclisation (commonly Cys-Cys through a disulfide or via a linker/core), mRNA display head-to-sidechain cyclisation, and SICLOPPS head-to-tail cyclisation. Another factor is target type, which is entirely independent of the selection method; conceptually, at least, any technique could be applied to any target. In addition, the calculated physicochemical properties are based on algorithms designed for small molecules and likely have larger errors for larger peptides, where intramolecular H-bonding or secondary structure can arise; thus, experimental validation is needed before conclusions are drawn. Rational design of these peptides is beyond what is achievable with current knowledge, yet by using selection procedures from massively diverse pools, peptides with high affinity and selectivity can be generated in a relatively short space of time, making these selection techniques very powerful tools.
The CP co-crystal structures are also invaluable to drug discovery efforts. They can reveal new ways of targeting proteins, such as different modes of active site or allosteric site inhibition (e.g., even for well-established targets such as K-Ras), [64] identify new protein hot-spots, or even inform on substrates that may lead to new biological insight. Combined with activity-based SAR, co-crystal structures can also aid medicinal chemistry design to further refine CPs for biological/therapeutic applications, or inspire peptidomimetic/small molecule design. One of the major hurdles for CPs is reliable cell permeability, a challenge faced when targeting intracellular proteins; as demonstrated for KDM4A, [37] crystal structure-guided modifications can support the design and engineering of cell-permeable and stable CPs. Although not covered in this review, CPs can also act as co-crystallization chaperones [68,95] and stabilise protein conformations, thus providing new structural understanding of the target protein.
However, in some cases, particularly when CPs are found at the interface of proteins, it is not trivial to distinguish biologically relevant interfaces from non-specific interfaces arising from crystallographic packing. It is therefore important to consider crystallographic data in combination with protein:CP interaction data in solution (e.g., NMR, biochemical assays).
The reporting period has seen a substantial increase in publications on de novo CPs, several significant advances in CP technologies, [12,18] four companies founded based on CP drug discovery, and several CPs entering clinical trials. [12,18] CPs are an exciting and attractive modality with enormous potential, not only in therapeutic applications but also in imaging, diagnostics, chemical biology and basic science, and we anticipate significant growth and impact in these areas.