Large scale ab initio modeling of structurally uncharacterized antimicrobial peptides reveals known and novel folds

Antimicrobial resistance within a wide range of infectious agents is a severe and growing public health threat. Antimicrobial peptides (AMPs) are among the leading alternatives to current antibiotics, exhibiting broad spectrum activity. Their activity is determined by numerous properties such as cationic charge, amphipathicity, size, and amino acid composition. Currently, only around 10% of known AMP sequences have experimentally solved structures. To improve our understanding of the AMP structural universe we have carried out large scale ab initio 3D modeling of structurally uncharacterized AMPs that revealed similarities between predicted folds of the modeled sequences and structures of characterized AMPs. Two of the peptides whose models matched known folds are Lebocin Peptide 1A (LP1A) and Odorranain M, predicted to form β‐hairpins but, interestingly, to lack the intramolecular disulfide bonds, cation‐π or aromatic interactions that generally stabilize such AMP structures. Other examples include Ponericin Q42, Latarcin 4a, Kassinatuerin 1, Ceratotoxin D, and CPF‐B1 peptide, which have α‐helical folds, as well as mixed αβ folds of human Histatin 2 peptide and Garvicin A which are, to the best of our knowledge, the first linear αββ fold AMPs lacking intramolecular disulfide bonds. In addition to fold matches to experimentally derived structures, unique folds were also obtained, namely for Microcin M and Ipomicin. These results help in understanding the range of protein scaffolds that naturally bear antimicrobial activity and may facilitate protein design efforts towards better AMPs.

important in targeting bacterial membranes is amino acid composition.
Trp residues are frequently found in AMPs and multiple studies have highlighted their importance in interactions with biological membranes.
Peptides containing only Arg and Trp residues can be highly antimicrobial. 7 Trp residues are critical for anchoring and insertion of peptides into the membrane 8,9 and their removal can have drastic effects on the antimicrobial activity of peptides. 10 Simulations have been used extensively to probe these interactions. 11 Trp is stabilized by hydrogen bond interactions with water molecules and headgroups at the interface. 12,13 However, the Trp residues can equally easily lie inside the membranes where their bulky sidechains can disrupt the packed lipid chains. 8 Similar behavior is also seen for Tyr and to some extent for Phe side chains. 14,15 It is common to see the insertion of Trp residues in efforts to design AMPs. 16,17 AMPs can contain secondary structures of all kinds: helices, β-sheets, extended, and loop regions. Generally, AMPs can be divided into four structural groups: α, β, αβ, and non-αβ. 18 The most abundant structural group of AMPs are amphipathic α-helices, followed by αβ and all-β structures. 19,20 Aside from short, linear α-helical peptides, more complex all-α folds have also been found. These include helix hairpins and helical bundles, commonly found in class II bacteriocins. AMPs with αβ structure often have disulfide bonds, such as those seen in plant defensins' cysteine-stabilized αβ (CSαβ) motif. All-β AMPs have structures comprised of multiple β-strands, for instance a simple β-hairpin stabilized by a circular backbone and disulfide bonds (as seen in θ-defensins of non-human primates) or the cysteine-stabilized triple-stranded β-sheet seen in human defensins. Unlike conventional antibiotics, which generally target metabolic enzymes, AMPs act mainly by membrane-targeting mechanisms and are selective due to the difference in charge of prokaryotic and eukaryotic cell membranes. Furthermore, AMPs have faster antimicrobial activity than conventional antibiotics. 
21,22 Generally speaking, AMPs can be divided into two mechanistic classes: membrane disruptive and non-membrane disruptive (acting on intracellular targets). Disruption of the negatively charged prokaryotic membrane is the predominant mode of action of AMPs, with three main mechanisms proposed: the barrel-stave, toroidal, and carpet models. 23,24 AMPs have therapeutic potential as bioactive coatings for needles, catheters, implants, surgical tools, bandages, and even contact lenses.
However, only a few have been approved for clinical use, and only for topical application, mainly due to their toxic properties. 25,26 The main difficulty in AMP drug development is our lack of understanding of modes of action. 27 The availability of structural information is crucial in facilitating AMP design efforts to predict, understand and implement knowledge-based enhancement of activities, yet the pace of structural determination lags far behind AMP discovery: currently, there are over 2000 AMP sequences known, but only about 10% of them are structurally characterized. Researchers have used different methods in order to optimize antimicrobial activity on known protein scaffolds. Quantitative structure-activity relationship (QSAR) models, regression models and machine learning approaches such as artificial neural networks (ANN), support vector machines (SVM), random forests (RF) and hidden Markov models (HMM) are some of the approaches employed. 28,29 However, most of these studies are sequence-based, and design efforts based on the structural properties of more complex folds, such as studies on β-hairpins by Edwards et al. 30 or Yang et al., 31 are less common.
Notably, sequence information on its own is not sufficient to determine relevant properties of folds, such as the amphipathicity or dipole moment of the molecule.
Due to coevolution with pathogens, AMP sequences are exceptionally diverse. 32 AMP genes have been found to evolve rapidly in both vertebrates and invertebrates as a result of rapid gene duplication, diversification, and positive selection. This has been documented for mammal, bird, amphibian, and insect AMPs. Positive selection in AMPs seems to be highest immediately after gene duplication, although there could be a limit on observing high number of nonsynonymous substitutions in distantly related sequences. 33 It is known that AMP and immune genes evolve much faster than non-immune genes 34,35 with other work showing that AMPs can evolve 3 times faster than other proteins. 36 Due to this rapid evolution, reconstruction of the evolutionary history of AMPs can be a challenging task. 37 Moreover, this limits the possibilities and scope of homology modeling that can be performed for known sequences: not only is the number of available templates limited (see above) but evolutionary relationships between targets and templates are often hard to discern. In such cases, where no experimentally solved homologous structures exist, or exist but cannot be identified, models have to be constructed from scratch by performing ab initio modeling. Successful ab initio modeling of proteins without structurally characterized homologs with RMSD values around 2-5 Å has been reported for sequences shorter than 100-120 residues, [38][39][40] with the most recent CASP free modeling experiments showing this limitation to be at 150 residues. 41,42 AMPs are generally small in size which makes them particularly suitable for ab initio modeling. Furthermore, they often contain disulfide bonds which, if their connectivity can be predicted, provide valuable additional data to guide modeling. Most often, cysteines in extracellular proteins come in even numbers. 
43 In an overview of disulfide-containing AMPs, Lehrer 44 discusses peptides with intermolecular as well as intramolecular disulfide bonds. AMPs with one cysteine are quite rare and have been found to form hetero- or homodimers. While redox status is known to have an effect on antimicrobial activity, 45,46 Lehrer's overview 44 does not give examples of AMPs containing two reduced cysteines. Interestingly, when its cysteines are reduced by the host, human antimicrobial peptide β-defensin 1 shows increased activity. 46
Here, we carry out large scale ab initio modeling of structurally uncharacterized AMPs with Rosetta 47 aiming to improve our understanding of the AMP structural universe. Although it is well known that membrane-active AMPs can undergo structural changes when adopting a functional conformation at the membrane, 48 prediction of structures in aqueous solution can be expected to illuminate non-obvious evolutionary relationships and shed light on structural determinants of initial membrane interaction. We assembled a protocol to create a representative set of AMP sequences which have no predicted homologs of known 3D structure, and predicted disulfide bonds in order to facilitate their modeling. Following ab initio modeling, we tested their stability, compared their 3D structures against characterized AMPs and found fold matches as well as several unique folds.

| Sequence assembling and processing
Sequences longer than 20 and shorter than 120 amino acids were collected from UniProt 49 and APD2 19 on 17 March 2015. APD2 was chosen from the several AMP databases available since it is manually curated, comparatively large and up-to-date. The UniProt release at the time was 2015_03. Sequence redundancy was reduced to a threshold of 45% using CD-HIT 50 and its global alignment option.
HHpred 51 was used to detect sequences with structurally characterized homologs in the PDB70 database as at 6 September 2014. PDB70 is a version of PDB that is redundancy-reduced to 70% sequence identity. 52 Upon inspection of the results, three conditions were required to be satisfied before a given AMP would be considered to have a homolog: (1) HHpred fold match probability higher than 90%, (2) alignment coverage of query sequence higher than 40%, and (3) absence of any mismatch greater than 2-fold in length of query and hit. Sequences in which more than 35% of residues had an IUPred 53 score of 0.5 or above were considered to be intrinsically disordered AMPs. Since ab initio modeling is not suitable for these proteins, they were not considered further.
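The homology and disorder filters described above can be expressed compactly. The following is a minimal sketch, assuming the HHpred and IUPred outputs have already been parsed; the `HHpredHit` container, its field names and both function names are hypothetical and are not part of either tool.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HHpredHit:
    """Hypothetical container for one parsed HHpred hit (illustrative field names)."""
    probability: float      # HHpred probability, in percent
    aligned_residues: int   # number of query residues covered by the alignment
    hit_length: int         # length of the matched PDB70 sequence


def has_structural_homolog(query_length: int, hit: HHpredHit) -> bool:
    """Apply the three criteria stated in the text to a single hit."""
    coverage = hit.aligned_residues / query_length
    longer = max(query_length, hit.hit_length)
    shorter = min(query_length, hit.hit_length)
    return (
        hit.probability > 90.0          # (1) probability higher than 90%
        and coverage > 0.40             # (2) query coverage higher than 40%
        and longer <= 2 * shorter       # (3) no length mismatch greater than 2-fold
    )


def is_intrinsically_disordered(iupred_scores: Sequence[float]) -> bool:
    """True when more than 35% of residues have an IUPred score of 0.5 or above."""
    disordered = sum(1 for score in iupred_scores if score >= 0.5)
    return disordered / len(iupred_scores) > 0.35
```

A sequence would be passed on to ab initio modeling only if no hit satisfies `has_structural_homolog` and `is_intrinsically_disordered` returns `False`.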

| Disulfide prediction, ab initio modeling, ab initio benchmarking and clustering
Disulfide bond predictions were made using Disulfind, Dianna, and Dinosolve. [54][55][56] Disulfind and Dianna were run at their respective servers while Dinosolve was run locally (and used the nr database 57 44 For those AMPs with 3 or 4 cysteines, we ran Rosetta 47 modeling with all possible combinations. AMPs with 5 or more cysteines were run with any consensus prediction and without disulfide constraints. It should be noted, however, that although intermolecular disulfide bonds in AMPs are considered to be rare, 59 peptides with an odd number of cysteines could form dimers or oligomerize, as recently seen in rodent α-defensin-related AMPs. 60 Ab initio modeling of 184 AMPs was performed with Rosetta software using the fix_disulf and relax flags. Use of the nohoms flag to exclude homologous fragments was unnecessary since modeling was only done for targets for which HHpred detected no obvious homology with PDB entries. Where an AMP target contained a modified amino acid, it was modeled using the natural, unmodified version. Where the predicted structure for such an AMP proved to be of interest, the likely consequences of the unmodeled modification were considered later. For each AMP, 1000 models were made. Models defined as successful by Rosetta (meaning they passed the filters that eliminate models with non-protein-like features) were clustered into 10 groups using SPICKER v. 2.0 61 in order to identify the likely near-native structure for each AMP. Larger and more homogeneous clusters indicate more reliable fold candidates. We performed Rosetta benchmarking on AMPs of known structure (fold matches of our modeled AMPs, see Tables S1-S3 and Figures S1-S3) to determine a threshold value for cluster size to refer to when inspecting our models. For the benchmarking, the nohoms flag was used when running Rosetta in order to exclude fragments from the target and homologous structures. 
We chose to include modeled structures where the largest cluster size was at least 25% of the total number of successfully modeled structures. However, structures with lower percentages were scrutinized and considered if the centroids of three largest clusters had similar folds ( Figures S4-S7).
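The cluster-size criterion can be illustrated with a short helper. This is a sketch only, assuming that the cluster sizes reported by the clustering step and the count of successful models are already available as integers; the function names are hypothetical.

```python
from typing import Sequence


def largest_cluster_fraction(cluster_sizes: Sequence[int], n_successful: int) -> float:
    """Size of the largest cluster as a fraction of all successfully modeled structures."""
    if n_successful <= 0:
        raise ValueError("n_successful must be positive")
    return max(cluster_sizes) / n_successful


def passes_cluster_threshold(
    cluster_sizes: Sequence[int], n_successful: int, threshold: float = 0.25
) -> bool:
    """True when the largest cluster holds at least `threshold` of the successful models."""
    return largest_cluster_fraction(cluster_sizes, n_successful) >= threshold
```

Models failing this test are not discarded outright in the protocol above; they are instead inspected by comparing the centroids of the three largest clusters.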

| Fold matching and visual representation of matches
We assembled a database of experimentally solved AMP structures to compare models against. To ensure that we collected as many struc- The results of this run were additionally filtered: only one PDB structure was taken for each UniProt entry, prioritized such that: (1) an X-ray structure was chosen if possible, (2) the X-ray structure with the highest resolution and matching start and end positions, compared to the AMP, was used, (3) if an X-ray structure was not available, the first NMR structure to match the start and end residue was chosen, and (4) the first chain was chosen. Structures with non-matching start and end residues were omitted if the mismatch was greater than 5 amino acids.
For each modeled AMP, a structure similarity search was carried out with GESAMT 62 to compare the three largest clusters' centroid models against the local database of AMP experimental structures. The results were then filtered so that only those modeled structures meeting all of the following conditions were left: (1) Q-score (a measure of structural similarity ranging from 0 to 1) ≥ 0.3, (2) query (AMP model) sequence length < 1.5 × match (experimental structure) sequence length, that is, the query cannot be more than 50% larger than the match, (3) match sequence length < 1.5 × model sequence length, that is, the match cannot be more than 50% larger than the query, and (4) number of aligned residues ≥ 0.7 × query sequence length, that is, the alignment covers at least 70% of the query. Additionally, after filtering out results, fold matches were inspected manually: a model was considered to have a fold match if the number and type of secondary structure elements was similar. Ab initio models with tertiary structure matching at least one of the top 3 filtered fold matches were considered further.
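The four filtering conditions translate directly into a predicate. The sketch below assumes the Q-score, sequence lengths and number of aligned residues have been extracted from the structure comparison output beforehand, and it reads the Q-score and coverage thresholds as "at least"; the function name and argument names are illustrative.

```python
def passes_fold_match_filter(
    q_score: float, query_len: int, match_len: int, n_aligned: int
) -> bool:
    """Apply the Q-score, relative-length and coverage conditions to one hit."""
    return (
        q_score >= 0.3                      # (1) sufficient structural similarity
        and query_len < 1.5 * match_len     # (2) query at most 50% larger than match
        and match_len < 1.5 * query_len     # (3) match at most 50% larger than query
        and n_aligned >= 0.7 * query_len    # (4) alignment covers at least 70% of query
    )
```

Hits that pass this automatic filter would still require the manual inspection of secondary structure elements described above.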
In order to visualize the structural similarity, GESAMT was run in all-vs-all fashion on a set of AMPs from the PDB90 and matching modeled structures. The resulting similarity matrix was used as input for CLANS software 63 to cluster the structures. The CLANS software was used to visualize clusters of modeled structures of AMPs and the matching folds in the PDB, 52  Since GESAMT employs a topology-dependent algorithm we additionally used the topology-independent superposition method CLICK to search AMP models not matching by GESAMT against the same local database of experimentally determined AMP structures. 64 Again the results were filtered so that the query couldn't be more than 50% larger than the match, and the match could not be more than 50% larger than query. For models where no matches with AMPs were found, we ran additional CLICK database search on all protein chains from the PDB90 (not just AMP structures) and filtered results again in a similar manner, after which matches with Z-score values higher than 3 were taken forward. In cases where all of the Z-score values were lower than this threshold, we lowered this value to 2. Finally, all of the matches that were left after the filtering were visually inspected.
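The two-stage Z-score cut-off used for the CLICK searches can be sketched as follows. This assumes the matches have already been size-filtered and collected into a dictionary keyed by a match identifier; the identifiers and the function name are hypothetical.

```python
from typing import Dict


def select_by_z_score(
    z_scores: Dict[str, float], primary: float = 3.0, fallback: float = 2.0
) -> Dict[str, float]:
    """Keep matches above the primary Z-score cut-off, relaxing it only if none pass."""
    selected = {name: z for name, z in z_scores.items() if z > primary}
    if not selected:
        # No match exceeded the primary threshold, so the lower one is applied.
        selected = {name: z for name, z in z_scores.items() if z > fallback}
    return selected
```

Whatever this returns is only a shortlist: in the protocol above every remaining match is inspected visually.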

| Stability of peptides
Molecular Dynamics (MD) simulations were performed using the AMBER package and AMBER FF14SB force field. 65,66 Simulations with explicit solvent were performed using TIP3P water molecules with a 12 Å buffer between peptide atoms and the edge of a rectangular box.
For each simulation, 10 000 steps of minimization were performed, with the first 5 000 using the steepest descent algorithm followed by 5 000 steps of conjugate gradient. The system was heated to 300 K in two steps: first heating from 0 to 100 K for 5 ps, followed by heating from 100 to 300 K for another 100 ps, both using the Langevin thermostat. In the production step, we simulated the system at 300 K and 1 atm using the Berendsen barostat for 100 ns. Simulations with implicit solvent were run for 1 μs. All simulations were run in triplicate.
For the last 50 ns of each simulation, structural alignment was performed on Cα atoms of residues that formed regular secondary structure in the Rosetta model. RMSD clustering was carried out using MMTSB Tool Set based on those Cα atoms. 67 The structure closest to the centroid model was taken as a representative for each highly populated cluster.
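Choosing a representative structure for a cluster can be illustrated in a few lines. This is a simplified sketch, not the MMTSB implementation: it assumes the frames of one cluster are already superposed on the selected Cα atoms, takes the centroid as the per-atom mean coordinate, and returns the frame with the lowest RMSD to that centroid. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two equally sized, already superposed coordinate sets."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def representative_frame(frames: Sequence[Sequence[Coord]]) -> int:
    """Index of the frame closest (by RMSD) to the mean structure of the cluster."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    centroid: List[Coord] = [
        tuple(sum(frame[i][k] for frame in frames) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]
    return min(range(n_frames), key=lambda j: rmsd(frames[j], centroid))
```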

| RESULTS AND DISCUSSION
In order to select and process AMPs, a workflow was implemented ( Figure 1) to collect a non-redundant set of AMP sequences, eliminating those whose fold could be reliably inferred by homology detection and those predicted to be largely intrinsically disordered. Ab initio modeling of the resulting set, with or without predicted disulfide bonding as an additional constraint, was carried out using Rosetta. Clustering of the results determined candidate fold predictions, which were then compared to known AMP structures. Due to evolutionary constraints, protein folds can remain conserved even when there is an apparent lack of homology. 68 Similarity between our models and known AMP structures could therefore result from distant, unsuspected homology.
Similar folds can also arise as a product of convergent evolution; the best example among AMPs are defensins, which are taxonomically widespread over insects, mammals and plants, 69,70 and are found to adopt a variety of folds such as β-sheets (triple-stranded β-sheet of Human Neutrophil α-defensin HNP-3 is an example), cyclic backbone
After collecting AMPs from APD2 19 and UniProt, 49 elimination of identical sequences resulted in an initial count of 2131. CD-HIT 50 was used to further reduce this number and create a representative set with shared pairwise sequence identity of no more than 45%. The resulting 585 AMPs were then analyzed with HHpred 51 in order to discard AMPs with possible structural homologs, which left us with 235 peptides (see Methods for criteria). Furthermore, intrinsically disordered sequences were discarded and disulfide bridges were predicted in those peptides containing three or more cysteines. We proceeded with Rosetta ab initio modeling for 184 peptides, after which clustering of successful models was performed. Finally, a GESAMT 62 database structural similarity search was run on centroid models of the three largest clusters for each AMP in order to obtain fold matches with experimental structures of AMPs from the PDB database. 52 Psipred secondary structure prediction for 184 AMPs predicted 50.54% of peptides to have all-α structure, 33.7% to have mixed α and β, 15.22% to have all-β structure, and 0.54% to adopt a coil conformation. 71 Our set of 184 sequences contains some peptides with more than one pair of cysteines. For this subset, modeling was undertaken with different combinations of disulfides (see Methods), so that the overall number of modeling attempts grew from 184 to 216.
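To show why alternative disulfide combinations increase the number of modeling runs, the sketch below enumerates candidate connectivities for a small number of cysteines. It rests on an assumption not stated in the text: that each run uses the maximum number of disulfides, with one cysteine left free in turn when the count is odd. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Pairing = List[Tuple[int, int]]


def disulfide_pairings(cys_positions: Sequence[int]) -> List[Pairing]:
    """List every way to pair cysteines into the maximum number of disulfides."""
    positions = sorted(cys_positions)

    def pair_up(remaining: List[int]) -> List[Pairing]:
        # Pair the first cysteine with each possible partner, then recurse.
        if len(remaining) < 2:
            return [[]]
        first, rest = remaining[0], remaining[1:]
        results: List[Pairing] = []
        for i, partner in enumerate(rest):
            others = rest[:i] + rest[i + 1:]
            for tail in pair_up(others):
                results.append([(first, partner)] + tail)
        return results

    if len(positions) % 2 == 0:
        return pair_up(positions)
    # Odd count (assumption): leave each cysteine unpaired in turn.
    pairings: List[Pairing] = []
    for free in positions:
        pairings.extend(pair_up([p for p in positions if p != free]))
    return pairings
```

Under this assumption a peptide with three or four cysteines yields three alternative connectivities, so each such peptide contributes several modeling runs instead of one.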
The reliability of our models was assessed based on the size of the largest cluster, and using the results of a benchmarking exercise in which structurally characterized AMPs were modeled ab initio (see Tables S1-S3 and Figures S1-S3). This revealed that, even without the use of disulfide contact information, the largest cluster usually predicted the fold correctly. Furthermore, the isolated failure (1CZ6) was distinguished by a smaller largest cluster size. Where the largest cluster size exceeded 25% of the total models, that cluster always had the correct fold and so this was the major criterion used to judge the modeling results. Among 216 modeling runs, 48% of peptides had a largest cluster containing less than 25% of the total number of successful models, 20% of peptides between 25 and 39% of the total, 10% of peptides between 40 and 49%, and 22% had a largest cluster containing between 50 and 100% of the total number of successful models ( Figure 2). All-α structures were predominant in the most reliably modeled categories ( Figure 2). This meant that, initially, 52% of structures were taken as reliable. Where the largest cluster was not larger than 25%, a comparison of the top, second and third largest cluster centroids was carried out. In two cases where these matched visually, the prediction was also considered of interest.

| Visual representation of fold matches
Modeled structures that matched AMP folds in the PDB were clustered using CLANS 63 in a semi-automated manner along with the corresponding fold matches (31 models and 25 fold matches making a total of 56 structures). Several modes of clustering were trialled but none proved capable of results that were fully in accord with expert assignment based on visual examination. For example, proteins with mixed αβ topologies sometimes allied more closely with β-hairpins, through a good fit of that portion, rather than with proteins with the same αβ overall topology but more poorly matching β-structure. Therefore, some manual (re)assignments were made to fine-tune an initial clustering of 8 groups for presentation purposes ( Figure S8) and for discussion below. Three clusters contained β-hairpins and were combined and respectively. In Figure S8, structures with greater similarity (higher Q-scores) are positioned at shorter separations. We next discuss the results in each fold family.

| Fold matches
All the fold matches shown here had a Q-score ≥ 0.3 and were additionally manually screened so that matches were considered only when the AMP model and the experimentally determined matching structure were not too dissimilar in length and aligned over a majority of the model structure (see Methods).
Bacteriocins are antimicrobial peptides synthesized by the ribosomes of a variety of bacteria (both Gram-positive and Gram-negative). 79 Cotter et al. 80 categorized bacteriocins into three classes: class I, also known as lantibiotics, are post-translationally modified peptides containing amino acids called lanthionines; class II are a heterogeneous group of small heat-stable non-lanthionine-containing peptides which may have disulfide/thioether bonds; and class III are large, heat-labile, lytic proteins called bacteriolysins. 18 Lacticin Q belongs to class II and shows selectivity for Gram-positive bacteria at the strain level, suggesting that membrane lipid composition might not be the only determinant of its antimicrobial activity. It is also known that the peptide causes accumulation of hydroxyl radicals. 81 It has been suggested that circular bacteriocins share a common overall structural motif of a saposin fold, that is, four helices surrounding a hydrophobic core, regardless of low shared sequence identity 82 and our results are consistent with this ( Figure 3). Tryptophan residues are known to be involved in protein folding as well as to have a tendency for burial at the bilayer interface. 83,84 Another common feature of circular and leaderless bacteriocins is the presence of solvent-exposed tyrosine or tryptophan residues that are likely to facilitate membrane penetration. 85
bacteriocin, nisin, for its nanomolar range antimicrobial activity, pore size and ATP efflux. 75,76,81,86 However, compared to nisin, lacticin Q is a leaderless bacteriocin: the peptide is synthesized without the N-terminal leader sequence that is otherwise removed when exporting from cells. 
76 76,77 While this manuscript was in preparation, an NMR study of the Lacticin Q structure was published by Acedo et al. 87 The RMSD of the Cα atoms between our model and the NMR structure is 2.69 Å, while RMSD values of the Cα atoms between structures obtained with MD and the NMR structure range from 2.97 to 4.57 Å ( Figure S9). The NMR structure thus provides experimental support for our model.

| β-hairpins
Three confidently modeled AMPs, Lebocin Peptide 1A, 88 Odorranain M1 72 and Silkworm 001 (unpublished; APD identifier 01974), were predicted to fold as β-hairpins ( Table 1). The three models resulted in large clusters of 50%, 29%, and 38% of the successfully modeled structures, respectively, and their reliability was further tested by running 100 ns simulations in explicit solvent and performing clustering as described in the Methods section. Structure representatives of highly populated clusters superimposed in Figure S10 show the stability of modeled folds in solvent.
With the reliability of our models confirmed, we compared their properties with those of the fold matches (Table 1). Among the PDB fold matches to these 3 peptides, we find three θ-defensins: BTD-2, RTD-1, and HTD2 (Retrocyclin-2). 90 These are backbone cyclic β-hairpin AMPs containing three parallel disulfide bonds also known as the cystine ladder motif. A study on 18-residue-long BTD-2 θ-defensin analogues by Conibear et al. 90 showed that a cyclic backbone appears to be essential for membrane activity resulting in antibacterial effects, as was also reported earlier by Tang et al. 91 for RTD-1 θ-defensin. However, the disulfide bonds have been shown to be essential for stability of these AMPs, as well as for resistance to the action of proteolytic enzymes. 90,92 Disulfides can be either essential or dispensable for the activity of β-hairpins: Protegrin-1 was as active in linear form as in cyclic, 93,94 while reducing disulfides in Arenicin-1 led to decreased activity. 95 Interestingly, it has been shown by Ma et al. 96 that disulfides were not only dispensable for Thanatin activity and toxicity, but that the secondary structure was maintained in their absence as well.
Table 1 notes: Identity percentages were obtained through GESAMT. C1 size %: size of the largest cluster compared to the overall number of models. a Data unpublished.
of Cys residues with Tyr, Phe, or Ala maintain the fold as a result of aromatic stacking, 100 but our peptides are not spatially constrained by aromatic interactions either.
In order to assess how unusual the absence of disulfides in an AMP β-hairpin is, the PDB was mined using the mmCIF Keyword Search (Classification) "antimicrobial" and the hits were scanned visually for structural similarity. Shown in Figures S10D and S10E are 2 "β hairpin-like" AMPs that were found: entries 2LM8 (Cysteine Deleted analog of β-hairpin AMP Tachyplesin I in LPS, CDT-LPS) 101

| αββ and βαβ folds
This group contains three confidently modeled AMPs. Human Histatin 2 104 and Garvicin A 73 were predicted to have αββ folds, while for Rattusin, 74 a βαβ fold was predicted. The largest cluster for Human Histatin 2 peptide contained 35.4% of the total number of models but fold matches were found for the centroid structures of the second and the third largest clusters only. Nevertheless, the centroid structure of the largest cluster is highly similar to that of the second, and since the third cluster centroid has particularly pronounced secondary structure elements, this model was taken forward as a representative structure ( Figure S4). Garvicin A modeling resulted in a small top cluster relative to the number of successfully modeled structures, and the Rattusin fold was on the lower limit of 25% (Table 2, see Methods). Upon inspection of centroids of the remaining two largest clusters for Garvicin A and Rattusin, we found that these models displayed similar structures, supporting their reliability. In addition, we tested the stability of our mod-
Table 2 notes: Identity percentages were obtained through GESAMT. C1 size %: size of the largest cluster compared to the overall number of models. a C3 size %. b Modeled structure matching an experimental structure with different topology.
in turn enhance its antibacterial effects. However, the peptide remains active even with the disulfides reduced, justifying a prediction of monomeric structure as performed here. It is well known that disulfide reduction can have a wide spectrum of effects from enhancement to total loss of activity (as seen in α- and β-defensins, and θ-defensins, respectively), alteration of selectivity (as seen in cryptidin-related sequences) and so forth. 46,74,91,105 Class II bacteriocins are a heterogeneous group of small heat-

| Helix-break-helix and helix-kink-helix continuum
By far the largest group of folds obtained was a continuum of v-shaped, helix-turn-helix and helix hairpins ( Figure 6 and Table 3).
Attempts to consistently subdivide these into separate helix-break-helix (containing helix-hairpins and v-shaped structures) and helix-kink-helix groups (Figure 6) were unsuccessful.
For all of the structures in this group, the largest cluster contained high percentages of the total number of models, except for Andropin and Hymenochirin 2B. For these two peptides, we inspected the centroid structures of the remaining two largest clusters and found them to be similar to the largest cluster's centroid, which gave us confidence that all folds are modeled correctly. The stability of selected folds was tested by running 100 ns simulations in explicit solvent and performing clustering as described in Methods. Structure representatives of highly populated clusters superimposed are shown in Figure 7.
All of the peptides were modeled as monomers, including Cynthaurin peptide, which contains a single cysteine residue. 106 Cynthaurin is believed to predominantly form homodimers; however, both monomer and dimer are active against bacteria, whilst the monomer is non-hemolytic. Cynthaurin, along with Ponericin Q42 107 , Ceratotoxin D, 108 and CPF-B1 peptides, 109 shown in Figures 7A, 7D, and 7E, respectively, has a helix-hairpin fold. Although the orientation of helices fluctuated somewhat, they were stable throughout the simulation ( Figure 7). Latarcin 4a 110 ( Figure 7B) showed 2 stable conformations, a single α-helix and a helix-break-helix fold. Interestingly, the short N-terminal helix of Kassinatuerin 1 peptide 111 ( Figure 7C) rapidly unfolded in solution, while the C-terminal helix remained stable.
Helix-hairpins are formed by two antiparallel α-helices connected by a loop ( Figure 6A). The helices interact through hydrophobic sidechain interactions at the interface. V-shaped structures can be defined as either: (1) two non-parallel α-helices whose axes intersect at angles from around 45° to 120°, connected by a loop region ( Figure 6B, bottom panel) or (2) a helical structure extending throughout the peptide but containing a kink. Usually, cationic peptides longer than
It has been speculated that v-shaped helix-break-helix structures with a strongly amphiphilic α-helix at the N-terminus only are likely to be functional through the carpet mechanism, while structures with N- and C-termini that are both strongly amphiphilic are more likely to act via the pore-forming mechanism. For the first set, helix-break-helix AMPs with a hydrophobic gradient spanning from N- to C-terminus, it has been suggested that the amphipathic N-terminal helix is responsible for interaction with the membrane, while the C terminus, because of its lack of amphipathicity, lies on the membrane and only has a minimal interaction with it. 112 For this particular type of structure, Dubovskii et al. 112 proposed molecular hydrophobic potential (MHP) plots to be effective in sorting structures by mechanism. However, due to a lack of experimental data on structure and mechanisms of action of helix-kink-helix peptides, this method has not been particularly useful in gaining a clear picture for our modeled peptides and, for now, the structure-function relationship for these AMPs remains unclear.
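The angular definition of V-shaped structures given earlier in this section can be made concrete with a small geometric sketch. Two assumptions are made here that are not taken from the text: each helix axis is approximated by the vector between the centroids of its first and last four Cα atoms, and the reported angle is the opening angle at the vertex, so that a helix hairpin gives roughly 0° and an unbroken straight helix roughly 180°. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _centroid(points: Sequence[Vec]) -> Vec:
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def helix_axis(ca_coords: Sequence[Vec], window: int = 4) -> Vec:
    """Approximate N-to-C axis from centroids of the first and last `window` CA atoms."""
    if len(ca_coords) < 2 * window:
        raise ValueError("helix too short for the chosen window")
    start = _centroid(ca_coords[:window])
    end = _centroid(ca_coords[-window:])
    return tuple(e - s for s, e in zip(start, end))


def opening_angle(axis1: Vec, axis2: Vec) -> float:
    """Angle in degrees at the vertex between helix 1 (reversed) and helix 2."""
    reversed1 = tuple(-a for a in axis1)   # point helix 1 away from the vertex
    dot = sum(a * b for a, b in zip(reversed1, axis2))
    norm = math.sqrt(sum(a * a for a in axis1)) * math.sqrt(sum(b * b for b in axis2))
    cosine = max(-1.0, min(1.0, dot / norm))   # guard against rounding error
    return math.degrees(math.acos(cosine))


def is_v_shaped(angle_deg: float, low: float = 45.0, high: float = 120.0) -> bool:
    """True when the opening angle falls in the approximate V-shaped range."""
    return low <= angle_deg <= high
```

Because the boundaries are described only as approximate, such a classifier would be expected to disagree with visual assignment near 45° and 120°, which is consistent with the folds forming a continuum.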

| Potential novel AMP folds
The results discussed hitherto relate to fold matches between the

| All-β folds
A total of three all-β folds modeled ( Figure 8) were scanned for fold matches using CLICK database search. The first, Cypemycin 113 ( Figure 8A) is a bacteriocin active against Gram-positive bacteria. Although it has no lanthionine bridges present, it has some of the structural features of lantibiotics such as dehydrated amino acids, two L-allo-isoleucines, and a modified C-terminal D-cysteine that forms a ring structure with an L-cysteine. Only the disulfide bond constraint, set for the two cysteine residues, was included in our model. Since the two modeled isoleucines (which were modeled as the closest available representation of L-allo-isoleucine) are solvent exposed, and the D-cysteine is the
Table notes: Identity percentages were obtained through GESAMT. C1 size %: size of the largest cluster compared to the overall number of models. a Disulfide connectivity known.

| Mixed αβ folds
Upon performing CLICK database search, five mixed αβ folds were found to have fold matches that met the size and Z-score criteria (see Methods). The first, Propionicin-F 117 ( Figure 9A) is an unmodified,  52 However, the short β-sheet does not pack against the helix in the same way as seen in our Rosetta model, leading to a much more elongated structure. In contrast, the topologically distinct αββ folds share a similar mode of interaction between helix and β-sheet as Propionicin-F, suggesting that they may be structural analogues. Like many bacteriocins, Propionicin-F has a nanomolar activity against strains of its producer organism, Propionibacterium freudenreichii.
ABP-118a 118 shown in Figure 9B is a type IIb, unmodified, heat- Mutacin IV 119 ( Figure 9C) shows activity against the Mitis group of oral streptococci. The peptide was thought to be a type IIb bacteriocin, 119 but this has not been demonstrated unambiguously. 120  This group also contains more complex folds, such as chain A Blp1 peptide-another type IIb bacteriocin ( Figure 9D). The highly populated largest cluster (28.1% of models) obtained with disulfide connectivity supporting the biophysical plausibility of our modeled fold. However, the two are likely to have evolved independently: Pseudopilins are part of the type 2 secretion system found in Gram-negative bacteria, and GspI is known to be located at the pseudopilus base, interacting with the inner membrane components, 122

| All-α folds
Overall, ten all-α folds modeled ( Figure 11) were scanned for fold matches using the CLICK database search, and only one was found to have fold matches, namely Vejovine peptide 123 ( Figure 11I).  Figure 12B).

ACKNOWLEDGMENT
We acknowledge BMSI (A*STAR) and National Supercomputing Center Singapore for computational support.