What can AlphaFold do for antimicrobial amyloids?

Amyloids, protein and peptide assemblies found in various organisms, are crucial in physiological and pathological processes. Their intricate structures, however, present significant challenges, limiting our understanding of their functions, regulatory mechanisms, and potential applications in biomedicine and technology. This study evaluated the structure predictions of the AlphaFold2 ColabFold method for antimicrobial amyloids, using eight antimicrobial peptides (AMPs), including those with experimentally determined structures and AMPs known for their distinct amyloidogenic morphological features. Additionally, two well‐known human amyloids, amyloid‐β and islet amyloid polypeptide, were included in the analysis due to their disease relevance, short sequences, and antimicrobial properties. Amyloids typically exhibit tightly mated β‐strand sheets forming a cross‐β configuration. However, certain amphipathic α‐helical subunits can also form amyloid fibrils adopting a cross‐α structure. Some AMPs in the study exhibited a combination of cross‐α and cross‐β amyloid fibrils, adding complexity to structure prediction. The results showed that the AlphaFold2 ColabFold models favored α‐helical structures in the tested amyloids, successfully predicting the presence of α‐helical mated sheets and a hydrophobic core resembling the cross‐α configuration. This implies that the AI‐based algorithms either favor assemblies of the monomeric state, which was frequently predicted as helical, or capture the α‐helical, membrane‐active form of toxic peptides that is triggered upon interaction with lipid membranes.

Amyloids are highly polymorphic, even within the same sequence, with the extreme possibility of a different secondary structure in the fibril form. 2,3,15,23,24 This latter observation has been demonstrated by recent structures of amyloidogenic AMPs. 14,15,18 The self-assembly properties and polymorphism of amyloids distinguish them from globular and membrane proteins, which constitute the majority of available protein structures. 25 This makes amyloid structure determination and prediction difficult, [25][26][27][28][29] while structural information remains necessary for understanding their functions and mechanisms, discovering modulators of their activities, and designing novel amyloid sequences for biomedical and technological applications.
With the recent development of new methods for predicting protein structure, such as AlphaFold, there is potential for predicting amyloid structure. In this challenging field, we investigated the current capabilities of these methods. AlphaFold2 (AF2) was introduced at the 14th Critical Assessment of Protein Structure Prediction (CASP14) in 2020 and uses innovative neural network architectures and training methods. 30 AF2 was trained on the evolutionary, physical, and geometric constraints of experimental three-dimensional protein structures deposited in the Protein Data Bank (PDB) to predict the structure of proteins or their complexes from sequence. 30 To make the AF2 algorithm more accessible, a collaborative Jupyter notebook hosted by Google, called ColabFold, has been established. 31 This platform allows researchers without computational expertise or resources to use the AF2-ColabFold algorithm to model the structures of protein monomers and of homo- and hetero-oligomeric complexes. 31 In this study, our aim was to evaluate the capability of AF2-ColabFold-multimer to accurately predict the structures of a new class of amyloidogenic AMPs. These AMPs, namely PSMα3, uperin 3.5, and aurein 3.3, have recently been characterized at high resolution and found to form cross-α and cross-β amyloid fibrils. 2,3,14,15 Additionally, we examined a specific fragment (residues 17-29) of LL37, a human antimicrobial and immunomodulatory peptide, which assembles into a distinct α-helical fibril structure that is non-amyloid in nature. 17 Furthermore, we included models of AMPs without experimentally determined structures but with known morphological features, namely citropin 1.3, cyanophlyctin, bombinin H4, and dolabellanin-B2. 18 Finally, our analysis also encompassed two well-known pathological amyloids, amyloid-β (Aβ) and islet amyloid polypeptide (IAPP). [34][35][36]

| RESULTS
Table 1 lists the sequences of the peptides analyzed and the number of subunits tested. Table S1 lists the key features of the leading AF2-ColabFold predicted models.

| AF2-ColabFold models partially recapitulate the mated α-helical sheets of the cross-α amyloid fibril of bacterial PSMα3
PSMα3 is a cytotoxic and lytic 22-residue peptide of the phenol-soluble modulin (PSM) family secreted by Staphylococcus aureus. The crystal structure of PSMα3 (PDB id: 5I55) revealed, for the first time, a cross-α fibril, a unique amyloid morphology composed of amphipathic α-helices. 2,3 The cross-α fibril is an assembly of molecules stacked perpendicular to the fibril axis that further form tightly mated sheets, as in the canonical cross-β amyloid, but with each molecule forming an α-helix rather than a β-strand. Recently determined cryo-EM structures of PSMα3 (PDB ids: 7T0X, 7SZZ) confirmed the cross-α configuration and further showed a supramolecular assembly of the mated α-helical sheets into nanotubes. 37 Notably, the cross-α configuration was also observed for the longer, 44-residue PSMβ2, which forms a helix-turn-helix motif, demonstrating another possibility for assembly based on mated helical sheets. 37 To compare the AF2-ColabFold models with the crystal and cryo-EM structures, we evaluated the modeled monomer, decamer (10-mer), pentadecamer (15-mer), eicosamer (20-mer), 25-mer, and 30-mer PSMα3.
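An oligomer series like the one above can be set up by writing one ColabFold query per subunit count. The sketch below is a minimal illustration, assuming ColabFold's convention of separating the chains of a complex with a colon (`:`); the helper names and the sequence are hypothetical placeholders, not the inputs used in this study.

```python
from __future__ import annotations


def homo_oligomer_query(sequence: str, n_subunits: int) -> str:
    """Return n identical chains joined by ':' (ColabFold's chain separator)."""
    if n_subunits < 1:
        raise ValueError("n_subunits must be at least 1")
    return ":".join([sequence] * n_subunits)


def oligomer_series_fasta(name: str, sequence: str, counts: list[int]) -> str:
    """Build one FASTA record per oligomeric state, labeled e.g. 'peptide_10mer'."""
    records = [
        f">{name}_{n}mer\n{homo_oligomer_query(sequence, n)}" for n in counts
    ]
    return "\n".join(records) + "\n"


# Hypothetical placeholder sequence (not any peptide from Table 1).
PLACEHOLDER_SEQ = "ACDEFGHIKL"
fasta_text = oligomer_series_fasta("peptide", PLACEHOLDER_SEQ, [1, 10, 15, 20, 25, 30])
print(fasta_text.splitlines()[0])  # >peptide_1mer
```

A file like this could then be submitted through ColabFold's batch interface; the exact input conventions should be checked against the documentation of the installed ColabFold version.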
TABLE 1 Tested peptide sequences and AF2-ColabFold multimeric models analyzed.

Protein | Sequence | Number of subunits

Note: Names, sequences, and number of subunits used as input in various AF2-ColabFold experiments. "V" indicates an experiment that was conducted.

AF2-ColabFold prediction for monomeric PSMα3 yielded an α-helical structure with a high confidence measure, namely the predicted local-distance difference test (pLDDT) score, for all models (Figure S1a). For multimer predictions, the model most similar to the experimental cross-α structure was the first ranked model of the decameric (10-mer) PSMα3. This model showed two mated parallel α-helical sheets, as in the experimental structures, yet the α-helices and mated sheets were not as tightly packed (Figure 1a,c). Specifically, in the crystal structure, the distance between α-helical subunits along each sheet is 10.5 Å, and the distance between mated sheets, orthogonal to the fibril axis, is 12.6 Å. 2,3 In the AF2-ColabFold 10-mer model, the corresponding distances are 12.8-16.6 Å between α-helices and 16 Å between sheets, resulting in a more loosely packed fibril (Figure 1c), which lacks most of the interhelical polar interactions observed in the experimental structures. In both the experimental and predicted structures, the amphipathic α-helices are arranged so that the interface between sheets is hydrophobic, forming a large hydrophobic core along the fibril (Figure 1b,d). To quantify the tightness of the packing, we compared the solvent accessible surface area (SASA) buried within the fibril. All SASA values and interhelical and inter-sheet distances are presented in Table S1. In the PSMα3 crystal structure, the buried surface area of a single α-helix is 1150 Å², representing 50% of its total SASA, 2,3 compared to 622 Å² (24% of the total SASA) in the 10-mer model. Overall, for a multimer of 10, AF2-ColabFold succeeded in predicting the general configuration of mated α-helical sheets, but with looser packing of the helices than in the experimentally determined structures, resulting in an apparently much less stable configuration. Notably, although the pLDDT of all five models is similar and all individual subunits were predicted as α-helices, the four models ranked second to fifth did not resemble the cross-α configuration, and included a barrel and unlikely assemblies with clashes between subunits (Figure S2d,e).
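The buried-surface comparison rests on simple arithmetic: the area a helix buries is its SASA as an isolated chain minus its SASA within the assembly. A minimal sketch, assuming per-chain SASA values have already been computed with a tool of choice (Biopython's Shrake-Rupley implementation is one option, not necessarily the one used here) and assuming "fraction of total SASA" means relative to the isolated chain:

```python
from __future__ import annotations


def buried_sasa(sasa_isolated: float, sasa_in_assembly: float) -> tuple[float, float]:
    """Buried area (Å²) and buried fraction for one chain.

    sasa_isolated: SASA of the chain computed on its own.
    sasa_in_assembly: SASA of the same chain computed inside the multimer.
    """
    if sasa_isolated <= 0:
        raise ValueError("isolated SASA must be positive")
    buried = sasa_isolated - sasa_in_assembly
    return buried, buried / sasa_isolated


def implied_total_sasa(buried_area: float, buried_fraction: float) -> float:
    """Back-calculate a chain's total SASA from a reported buried area and fraction."""
    return buried_area / buried_fraction


# Reported values for PSMα3: crystal 1150 Å² (50%), 10-mer model 622 Å² (24%).
crystal_total = implied_total_sasa(1150.0, 0.50)  # 2300.0 Å²
model_total = implied_total_sasa(622.0, 0.24)     # about 2592 Å² (percent is rounded)
print(buried_sasa(crystal_total, crystal_total - 1150.0))  # (1150.0, 0.5)
```

Because the reported percentages are rounded, the back-calculated totals are only approximate.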
Another AF2-ColabFold model that resembles the cross-α configuration is that of the pentadecamer (15-mer), in particular the second ranked model, which shows two mated parallel α-helical sheets with a hydrophobic core, but again, not as tightly packed as in the experimental structures (Figure 2a). Specifically, the distances between the α-helices range from 13.7 Å to 18.8 Å, and those between the sheets from 14 Å to 19.5 Å. Consistent with the large distance between subunits, the buried SASA of a central α-helix was 361 Å² (14% of the total SASA), much lower than in the experimental structure. Although all five models share the same pLDDT score, the first ranked 15-mer model also showed two mated α-helical sheets, but with several subunits detached from the assembly. The third and fifth ranked models showed an arrangement resembling a globular shape, and the fourth ranked model resembled a barrel (Figure S3).
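How the interhelical and inter-sheet distances were measured is not detailed in this section. One common approximation, assumed in the sketch below, is the distance between the centroids of the Cα atoms of neighboring helices; the chain IDs and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def centroid(points: Sequence[Coord]) -> Coord:
    """Mean position of a set of atoms (e.g., the Cα atoms of one helix)."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def neighbor_distances(helices: Dict[str, List[Coord]], order: List[str]) -> List[float]:
    """Centroid-to-centroid distances between consecutive helices along a sheet.

    `order` lists chain IDs in their sequence along the sheet; choosing that
    order (and assigning chains to sheets) is left to the user.
    """
    cents = [centroid(helices[chain]) for chain in order]
    return [math.dist(a, b) for a, b in zip(cents, cents[1:])]


# Toy sheet: three straight "helices" 10.5 Å apart along x.
toy = {
    "A": [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)],
    "B": [(10.5, 0.0, 0.0), (10.5, 0.0, 2.0)],
    "C": [(21.0, 0.0, 0.0), (21.0, 0.0, 2.0)],
}
print(neighbor_distances(toy, ["A", "B", "C"]))  # [10.5, 10.5]
```

The same centroid approach could be applied across sheets by pairing each helix with the facing helix of the mated sheet.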
The second ranked model of the eicosamer (20-mer) PSMα3 showed a 12-subunit core of mated sheets with six peptides in each sheet, with distances of 14-19.6 Å between α-helices along the sheet and 15.4-19.5 Å between the two sheets. The buried SASA of one helix was 466 Å² (18%), indicating packing that is looser than in the 10-mer but tighter than in the 15-mer. The remaining subunits were scattered around this core. Thus, surprisingly, adding more subunits led to a predicted disassembly of the fibril-like arrangement compared with the 10- and 15-mers. The other ranked models showed diverse assemblies of α-helices, including helical sheets, spirals, and barrel-like helical clusters that are not very tightly packed (Figure S4).
Adding more subunits (25- and 30-mers) led to further deviation from the cross-α configuration, with shapes tending toward globular-like clusters, but with loose packing of the helices compared to the tight packing of the cross-α fibril (Figures S5 and S6).
Overall, the predictions of PSMα3 multimers show a pLDDT score of around 40 on a scale of 0-100, indicating low confidence and structural diversity. Some of the models predicted mated α-helical sheets similar to the cross-α fibril arrangement, but with looser packing compared to the experimental structure and often with surrounding detached subunits. The supramolecular nanotube structure shown by the cryo-EM structure of PSMα3 37 was not recapitulated, even when modeling a larger number of subunits. It is noteworthy that in some of the predicted models, the PSMα3 α-helices spiral, echoing the structures shown by Zhang and co-workers using designed peptides. 38

| AF2-ColabFold models of uperin 3.5 recapitulate some cross-α features, but fail to predict the cross-β configuration
Uperin 3.5 is an AMP secreted on the skin of Uperoleia mjobergii (Australian toadlet). This peptide displays a secondary structure switch and can form both cross-β and cross-α amyloid fibrils, both of which have been demonstrated by high-resolution structures and biophysical measurements. 14,15 The cross-α crystal structure (PDB id: 6GS3) 15 resembles the PSMα3 fibril, with mated sheets of stacked α-helices; however, whereas the latter shows a parallel orientation of the α-helices along the sheets, uperin 3.5 shows an antiparallel orientation 2,15 (Figure 3a,b). This revealed the surprising feature that parallel and antiparallel sheets can be formed not only by β-strands but also by α-helices. The cross-β form of uperin 3.5 was determined by cryo-EM (PDB id: 7QV5), showing a three-blade symmetric propeller of nine peptides per fibril layer with tight β-sheet interfaces 14 (Figure 3e). This revealed a remarkable ability of a natural sequence to switch between different secondary structures depending on environmental conditions. In particular, lipids mimicking a membrane environment induced the transition into the cross-α form. 15,[39][40][41][42] To compare the AF2-ColabFold models with the crystal and cryo-EM structures, we evaluated the modeled monomer, decamer (10-mer), eicosamer (20-mer), 25-mer, and 30-mer uperin 3.5.
The AF2-ColabFold prediction for monomeric uperin 3.5 is an α-helical structure in all five ranked models, with an average pLDDT score of 90 (Figure S7). All models of the decamer (10-mer), with an average pLDDT score of 40, consisted of clusters of helices resembling a globular fold (Figure S8). One of the subunits in the fourth ranked 10-mer model was a random coil with no defined secondary structure (Figure S8). The five models of the 20-mer uperin 3.5 also showed diverse arrangements despite having similar pLDDT scores. The second and third ranked 20-mer models showed some remote resemblance to spiraling α-helical mated sheets (Figure S9).
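Average pLDDT values such as 90 for the monomer and 40 for the 10-mer can be read directly from model files. The sketch below assumes AlphaFold/ColabFold-style PDB output, in which per-residue pLDDT is stored in the B-factor column, and uses the commonly cited AlphaFold confidence bands; the function names are illustrative, and averaging over Cα atoms is one choice among several.

```python
from typing import List


def mean_plddt_from_pdb(pdb_text: str) -> float:
    """Average per-residue pLDDT, read from the B-factor field of Cα ATOM records.

    Assumes an AlphaFold/ColabFold-style PDB file in which the B-factor column
    (columns 61-66) holds pLDDT.
    """
    values: List[float] = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no Cα ATOM records found")
    return sum(values) / len(values)


def confidence_band(plddt: float) -> str:
    """Map a pLDDT value onto the commonly used AlphaFold confidence bands."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Tiny synthetic file: four Cα atoms with pLDDT values in the B-factor column.
example = "\n".join(
    f"ATOM  {i:5d}  CA  ALA A{i:4d}    "
    f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}"
    for i, b in enumerate([88.0, 92.0, 30.0, 50.0], start=1)
)
score = mean_plddt_from_pdb(example)
print(score, confidence_band(score))  # 65.0 low
```

Under these bands, a multimer average of around 40 falls in the "very low" range.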
Starting with 25 subunits, a higher similarity to cross-α was observed. The third ranked 25-mer uperin 3.5 model shows two mated sheets of α-helices, one with eight subunits and the other with seven (Figure 3), while the rest of the α-helices are scattered around (not shown). The sheets contain α-helices parallel to each other, as in PSMα3, 2 and not antiparallel as in the crystal structure of uperin 3.5. 15 The antiparallel orientation of the uperin 3.5 α-helices allows tight packing, with an interhelical distance of 9.8 Å and an inter-sheet distance of 8.6 Å (Figure 3a). In the model, the distances are much larger, with interhelical distances of 15 Å in one sheet and 14.5-19 Å in the other, and inter-sheet distances of 13-18.6 Å (Figure 3c). Accordingly, while the buried SASA of one α-helix in the crystal structure is 752 Å² (40% of the total SASA), in the model it is only 194 Å² (10%) for a representative helix, highlighting the much looser packing of the model. All 25-mer models had the same average pLDDT score of 40 and showed diverse arrangements, including clusters of helices in globular or spiral shapes (Figure S10).
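Whether neighboring helices in a sheet are parallel or antiparallel can be checked from the sign of the cosine between their axis vectors. The sketch below approximates each helix axis from the mean position of roughly one turn (four Cα atoms) at each end; this approximation and the 0.5 cosine threshold are assumptions for illustration, not the procedure used in the study.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _mean(points: Sequence[Vec]) -> Vec:
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def helix_axis(ca_coords: Sequence[Vec], window: int = 4) -> Vec:
    """Approximate helix direction (unit vector, N- to C-terminus).

    Averaging about one turn (~4 Cα atoms) at each end largely cancels the
    helix's radial offset, so the vector between the two averages lies
    close to the helix axis.
    """
    if len(ca_coords) < 2 * window:
        raise ValueError("helix too short for the chosen window")
    start = _mean(ca_coords[:window])
    end = _mean(ca_coords[-window:])
    v = (end[0] - start[0], end[1] - start[1], end[2] - start[2])
    norm = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def orientation(axis_a: Vec, axis_b: Vec, threshold: float = 0.5) -> str:
    """Classify two helix axes by the cosine of the angle between them."""
    cos = sum(a * b for a, b in zip(axis_a, axis_b))
    if cos >= threshold:
        return "parallel"
    if cos <= -threshold:
        return "antiparallel"
    return "tilted"


# Idealized α-helix: radius 2.3 Å, rise 1.5 Å and 100° per residue.
ideal = [
    (2.3 * math.cos(math.radians(100 * k)), 2.3 * math.sin(math.radians(100 * k)), 1.5 * k)
    for k in range(12)
]
print(orientation(helix_axis(ideal), helix_axis(ideal[::-1])))  # antiparallel
```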
An interesting observation is that the top-ranked model for the 30-mer uperin 3.5 showed α-helical mated sheets that were curved and spiraling in a parallel orientation (Figure 4). In this model, each sheet has an equal number of subunits and is more tightly packed than in the 25-mer model, but not as tightly as in the crystal structure. The model can be divided into two regions, one consisting of straight sheets and the other of spiraling sheets. The straighter part showed interhelical distances of 14.6 Å in one sheet and 12.5 Å in the other, and an inter-sheet distance of 12 Å (Figure 4a). The buried SASA of a representative helix is 454 Å² (24% of the total SASA).
The spiraling part had interhelical distances of 14.4 Å in one sheet and 14.6 Å in the other, and an inter-sheet distance of 12.3 Å (Figure 4b). The buried SASA of one representative helix is 363 Å² (19% of the total SASA). Importantly, the sheets are oriented to maintain a hydrophobic core between them. All 30-mer models have the same average pLDDT score of 40 and, similar to the 25-mer models, showed diverse arrangements including clusters of helices in globular or spiraling shapes (Figure S11). In the fourth- and fifth-ranked 30-mer models, some of the subunits form random coils with no defined secondary structure. Notably, none of the AF2-ColabFold models we evaluated predicted an antiparallel orientation of the helices within the sheets (Figure 3a). Furthermore, the predictions did not anticipate the cross-β configuration, which is formed by a three-blade symmetric propeller of nine peptides per fibril layer with tight β-sheet interfaces (Figure 3e).

[…] α-helical and β-rich conformations. 18 The cryo-EM structure of aurein 3.3 revealed a cross-β fibril consisting of six peptides per fibril layer, all with kinked β-sheets allowing for a rounded compactness of the fibril 14 (Figure 5e). [44][45][46][47][48] We modeled the monomer, decamer (10-mer), eicosamer (20-mer), 25-mer, 30-mer, and 40-mer aurein 3.3.
The predicted monomeric aurein 3.3 was α-helical, with a high pLDDT score of 90 for the first 13 of 16 residues (Figure S12). The multimer predictions all consisted of α-helical subunits and failed to predict the β-rich configuration. The decamer (10-mer) models, which had similar pLDDT scores, showed a clustering of helices into a globular-like shape with clashes between subunits (Figure S13). From 20 subunits upward, the models were better packed into intriguing fibrils composed of α-helices. The first ranked model of the eicosamer (20-mer) showed 16 subunits arranged in two α-helical sheets, resembling cross-α, but with a twist between helices and curved sheets, so that the helices along each sheet are neither strictly parallel nor strictly antiparallel to each other, resulting in clashes between atoms (Figure S14d,e). The second ranked model showed a core of eight helices, similar to the spiraling cross-α amyloid-like structures 38 (Figure 5a). Within this region, the average distances between α-helical subunits along the two mated sheets were 12.7 and 12.6 Å, and the average distance between sheets was 14.7 Å. The buried SASA of a representative helix was 811 Å² (43.5% of the total SASA), indicating tight packing, similar to the cross-α structure of PSMα3. This arrangement also recapitulated the hydrophobic core between the sheets (Figure 5b). The […] with two examples of 722 and 150 Å² (38.6% and 7.8% of the total SASA, respectively). In the fourth and fifth ranked models, the buried SASA for a single helix was 183 Å² (9.8%) and 171 Å² (9.1%), respectively. Overall, these spiraling curved fibrils are not as tightly packed as the cross-α crystal structures of PSMα3 or uperin 3.5, but share the hydrophobic core between the helical sheets (Figure 6b,d; Figure S15g). The third ranked model of the 25-mer showed a different, rounded shape (Figure S15). Surprisingly, in the first ranked 25-mer model, some of the subunits were random coils, although its average pLDDT score was higher than those of the other models (Figure S15). The models of the 30-mer aurein 3.3 recapitulated some of the features of the 10- and 25-mer models (Figure S16). Overall, the AF2-ColabFold predictions for aurein 3.3 were mostly similar to PSMα3 and the spiraling cross-α of designed peptides, 2,3,38 along with more extreme curving and spiraling of the fibril itself. There was no evidence of the kinked β-sheets determined by cryo-EM (Figure 5e).

LL37 is a host defense peptide cleaved from hCAP18, the only human cathelicidin. 49,50 […] [50][51] The LL37 fragment containing residues 17-29 is an active antibacterial core and the shortest fragment that retains antiviral activity. 12,52 The crystal structure of LL37(17-29) (PDB id: 6S6M; Figure S17a,b) revealed a unique self-assembly of amphipathic helices into a densely packed, elongated hexameric fibril with a central pore.
The fibril is composed of four-helix bundles with a hydrophobic core that are further assembled through a network of polar interactions. 17 To evaluate the predictive ability of AF2-ColabFold for this unique fibril structure, we modeled the monomer, decamer (10-mer), eicosamer (20-mer), and 30-mer LL37(17-29).

| Amyloid-forming AMPs without experimental structures
We have previously identified and characterized AMPs that self-assemble into cross-α and cross-β amyloids, some with the ability to switch conformations over time or when induced by lipids. 18 These include the amphibian citropin 1.3, cyanophlyctin, and bombinin H4, 18,[53][54][55] as well as dolabellanin-B2 secreted by sea hares. 18,56 We selected these four peptides because of their different structural properties. 18 Cyanophlyctin is unstructured in solution, even in the presence of lipids, yet it adopts a cross-α conformation in the fibril form, and its helicity is further induced by the presence of membrane lipids.
Citropin 1.3 and bombinin H4 are unstructured in solution but adopt a helical conformation in the presence of membrane lipids. In the fibril state, citropin 1.3 forms a cross-β structure, but the presence of lipids induces the helical state. Bombinin H4 forms a cross-α fibril, and its helicity is also promoted by lipids. In contrast to most AMPs tested, which are unstructured or helical in solution, dolabellanin-B2 adopts a β-rich conformation both in solution and in the fibril form, with a minor helical population in the fibril form, and is insensitive to the presence of lipids. Considering these biophysical properties, 18 we evaluated the AF2-ColabFold models of the monomeric and multimeric forms of these AMPs (Figure 7; Figures S21-S41).
All monomeric models of the four peptides were α-helical with relatively high pLDDT scores, with dolabellanin-B2 predicted as a helix-turn-helix (Figures S21, S26, S32, and S37). There was no evidence of a β-rich structure for any of these AMPs in the monomeric or any of the multimeric models described below (Figures S21-S41), even for dolabellanin-B2, which is predominantly β-rich both in solution and with lipids. 18 Among all AF2-ColabFold models of these peptides, three recapitulate the continuous α-helical mated sheets, one for citropin 1.3 […] The second ranked eicosameric (20-mer) model of bombinin H4 contained clashes between atoms, although the packing is rather loose (Figure 7c). Specifically, the average interhelical distances along the two sheets were 13.4 and 13.9 Å, and the average inter-sheet distance was 20.7 Å. The buried SASA of one helix is 146.1 Å² (21.4% of the total SASA). The third ranked 25-mer bombinin H4 model also showed loose packing, with average interhelical distances along the two sheets of 14.6 and 14.9 Å, and an average inter-sheet distance of 19.4 Å (Figure 7e). The buried SASA of one helix is 100.2 Å² (4.7% of the total SASA). In all three models shown, the hydrophobic pattern indicates a hydrophobic core and a hydrophilic surface, similar to the experimentally determined cross-α structures (Figure 7b,d,f).

| Models of human pathological amyloids with antimicrobial activity, Aβ and IAPP
Human amyloids are notorious for their involvement in neurodegenerative diseases such as Alzheimer's and Parkinson's, in which protein plaques accumulate in the brain. Alzheimer's-associated Aβ is a 42- (or 40-) residue peptide derived from the amyloid precursor protein. 57 IAPP is a 37-residue peptide associated with type 2 diabetes; it is present in various organs, most commonly the pancreas, is usually associated with insulin, and is involved in glucose metabolism and homeostasis. 58 Both IAPP and Aβ are unstructured proteins that can form cross-β amyloids. 59,60 […] [62][63] IAPP and Aβ demonstrate antimicrobial activity, 32,33,35,36 thereby exhibiting functional similarities with the AMPs. Their characteristics, including their short length, the display of both α-helical and β-rich states, and their antimicrobial activity, make IAPP and Aβ well-suited candidates for assessing the predictions of AF2-ColabFold. We evaluated the AF2-ColabFold models of these human amyloids in comparison with the models of the AMPs.
The AF2-ColabFold monomeric model of Aβ1-42 is unstructured, as previously reported. 60 The pLDDT score of the monomeric Aβ is lower than that of any of the other monomeric models tested here (Figure S42), perhaps due to its greater length. In the models of decameric (10-mer) Aβ, most of the subunits are also unstructured, but in the first, second, and fourth ranked models some regions are α-helical, and in the third and fifth ranked models some regions form short β-sheets, although most of these consist of clashing β-strands (Figure S43).
The models of eicosamer (20-mer) Aβ showed a similar trend, with some models partially structured as helices (Figure S44).

| Cross-α architecture is rare among proteins with experimental structures
[…] [68][69][70] This supports the rarity in the structural databases of α-helices stacked perpendicular to the fibril axis.

| Secondary structure switchers are rare among proteins with experimental structures
The secondary structure switch in the fibril form, demonstrated at high resolution for uperin 3.5 14,15 and suggested for additional amyloid-forming AMPs based on secondary structure measurements, 18 may be one of the reasons hindering correct modeling of amyloid fibril structures.We sought to analyze the prevalence of secondary structure switchers among proteins with known experimental structures.Specifically, we searched the PDB for pairs of highly similar protein chains, as described in Section 4, which contain helical structure in one chain and extended secondary structure in the other chain for at least half of their length.
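The switcher criterion described above can be expressed as a small filter over per-residue secondary structure strings. This is a minimal sketch, assuming DSSP-style codes for two already aligned, highly similar chains (the similarity search itself, described in Section 4, is not reproduced); counting 3-10 and π helices as helical and isolated bridges as extended is an assumption.

```python
HELIX_CODES = frozenset("HGI")    # DSSP α-helix, 3-10 helix, π-helix
EXTENDED_CODES = frozenset("EB")  # DSSP β-strand and isolated β-bridge


def fraction(ss: str, codes: frozenset) -> float:
    """Fraction of positions in a DSSP string whose code is in `codes`."""
    if not ss:
        return 0.0
    return sum(code in codes for code in ss) / len(ss)


def is_switcher(ss_a: str, ss_b: str, cutoff: float = 0.5) -> bool:
    """True if one chain is >= cutoff helical while the other is >= cutoff extended.

    Assumes ss_a and ss_b describe two highly similar chains aligned to equal length.
    """
    if len(ss_a) != len(ss_b):
        raise ValueError("secondary structure strings must be aligned to equal length")
    a_helix, b_helix = fraction(ss_a, HELIX_CODES), fraction(ss_b, HELIX_CODES)
    a_ext, b_ext = fraction(ss_a, EXTENDED_CODES), fraction(ss_b, EXTENDED_CODES)
    return (a_helix >= cutoff and b_ext >= cutoff) or (b_helix >= cutoff and a_ext >= cutoff)


print(is_switcher("-HHHHHHHH-", "-EEEEEEEE-"))  # True
```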
The analysis yielded only five proteins that have PDB structures with two different secondary structures (Table S2): human Aβ, prion protein, and glucagon; amphibian uperin 3.5; and the bacterial cold shock protein CspA. Except for the latter, the resulting structures are all from proteins that form amyloids, although this feature was not specified in the search parameters. The structure of CspA was determined both as a folded protein containing β-sheets and as a cotranslational helical folding intermediate (Figure 8), which explains the differences in structure. For human Aβ, prion protein, and glucagon, the differences arise from comparing the α-helical form of the protein in its soluble monomeric or oligomeric state with the cross-β fibril form (examples are provided in Figure 8, and the full list of PDB entries is given in Table S2). Uperin 3.5 is the only example in the PDB, as far as the analysis could identify, in which the same sequence shows both helical and extended conformations in the fibril form. This demonstrates the rarity of secondary structure switchers, at least among proteins with known structures, which may have influenced AF2 training.

[…] Notably, for IAPP, there are around 900 sequences in the alignment, but more than half of these have less than 50% identity. For Aβ, aurein 3.3, and PSMα3, the coverage is approximately 80, 5, and 4 homologous sequences, respectively, with three-quarters of these sequences having more than 80% identity. All other proteins tested have a sequence coverage of only 2-3 sequences, with 100% identity.
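Alignment depth and identity summaries of this kind can be computed from the multiple sequence alignment used as model input. The sketch below assumes the alignment has already been reduced to equal-length, gapped sequences (for example, with lowercase insertion states of an A3M file removed) and defines identity over columns where neither sequence has a gap; both choices are assumptions, since identity definitions vary. The sequences are hypothetical.

```python
from typing import Dict, List


def percent_identity(query: str, hit: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(query) != len(hit):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(q, h) for q, h in zip(query, hit) if q != "-" and h != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == h for q, h in pairs) / len(pairs)


def summarize_msa(query: str, hits: List[str]) -> Dict[str, int]:
    """Count homologs and how many fall below 50% or above 80% identity."""
    ids = [percent_identity(query, h) for h in hits]
    return {
        "n_sequences": len(ids),
        "below_50": sum(i < 50.0 for i in ids),
        "above_80": sum(i > 80.0 for i in ids),
    }


# Toy alignment with a hypothetical query and three aligned hits.
query = "ACDEFGHIKL"
hits = ["ACDEFGHIKL", "ACDEFGHIKV", "AVVVVVVVVV"]
print(summarize_msa(query, hits))  # {'n_sequences': 3, 'below_50': 1, 'above_80': 2}
```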

| DISCUSSION
Our research focused on functional amyloids with antimicrobial activity, including peptides secreted by a variety of organisms, ranging from prokaryotes to eukaryotes, as well as two disease-associated human amyloids that also display antimicrobial activity. Although many amyloids play important physiological roles in various organisms and are expected to be subject to evolutionary selection pressure, they are fundamentally different from globular and membrane proteins, which make up most of the available protein structures. 25 Amyloid-forming proteins and peptides are often unstructured in solution and highly polymorphic when assembled, with their morphology frequently affected by environmental factors such as pH, salts, lipids, metal ions, nucleic acids, post-translational modifications, and other proteins. 79 An extreme case of polymorphism is a different secondary structure in highly similar sequences. Our analysis of the PDB has shown that among all protein structures, only one bacterial cold shock protein and four eukaryotic amyloids exhibit complete secondary structure polymorphism, according to our definition. Such a massive switch was observed either when comparing soluble to fibrillar structures, in the case of human Aβ, prion protein, and glucagon, or when comparing two fibrillar forms, in the case of uperin 3.5 (Table S2; Figure 8). This is an extremely small group, a drop in the ocean of all known structures used for training AF and similar methods. Furthermore, the training of the AF2 model is primarily based on low-order oligomers and may be less applicable to higher-order oligomers and fibrils, in addition to the generally lower confidence for oligomers relative to monomers. 80 In this study, we examined how these biases in the training sets, together with the inherent polymorphism and self-assembly properties of amyloids, affect amyloid structure prediction using advanced artificial intelligence (AI)-based methods, particularly AF2 via the ColabFold algorithm. 31
When discussing the results, it is important to take into consideration the estimated prediction accuracy described in Section 2.
A significant observation is that most of the predictions reported here were based on α-helical structures and did not include any β-rich assemblies, even though our test cases included uperin 3.5, aurein 3.3, Aβ, and IAPP, for which cross-β structures were observed experimentally, 14,58,81 and even though such assemblies are a hallmark of amyloid fibrils. One possible factor is the composition of the training data: the AF2 training set comprised 155 204 PDB entries released prior to August 28, 2019. 30 When we compared these structures with the AmyPro database, a repository of amyloid proteins documented in the scientific literature, 82 we identified 77 β-rich amyloid structures (including redundant sequences; 0.05% of the training set) and one cross-α structure, that of PSMα3 (PDB id: 5I55; 0.00064% of the training set). It is difficult to ascertain the extent to which a single structure or a few dozen structures could influence the AF2 algorithm and its accuracy. However, the growing number of polymorphic amyloid structures in the PDB offers hope that future algorithms will be able to identify these unique features. Another consideration regarding prediction accuracy is the short length of the sequences tested here. Among the 270 304 chains of structures in the PDB released prior to August 28, 2019, 32 125 (11.9%) are shorter than 42 amino acids, the length of our longest tested peptide (Aβ).
Although this proportion is not particularly high, it is certainly not negligible. Another important factor to consider is the ratio of helical to β-rich structures in the training set. A statistical analysis of the PDB using the Structural Classification of Proteins (SCOP) revealed that among the structures released before August 28, 2019, there were 18 881 proteins primarily composed of α-helices and 29 123 proteins mainly composed of β-strands (Table S3). Therefore, it is unlikely that helices are disproportionately represented in the training set, either among protein structures in general or among amyloids specifically. Other factors may thus underlie the marked inclination toward α-helical models in the predicted structures. One plausible explanation is that the predicted helical configuration of the multimer is influenced by the predictions of the monomeric or soluble states, which, for some of the cross-β amyloids, are either unstructured or helical. Another potential explanation is the functional significance inherent in the protein structures, given that the membrane-active form is frequently helical, as is the oligomeric form of several amyloids.
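The training-set proportions quoted in this paragraph follow from simple ratios of the reported counts. The short sketch below recomputes them as a sanity check; it uses only the counts given in the text, and the variable names are illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


# Counts quoted in the text (structures released before August 28, 2019).
TRAINING_ENTRIES = 155_204
PDB_CHAINS = 270_304

beta_rich_amyloids = percent(77, TRAINING_ENTRIES)   # about 0.05%
cross_alpha = percent(1, TRAINING_ENTRIES)           # about 0.00064%
short_chains = percent(32_125, PDB_CHAINS)           # about 11.9%
alpha_per_beta = 18_881 / 29_123                     # about 0.65 mainly-α per mainly-β protein

print(f"{beta_rich_amyloids:.3f}% {cross_alpha:.5f}% {short_chains:.1f}% {alpha_per_beta:.2f}")
# 0.050% 0.00064% 11.9% 0.65
```

The last ratio makes the argument concrete: mainly-α SCOP entries are outnumbered by mainly-β entries, so the helical bias of the models is hard to attribute to class imbalance alone.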
Both PSMα3 and uperin 3.5 were predicted by AF2 to exhibit cross-α-like mated helical sheets (Figures 1-4), in line with their observed formation of cross-α fibrils in experimental structures. However, there is a nuance: PSMα3 predominantly adopts a helical conformation, 2,83,84 whereas uperin 3.5 tends to form cross-β fibrils unless external factors, such as lipids, induce its helicity. 15 Additionally, uperin 3.5 and PSMα3 exhibit contrasting helical orientations along their sheets: antiparallel in uperin 3.5 and parallel in PSMα3. This distinction is further highlighted by their differing thermostabilities, with uperin 3.5 demonstrating greater stability, potentially due to its more stable alternative cross-β conformation. 15,85 Such differences likely originate from their respective sequences, as indicated by the divergent outcomes of our AF2 predictions. It is noteworthy that when multimeric predictions involved over 20 subunits, the predicted PSMα3 models started to unfold into sparse helices (Figures 3 and 4; Figures S2-S6). Conversely, uperin 3.5 only began to display fibril-like mated helical sheets when modeled as a 25-mer or larger (Figures 3 and 4). To fully understand these variations with multimeric order, which affect the assembly and dispersion of helices, further experimental cross-α structures are needed.
In the case of AMPs without known experimental structures, the models did not predict a β-rich form, even though experimental data showed that three of the four tested AMPs formed cross-β structures, and dolabellanin-B2 even displayed a β-rich propensity in solution (Figure 7; Figures S21-S41). Interestingly, however, the most tightly packed model of mated α-helical sheets resembling the cross-α configuration, among all tested peptides and multimers, was the 20-mer of citropin 1.3 (Figure 7). Biophysical measurements showed that this amphibian AMP assumes a β-rich fibril configuration but can be induced into an α-helical conformation in the presence of membrane lipids, both in solution and in the fibril form,[18] similar to uperin 3.5.[15]

In addition to the observed α-helical sheets resembling the cross-α configuration, we frequently found predictions of α-helices clustered in a globular-like shape, particularly in the 10-mer models of numerous peptides (Figure 5; Figures S5, S6, S9-S11, S13, S18, S20, S22, S24, S25, S27-S30, and S33-S36). This might reflect a bias in the training set, which is predominantly composed of globular proteins. Other recurrent shapes included spiraling helices and doughnut-like assemblies; however, these were often loosely packed and unlikely to form stable assemblies.
We observed an overall limitation in AF2's prediction of amyloid multimers, with an unexpected preference for mated α-helical sheets.
Previous studies showed that other types of proteins whose AF models differed from the experimental structures include proteins with intrinsically disordered regions (IDRs) and membrane proteins.[86,87] Additionally, limitations in structure prediction were reported for some proteins that bind compounds essential for their structural integrity and function, such as heme groups, zinc ions, and other metal ions. Consequently, tools such as AlphaFill were developed to enrich AF models with ligands and cofactors.[88] The polymorphism of amyloid fibrils might also mirror the challenge of predicting the structural dynamics of proteins that undergo large, allosterically induced conformational changes. AF generates a few models that can potentially sample structures at different locations on the energy landscape, but these do not necessarily capture all allosteric conformations, nor are missed conformations necessarily flagged by a low reliability score.[86] Another factor leading to low confidence scores of AF2 models is a small number of homologous sequences. To address this issue, a new deep-learning-based algorithm, AlphaFold-Eva, was developed to evaluate AF predictions for proteins with limited homologous sequences by learning geometric information from complex structures.[89] A general limitation of AI methods trained on structural information alone may be related to the "epigenetic dimension of protein structure," a term that encompasses all environmental parameters affecting structure beyond the amino acid composition.[90] Overall, there is still room for improvement in multimeric predictions,[91] yet future developments in this field are promising, as suggested by a recent report demonstrating that AlphaFold-Multimer successfully predicts homomeric and heteromeric interfaces.[80]

In conclusion, the predictions made by AF2-ColabFold-Multimer for amyloid peptides with antimicrobial properties did not align with the more common cross-β amyloid configuration. Instead, they predominantly favored α-helical structures, including paired α-helical sheets with a hydrophobic core, akin to the cross-α configuration.
Before initiating the structure prediction of amyloidogenic AMPs, we anticipated limitations due to their self-assembling, highly polymorphic nature, short sequence length, and lack of homologous sequences. Surprisingly, AF2's ability to predict cross-α-like configurations was a positive outcome, especially given that the training set included 77 β-rich amyloids but only a single cross-α structure, and more all-β than all-α proteins (Table S3). Notably, the models did not predict the more common cross-β amyloid configuration, which may suggest that this preference for helices stems from the prediction of the monomeric/soluble state.
We hypothesize that the training of AI-based algorithms may enable the identification of sequence-related information associated with evolutionarily driven active states, potentially leading to a more accurate simulation of the membrane-active α-helical form of many of these toxic peptides.

| Structure prediction
To predict the three-dimensional structures, we used ColabFold,[31] a Jupyter notebook hosted on Google Colaboratory that runs AlphaFold and allows the prediction of protein monomers and of homo- and hetero-oligomeric complexes.[31] We tested 10 different peptides: six with known atomic structures in fibril form deposited in the PDB, and four with no experimental structure but whose secondary structure has been investigated using circular dichroism (CD), attenuated total internal reflection Fourier transform infrared spectroscopy (ATR-FTIR), and X-ray fiber diffraction.[18] We used the default ColabFold parameters and varied the number of subunits, as described in Table 1. To assess the accuracy of each prediction, we employed standard AlphaFold confidence statistics.
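Varying the number of subunits amounts to submitting the same peptide sequence repeated n times as one complex. The sketch below assumes ColabFold's convention of separating chains with a colon (`:`) and a two-column `id,sequence` CSV for batch runs; the peptide name and sequence are hypothetical placeholders, and the subunit numbers are only examples of oligomer sizes discussed in this article (Table 1 lists the ones actually used).

```python
import csv
import io


def homo_oligomer_query(sequence: str, n_copies: int) -> str:
    """Repeat one chain n_copies times, joined by ':' (assumed ColabFold chain separator)."""
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    return ":".join([sequence] * n_copies)


def build_batch_csv(peptides: dict, copy_numbers: list) -> str:
    """Return CSV text with one 'id,sequence' row per peptide and oligomer size."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "sequence"])
    for name, sequence in peptides.items():
        for n in copy_numbers:
            writer.writerow([f"{name}_{n}mer", homo_oligomer_query(sequence, n)])
    return buffer.getvalue()


# Hypothetical placeholder peptide; not one of the sequences studied here.
peptides = {"example_peptide": "ACDEFGHIKL"}
csv_text = build_batch_csv(peptides, [1, 10, 20])
print(csv_text.splitlines()[1])  # example_peptide_1mer,ACDEFGHIKL
```

Generating all oligomer sizes from one function keeps the monomer sequence identical across runs, so differences between models reflect only the number of subunits.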

| Structure visualization and analysis
Visualization and analysis of the predicted and known structures were performed using UCSF ChimeraX version 1.4.[92] Interhelical and inter-sheet distances were calculated by first determining the center of mass of each subunit with the "measure center" command and then applying the "distance" command between selected centers of mass. To quantify packing compactness, we compared the solvent-accessible surface area (SASA) buried within the fibril, calculated with the "measure buriedarea" command in ChimeraX. We selected a subunit located at the center of the cross-α assembly and measured the area buried within the multimer complex. The percentage of buried area was calculated relative to the total surface area of the selected chain and rounded to one decimal place.
Values for all SASA and interhelical and inter-sheet distances are presented in Table S1.
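The distance and buried-area measurements above reduce to two small calculations: a center of mass per subunit followed by a Euclidean distance, and buried area expressed as a percentage of the chain's total surface area. The sketch below reproduces only that arithmetic on hypothetical coordinates; it does not call ChimeraX, and the mass weighting, atomic masses, and (element, x, y, z) atom layout are assumptions chosen for illustration.

```python
import math

# Approximate standard atomic masses (Da) for elements found in peptides.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def center_of_mass(atoms):
    """Mass-weighted center of a subunit; atoms are (element, x, y, z) tuples in Å."""
    total_mass = 0.0
    weighted = [0.0, 0.0, 0.0]
    for element, x, y, z in atoms:
        mass = ATOMIC_MASS[element]
        total_mass += mass
        for axis, coord in enumerate((x, y, z)):
            weighted[axis] += mass * coord
    return tuple(w / total_mass for w in weighted)


def subunit_distance(atoms_a, atoms_b):
    """Distance (Å) between the centers of mass of two subunits."""
    return math.dist(center_of_mass(atoms_a), center_of_mass(atoms_b))


def percent_buried(buried_area, total_area):
    """Buried area as a percentage of the chain's total surface area, one decimal place."""
    return round(100.0 * buried_area / total_area, 1)


# Hypothetical toy subunits: two carbon atoms each, offset by 10 Å along x.
helix_1 = [("C", 0.0, 0.0, 0.0), ("C", 2.0, 0.0, 0.0)]
helix_2 = [("C", 10.0, 0.0, 0.0), ("C", 12.0, 0.0, 0.0)]
print(round(subunit_distance(helix_1, helix_2), 3))  # 10.0
```

Averaging such center-to-center distances over neighboring pairs gives interhelical or inter-sheet distances of the kind reported in Table S1.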

| Search for structures resembling the cross-α configuration of PSMα3
Different helical packings derived from the PSMα3 cross-α structure (arrangements of 4-6 helices in the same or mated sheets) were searched for structural similarity using TopSearch,[93] which allows structural alignment against all PDB entries released before April 18, 2018.[93]

| Searching the PDB for identical sequences with different secondary structures
All entries in the wwPDB were downloaded from the PDB archive in mmCIF format. Entries were filtered by the following criteria: polypeptide (L) chain type, at least 70% of residues resolved in the atomic structure, and at least half of the modeled residues assuming a helical or extended secondary structure, as annotated by the depositors of the structure or automatically assigned by DSSP.[94] Each chain that passed the filtering was classified, yielding 66 985 helical chains and 9409 extended chains. Each extended chain was matched to the helical chains using Diamond[95] BLASTP with a minimum sequence identity of 80% and a minimum coverage of 80% for both chains. A total of 74 matches were found, including 1 pair of CspA, 2 pairs of uperin 3.5, 14 pairs of Aβ, 14 pairs of glucagon, and 43 pairs of human prion. The exact pipeline and code are available at https://github.com/GabiAxel/pdb-similar-sequence-different-secondary-structure.
The results are given in Table S2.
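The filtering and matching criteria above can be expressed as two predicates. The sketch below is a hypothetical illustration, not the code from the linked repository: the `Chain` and `Hit` records are assumptions standing in for parsed mmCIF/DSSP data and for alignment results, and a tie at exactly half helical and half extended residues is resolved as "helical" purely for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chain:
    chain_id: str
    chain_type: str   # e.g. "polypeptide(L)", as in the mmCIF entity_poly.type field
    n_residues: int   # length of the deposited sequence
    n_modeled: int    # residues with coordinates in the atomic structure
    n_helix: int      # modeled residues assigned a helical state
    n_extended: int   # modeled residues assigned an extended (strand) state


@dataclass
class Hit:
    extended_id: str
    helical_id: str
    identity: float      # percent sequence identity of the alignment
    extended_cov: float  # percent of the extended chain covered
    helical_cov: float   # percent of the helical chain covered


def classify(chain: Chain) -> Optional[str]:
    """Return 'helical', 'extended', or None if the chain fails the filters."""
    if chain.chain_type != "polypeptide(L)" or chain.n_residues == 0:
        return None
    if chain.n_modeled < 0.7 * chain.n_residues:   # at least 70% resolved
        return None
    if chain.n_helix >= 0.5 * chain.n_modeled:     # at least half helical
        return "helical"
    if chain.n_extended >= 0.5 * chain.n_modeled:  # at least half extended
        return "extended"
    return None


def is_match(hit: Hit, min_identity: float = 80.0, min_cov: float = 80.0) -> bool:
    """Keep a pair only if identity and the coverage of BOTH chains pass the thresholds."""
    return (
        hit.identity >= min_identity
        and hit.extended_cov >= min_cov
        and hit.helical_cov >= min_cov
    )


print(classify(Chain("A", "polypeptide(L)", 22, 20, 18, 0)))  # helical
```

Requiring coverage on both chains, rather than only on the query, prevents a short peptide from matching a small segment of a much longer protein.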

FIGURE 1
Comparison between experimental and predicted structures of PSMα3. (A,B) The crystal structure of PSMα3 (PDB: 5I55). (C,D) The first-ranked AF2-ColabFold model of a decamer (10-mer). In panels (A,C), the view is along the fibril axis, with PSMα3 colored by sheet and with distances marked between sheets and between α-helical subunits along each sheet. In panels (B,D), PSMα3 is shown in a surface representation colored by hydrophobicity according to the scale bar.
FIGURE 2
AF2-ColabFold second-ranked model of pentadecameric (15-mer) PSMα3. (A) The PSMα3 model viewed along the fibril axis and colored by sheets, showing distances between subunits and sheets. (B) Surface representation colored by hydrophobicity, as indicated by the scale bar.

FIGURE 3
Comparison between experimental and predicted structures of uperin 3.5. (A,B) The crystal structure of uperin 3.5 (PDB: 6GS3). (C,D) The third-ranked AF2-ColabFold model of the 25-mer. (E) The cryo-EM cross-β structure (PDB: 7QV5). In panels (A-D), the view is along the fibril axis. For clarity, only the core arrangement of 15 helices is shown, without the scattered subunits present in the model. In panels (A,C), uperin 3.5 is colored by sheets, and the distances between sheets and between two antiparallel and parallel neighboring subunits along the sheet are marked. In panels (B,D), uperin 3.5 is shown in a surface representation colored by hydrophobicity, as indicated by the scale bar. In panel (E), the view is along the fibril axis with a slight tilt, with uperin 3.5 colored by β-sheets.

2.3 | AF2-ColabFold failed to predict the kinked cross-β fibril of aurein 3.3 and instead showed arrangements of spiraling and curved α-helical mated sheets
Aurein 3.3 is an amphibian AMP isolated from Litoria/Ranoidea raniformis (the southern bell frog). Like uperin 3.5, aurein 3.3 showed a secondary structure switch in the fibrillar form to include both [...] third ranked model, similar to the fifth ranked model, showed pairs of mated spiraling α-helical sheets curved into a doughnut shape, retaining the core between the paired sheets (Figure 5c,d). The 20-mer models differed from each other and showed similar pLDDT scores of 40 (Figure S14). Models with a higher number of aurein 3.3 subunits (25-, 30-, and 40-mers) supported the curving of the spiraling cross-α-like arrangement into a spiraling fibril rather than the typical elongated fibril (the 25-mer models are shown in Figure 6). The second, fourth, and fifth ranked models of the 25-mer aurein 3.3 showed average inter-sheet distances of 13.6, 21, and 15 Å, respectively. The distances between subunits along the two sheets were 17.4 and 16.5 Å for the second ranked model, 16.4 and 15.9 Å for the fourth ranked model, and 18 and 18.6 Å for the fifth ranked model (Figure 6a,c; Figure S15f). The buried SASA for a single helix in the second ranked model varied along the structure,

FIGURE 4
AF2-ColabFold prediction of the first-ranked model of the 30-mer uperin 3.5. Only the core arrangement of 22 helices is shown; eight scattered subunits present in the model are omitted for clarity. (A,B) Two orientations of the model, colored by sheets, indicating distances between subunits and sheets. The bold dotted line indicates the transition from aligned to spiraling subunits. (C) Surface representation colored by hydrophobicity, as indicated by the scale bar.

FIGURE 5
Comparison between experimental and predicted structures of aurein 3.3. (A-D) AF2-ColabFold models of 20-mer aurein 3.3, including the spiraling cross-α core of the second ranked model (A,B) and the third ranked model (C,D). The structures are either colored by sheets, showing the distances between subunits and sheets (A,C), or shown in a surface representation colored by hydrophobicity as indicated (B,D). (E) The cryo-EM cross-β structure of aurein 3.3 (PDB: 7QV6), viewed along the fibril axis, with the β-sheets in different colors.

2.4 | AF2-ColabFold failed to predict the nonamyloid α-helical fibril structure of human LL37 (17-29)
3 and two models of bombinin H4. None of the models of cyanophlyctin, tested up to 40-mers (Figures S26-S31), or of dolabellanin-B2, tested up to 30-mers (Figures S37-S41), showed a fibril-like form or mated sheets. The first-ranked eicosameric (20-mer) model of citropin 1.3 resembled the parallel cross-α structure of PSMα3 and was equally tightly packed (Figure 7a). Specifically, the average distances between α-helical subunits along each of the two sheets were 10.7 and 10.1 Å, and the average distance between the sheets was 11.4 Å. The buried SASA of one helix was 944.3 Å² (55.3% of the total SASA), making this the most densely packed model among all peptides evaluated here. Adding citropin 1.3 subunits (in the 25-mer and 30-mer) resulted in the loss of the fibril-like assembly (Figures S24 and S25), similar to the case of PSMα3. Nevertheless, we observed a smaller assembly of three-helix bundles forming a triangular shape with a hydrophobic core within the second ranked model of the 20-mer and the fifth ranked model of the 25-mer citropin 1.3 (Figures S23 and S24). Whether this assembly represents a stable oligomeric structure remains an open question.

FIGURE 7
AF2-ColabFold models of amyloid-forming AMPs without experimental structures. (A,B) The first ranked model of the 20-mer citropin 1.3. (C,D) The second ranked model of the eicosameric (20-mer) bombinin H4. Yellow circles mark clashes between subunits in panel (C). (E,F) The third ranked model of the 25-mer bombinin H4, with eight sparse subunits removed for clarity. Panels (A,C,E) are colored by sheets, showing inter-sheet and interhelical distances. Subunits that do not participate in the continuous sheets are shown in gold. Panels (B,D,F) are shown in a surface representation colored by hydrophobicity, as indicated by the scale bar.
When discussing the results, it is important to consider the estimated prediction accuracy. AF2 provides a per-residue confidence measure called pLDDT (predicted local distance difference test), reported on a 0-100 scale, where 100 indicates the highest confidence. AlphaFold assigned pLDDT values of around 90 to the monomers, indicative of very high confidence. However, the values dropped considerably (to about 20-40) for the multimers, indicating much lower confidence. AF2 also estimates the expected positional error of each residue in the model relative to the true structure when the two are superimposed on a given residue. This estimate, called the predicted alignment error (PAE), is reported in Å. Very low PAE values (near 0) were assigned within each monomer (intra-monomer), both for isolated monomers and within the multimers, indicating that the predicted monomer structures would superimpose nearly perfectly on the real structures. However, PAE values between monomers (inter-monomer) were rather high, indicating very low confidence in the relative orientation of each monomer with respect to the others. Thus, AF2 appears highly confident in the individual monomeric structures but much less certain about how they pack against each other. AF2 provides another metric, known as sequence coverage, which represents the number of sequences per position in the multiple sequence alignment.
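The intra- versus inter-monomer contrast described above can be summarized by averaging the PAE matrix over same-chain and different-chain residue pairs. This is a minimal sketch assuming the PAE matrix has already been loaded as a list of rows, with residues ordered chain by chain and chain lengths known; ColabFold saves PAE alongside its models, but the file layout and key names should be checked against the ColabFold version used. The toy matrix is hypothetical.

```python
import math


def chain_labels(chain_lengths):
    """Map each residue position to the index of the chain it belongs to."""
    labels = []
    for chain_index, length in enumerate(chain_lengths):
        labels.extend([chain_index] * length)
    return labels


def mean_intra_inter_pae(pae, chain_lengths):
    """Return (mean intra-chain PAE, mean inter-chain PAE) in Å."""
    labels = chain_labels(chain_lengths)
    if len(pae) != len(labels) or any(len(row) != len(labels) for row in pae):
        raise ValueError("PAE matrix size does not match the chain lengths")
    intra, inter = [], []
    for i, row in enumerate(pae):
        for j, value in enumerate(row):
            (intra if labels[i] == labels[j] else inter).append(value)

    def mean(values):
        return sum(values) / len(values) if values else math.nan

    return mean(intra), mean(inter)


# Hypothetical toy dimer of two 2-residue chains: low error within chains,
# high error between them, mirroring the pattern described in the text.
toy_pae = [
    [0.0, 1.0, 20.0, 20.0],
    [1.0, 0.0, 20.0, 20.0],
    [20.0, 20.0, 0.0, 1.0],
    [20.0, 20.0, 1.0, 0.0],
]
print(mean_intra_inter_pae(toy_pae, [2, 2]))  # (0.5, 20.0)
```

A low intra-chain mean combined with a high inter-chain mean is the numerical signature of "confident monomers, uncertain packing."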

These AlphaFold statistics were used to assess the accuracy of each prediction: (a) pLDDT, which estimates the confidence of the prediction; (b) the sequence coverage plot, indicating the number of sequences per position during the multiple sequence alignment step; and (c) PAE, representing the expected positional error when the true and predicted structures are aligned on each residue. These evaluation metrics are presented for each AlphaFold prediction we conducted in Figures S1-S48.
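To summarize per-residue pLDDT values such as those in Figures S1-S48, residues can be binned into the confidence bands used by the AlphaFold Protein Structure Database (above 90 very high, 70-90 confident, 50-70 low, below 50 very low). The sketch below assumes those band edges, assigns values exactly on a boundary to the lower band, and uses hypothetical scores.

```python
from collections import Counter

# Confidence bands as used by the AlphaFold Protein Structure Database;
# values exactly on a boundary are assigned to the lower band (an assumption).
BANDS = [(90.0, "very high"), (70.0, "confident"), (50.0, "low")]
LABELS = ("very high", "confident", "low", "very low")


def plddt_band(score: float) -> str:
    """Return the confidence band for one per-residue pLDDT value."""
    for lower_edge, label in BANDS:
        if score > lower_edge:
            return label
    return "very low"


def band_fractions(scores):
    """Fraction of residues falling in each confidence band."""
    if not scores:
        raise ValueError("no pLDDT scores given")
    counts = Counter(plddt_band(s) for s in scores)
    return {label: counts[label] / len(scores) for label in LABELS}


# Hypothetical per-residue scores for a well-predicted monomer and a poorly
# predicted multimer, roughly matching the ranges reported in the text.
monomer = [92.0, 91.5, 90.8, 93.2]
multimer = [35.0, 28.4, 41.0, 22.7]
print(band_fractions(monomer)["very high"])  # 1.0
print(band_fractions(multimer)["very low"])  # 1.0
```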