Target highlights in CASP14: Analysis of models by structure providers

Abstract
The biological and functional significance of selected Critical Assessment of Techniques for Protein Structure Prediction 14 (CASP14) targets is described by the authors of the structures. The authors highlight the most relevant features of the target proteins and discuss how well these features were reproduced in the submitted predictions. The overall ability to predict three‐dimensional structures of proteins improved remarkably in CASP14, and many difficult targets were modeled with impressive accuracy. For the first time in the history of CASP, the experimentalists not only highlighted that computational models can accurately reproduce the most critical structural features observed in their targets, but also envisaged that models could guide further studies of biologically relevant properties of proteins.

authors on the accuracy of the best models submitted for 12 CASP14 targets (Table 1). All target providers were invited to contribute to the paper, except those of five targets whose structures were solved using CASP models; these are described separately in this issue. 7 The targets presented here include the neutralizing monoclonal antibody 93k bound to the varicella-zoster virus fusogen glycoprotein B (H1036 and T1036), the bacteriophage T5 tail tip complex (H1060 and T1061), the polymorphic CDI toxin-immunity protein complex from Serratia marcescens (H1065, …

Members of the Herpesviridae are pathogens of humans and animals that cause a wide range of medically and economically important diseases. 8 The outer lipid membrane of herpesvirus virions is studded with glycoproteins that enable binding to cell membranes and fusion of the virus envelope to initiate entry and establish infection. Herpesvirus orthologs of glycoprotein B (gB) are trimeric proteins that have been classified as type III fusogens because of their structural similarities to vesicular stomatitis virus G protein and baculovirus gp64. [9][10][11][12][13][14][15] The ectodomain of gB orthologs consists of five structurally distinct domains (DI to DV) that fold into a homotrimer with C3 symmetry.
Varicella-zoster virus (VZV) is an alphaherpesvirus that causes chickenpox (varicella) upon primary infection. 16 VZV establishes latency in sensory ganglion neurons, and subsequent reactivation manifests as shingles (zoster). In addition to fusion during virion entry, the characteristic polykaryocyte formation caused by cell-cell fusion within tissues in vivo is essential for VZV pathogenesis. This process can be modeled in vitro through syncytia formation by VZV-infected cells in culture. 17,18 Critically, adverse health effects are directly linked to fusion between differentiated host cells: fusion between ganglion neurons and satellite cells has been associated with postherpetic neuralgia, and strokes have been linked to vascular endothelial cell fusion. [19][20][21] The functional domains of herpesvirus gB orthologs have been characterized using monoclonal antibodies (mAbs) that neutralize viral infection by binding to gB before membrane fusion. 11,[22][23][24][25][26][27][28][29][30][31] Although the molecular interactions of some of these antibodies with gB residues had been defined previously, it was unknown whether these gB residues were involved in fusion function or virus infection. 11,28 A newly derived human mAb, 93k, neutralized VZV by binding to gB and inhibiting membrane fusion. 32 To elucidate the functions of the gB domains and their roles in VZV infection, a 2.8 Å resolution cryo-EM structure of native, full-length VZV gB in complex with mAb 93k Fab fragments was determined. 32 This near-atomic resolution structure revealed residues within gB DIV that were then shown to be essential for membrane fusion by evaluating DIV mutants in a virus-free assay.

… with gB residues R592 and I594 of β23, and V617 and L619 of β25 (Figure 1C; see supplemental movie 3 in Oliver et al. 32). The aromatic ring of VHCDR3 Y113 formed a cation-π interaction with gB R592, which was inserted into a negatively charged pocket within the 93k antigen-binding site.
In addition, the OH group of VHCDR3 Y113 and the side chain of N111 formed H-bonds with the carbonyl oxygen of gB I593 and the backbone nitrogen of gB L595, respectively (Figure 1C).
At the boundary of the interface between gB β23 and 93k, the carbonyl oxygens of VHCDR3 P103 and G104 H-bonded with the side chain of gB Q596 and the backbone nitrogen of N597, respectively, while the backbone nitrogen of VHCDR3 A106 H-bonded with the carbonyl oxygen of gB L595 (Figure 1C). The gB-93k interface made a sharp turn where hydrophobic and van der Waals contacts dominated the 93k interaction with gB β28-30. The H-bond between VHCDR3 T108 OG1 and gB E670 OE1 was surrounded by hydrophobic interactions between residues P107, P109, and L110 of VHCDR3 and W32 of the variable light chain CDR1 (VLCDR1), and gB β28-30 residues F655, H658, V660, and Y667 (Figure 1C). This complex network of hydrophobic and hydrophilic interactions at the gB-93k interface of postfusion gB identified the strongest interactions as those between gB β23 and β30 and the 93k VHCDR3. Importantly, because mAb 93k neutralizes through fusion inhibition, 32 these results implicated residues within gB DIV β23 and β30 in membrane fusion.
Indeed, alanine substitutions of two or more residues within β23 and β30 reduced or abolished fusion and limited the capacity of VZV to infect cells, indicating that these residues act together to ensure that the gB structure supports its fusion function.

Using cryo-electron microscopy, we determined the structure of the T5 tail tip before and after interaction with its receptor FhuA. 36 We solved the structures of two rings of the Tail Tube Protein pb6, extended by a ring of p140 surrounded by a dodecamer of p132 that forms the collar, a hexameric ring of pb9, a trimeric ring of pb3, which closes the tube, and a trimer of the C-terminus of the Tape Measure Protein, pb2 (Figure 2). Although the structures of pb9 and pb6 were already available, 37,38 the structures of p140, p132, pb3, and pb2 were unknown. 35 The structure of the whole tail tip before interaction with the receptor was submitted to CASP14. The pb6-p140-p132-pb9 complex was also submitted to the competition, as were the individual rings and individual proteins.
Although p140 has no sequence homology with pb6, it shares the same fold, 36 and both form trimeric rings. This was well predicted, with a best GDT-TS of 83 for the monomer and a QS-score of 0.442 for the trimeric ring. The inner-ring diameter was correctly reproduced only in the best model; all other models predicted it to be smaller (Figure 2B). The structure of the p132 monomer, which belongs to the immunoglobulin superfamily, was very well predicted (best GDT-TS = 95). The dodecameric ring was also well predicted by five groups (QS-scores from 0.442 to 0.228). The predicted models contained subunit interfaces altered to varying degrees, resulting in slightly smaller rings and/or modified subunit orientations within the ring (Figure 2C). For both p140 and p132, AlphaFold2 was far ahead of the other groups (by 18 and 24 GDT-TS points, respectively). The pb6 and pb9 rings were also well predicted, with … (Figure 2D). At least the top six groups predicted the correct inner diameter of the tube, even though the orientation of the protein within the ring was not always optimal, owing to modified subunit interactions.
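GDT-TS, the score used above to rank the monomer predictions, averages the percentage of target residues whose Cα atoms lie within 1, 2, 4, and 8 Å of their experimental positions after superposition. The sketch below illustrates that arithmetic under one stated assumption: the per-residue Cα deviations come from a single, already computed superposition. The official CASP evaluation searches many superpositions and keeps the best residue count for each cutoff, so this simplified value will generally be at or below the official score. The function name `gdt_ts` is hypothetical.

```python
from typing import Sequence

# Distance cutoffs (Å) averaged by GDT-TS.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(ca_deviations: Sequence[float], n_target_residues: int) -> float:
    """Simplified GDT-TS (0-100) from Cα deviations under ONE fixed superposition.

    ca_deviations: Cα-Cα distances (Å) for model residues aligned to the target.
    n_target_residues: target length; residues absent from the model count as misses.
    """
    if n_target_residues <= 0:
        raise ValueError("the target must contain at least one residue")
    # Number of residues under each cutoff, summed over the four cutoffs.
    hits = sum(sum(1 for d in ca_deviations if d <= cutoff) for cutoff in GDT_TS_CUTOFFS)
    # Mean fraction across the cutoffs, expressed as a percentage.
    return 100.0 * hits / (len(GDT_TS_CUTOFFS) * n_target_residues)


# Five hypothetical deviations: 1, 2, 3, and 4 residues fall under the four cutoffs.
print(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0], n_target_residues=5))  # 50.0
```

Counting hits as integers and dividing once keeps the result free of accumulated floating-point error.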
An important protein of this assembly is pb3, which closes the tube. This protein is predicted to share structural similarity with the baseplate hub proteins of Myoviridae and related bacterial contractile injection systems. 35 It is, however, larger, with two additional C-terminal fibronectin domains predicted from the sequence. 35 Indeed, the protein is composed of the four canonical "hub domains" (HDs) of phage T4 gp27, 39 with a large insertion in the second one that allows closure of the tube, followed by two C-terminal fibronectin domains (Figure 2E1). Only three groups predicted the structure … (Figure 2E1). Interestingly, these predicted structures do not represent pb3 in its closed conformation, in which part of the HDII insertion folds back along the inner wall of the tube to form a plug that closes the tube (orange in Figure 2E2). In the predicted structures, this plug sequence (45 residues) instead extends downward as a long β-hairpin (cyan in Figure 2E2). This closely resembles the structure of pb3 after the tail has interacted with its receptor, which induces opening of the tube (Figure 2E3); this open conformation thus seems to be the more stable one (unpublished results). When the pb3 trimer is considered, only one group predicted it satisfactorily, again in the open conformation (Figure 2E4; the QS-score against the closed pb3 trimer was 0.252). Other groups, even with similar QS-scores, did not predict the correct monomer structure. The trimeric pb2 C-terminal helical bundle was very well predicted by six groups, with QS-scores ranging from 0.678 to 0.607 (Figure 2F).
With regard to the pb6-p140-p132-pb9 complex, four groups reasonably predicted the overall tube assembly (QS-scores of 0.266-0.196), with the correct inner-tube diameter and inter-ring distances.
Inter-ring interactions were, however, not optimal, as none of the groups predicted the correct register of the different rings (Figure 3).
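Several of the comparisons above hinge on the inner diameter of a ring or tube. One simple way to estimate it, assuming the assembly has already been oriented so that its symmetry axis runs parallel to z through a known point, is to take twice the smallest radial distance of any atom from that axis. The helper `inner_diameter` below is a hypothetical sketch of that idea, not the procedure used by the authors (which the text does not describe); any atom lying inside the lumen, such as the pb3 plug in its closed state, would reduce the estimate.

```python
import math
from typing import Iterable, Tuple

Coord = Tuple[float, float, float]


def inner_diameter(atoms: Iterable[Coord], axis_xy: Tuple[float, float] = (0.0, 0.0)) -> float:
    """Estimate a ring's inner diameter (Å) as 2 x the smallest atom-to-axis distance.

    Assumes the ring's symmetry axis is parallel to z and passes through axis_xy.
    """
    cx, cy = axis_xy
    # Radial distance of every atom from the axis; the z coordinate is ignored.
    radii = [math.hypot(x - cx, y - cy) for x, y, _z in atoms]
    if not radii:
        raise ValueError("no atoms given")
    return 2.0 * min(radii)


# Hypothetical hexameric ring: six atoms 20 Å from the axis plus one outer atom.
ring = [(20 * math.cos(k * math.pi / 3), 20 * math.sin(k * math.pi / 3), 0.0) for k in range(6)]
ring.append((30.0, 0.0, 5.0))
print(round(inner_diameter(ring), 3))  # 40.0
```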
In conclusion, each target (monomer, ring, or full complex) was reasonably well predicted by at least one CASP14 competitor, and often by several. The best predictions for the p132, p140, and pb3 monomers were highly accurate, as was the best prediction for the pb2 trimer. For the ring assemblies, although some predictions were reasonably close to the targets, it was surprising to observe noticeable variations in ring diameter and orientation, and structure predictions of the monomers within the rings were not as …

Figure 3. Four best CASP predictions of the phage T5 tail tip complex (H1060v0, pb6-pb6-p140-p132-pb9) aligned on the experimental structure, in which the different proteins are colored as in Figure 2.

We recently determined the high-resolution crystal structure of a novel CDI toxin-immunity protein complex from the nosocomial pathogen S. marcescens BWH57 (Figure 4). The CdiA-CT BWH57 region is …

The CdiA-CT BWH57 nuclease domain includes three α-helices and one 3₁₀ helix, four antiparallel β-strands arranged in a small concave β-sheet, and two β-strands that form a hairpin. The β-sheet and β-hairpin wrap around α4, which serves as the core of this fold. Helix α3 has a significant kink, and helix α1 interacts with the β-hairpin.
CdiI BWH57 has a simple α/β fold with two α-helices, three 3₁₀ helices, and four mixed β-strands arranged in a small β-sheet. The toxin's interaction surface is largely electropositive and is complemented by a negatively charged patch on the immunity protein (Figure 4B).
CdiI BWH57 binds to the nuclease domain using the large loop linking β1 to β2 and its three 3₁₀ helices (Figure 4). These secondary structure elements interact with the exposed β-sheet residues, two loop regions, helix α3, and the C-terminus of the toxin domain. Several CdiI BWH57 residues that interact with the toxin, including K5, D9, Y10, W16, D25, and the C-terminal Y98, are highly conserved across the protein family. Similarly, the toxin residues H47, E51, H52, R89, N117, and R119 that interact with the immunity protein are also highly conserved. Of these, H47, E51, H52, and R89 are good candidates for forming the nuclease active site, suggesting that CdiI BWH57 binding to the toxin blocks access to its RNA substrates.
For the CASP14 competition, CdiA-CT BWH57 and CdiI BWH57 were first modeled as individual monomers, and the top 10 models, as ranked by GDT-TS score, were evaluated. Figure …

Previously, we demonstrated that BIL2 operates as a "single-ubiquitin-dispensing platform," allowing the conjugation of ubl4 to different substrates such as ubl5 and Ras GTPase. 66 Because the splicing reaction is ATP-independent, the presence of the intein allows the host to avoid employing the energy-consuming enzyme cascades usually dedicated to ubiquitin conjugation.
To elucidate the molecular mechanism of BUBL protein splicing, we solved high-resolution crystal structures of BIL2 in both apo and zinc-bound forms. Analysis of the structures revealed that zinc induces a conformational change of H69, which has been suggested to function as a key catalytic residue, 67 …

BonA is an outer-membrane lipoprotein from the opportunistic pathogen A. baumannii that is important for maintaining the structure and function of the outer membrane. 73 In A. baumannii, loss of BonA causes loss of cell motility and a change in the structure of the outer membrane. 73 BonA homologs in other bacterial species (designated YraP or DolP) form part of the cell envelope stress regulon (e.g., the SigmaE regulon in E. coli). 74 These BonA homologs are important for the integrity of the outer membrane and the virulence of bacterial pathogens (e.g., Neisseria gonorrhoeae and Salmonella enterica). [75][76][77] BonA and its homologs localize to the divisome, the large protein complex that mediates cell division in bacteria. 75,78 As part of the divisome, DolP, the BonA homolog from E. coli, regulates the activity of cell wall remodeling enzymes during cell division. 79 The mechanism by which BonA and its homologs exert their function remains unknown.
BonA is 235 amino acids in length and is composed of two Bacterial OsmY and Nodulation (BON) domains … 73 However, in the crystal structure, BonA-27N formed a dimer (Figure 7A) with an extensive buried surface area of 3236 Å² according to PISA. 81 In the BonA-27N structure, the C-terminal BON domain (BON2) adopts the canonical α/β-sandwich fold, consisting of two α-helices and three β-sheets. However, in the N-terminal BON domain (BON1), α-helix 1 is displaced from the α/β-sandwich by α-helix 1 of BON2 from the opposing molecule of the dimer, forming a hydrophobic interaction that facilitates dimer formation (Figure 7B). I hypothesized that this dimer was a constituent of the BonA decamer and performed additional structural analysis of full-length BonA using small-angle X-ray scattering and negative-stain electron microscopy, which revealed that the decamer was a pentamer of BonA dimers. 73 The sequence corresponding to BonA-27N was submitted as a …

A major difference between the model and the experimental data was the orientation of α-helix 1 of BON1, which, rather than being displaced from BON1 as in the experimental structure, adopted a canonical BON domain conformation (Figure 7C). This position of α-helix 1 of BON1 in the model precludes formation of the dimer observed in the crystal structure and is analogous to BON1 of DolP, which exists as a monomer when purified. 75 Experimental evidence indicates that BonA is stable as a monomer both when purified and in the bacterial cell. 73 To exist in this state, the hydrophobic surface protected by α-helix 1 of BON2 in the dimer would need to be shielded from the solvent (Figure 7D). In the CASP14 models, α-helix 1 of BON1 adopts a conformation analogous to that of α-helix 1 of BON2 (Figure 7E), …

This is in sharp contrast to the JBP1 J-DBD, which binds J-DNA with low-nM affinity in vitro and discriminates remarkably against normal DNA, which it binds with μM affinity.
The low sequence identity between the JBP1 and JBP3 J-DBD domains (16.5%) was sufficient to establish their homology but not to explain, from sequence conservation alone, their difference in J-DNA specificity. Importantly, Asp525, the JBP1 residue that we previously showed to be crucial for discriminating J-DNA from normal DNA, is conserved, as are Lys522A and Arg532A (but not Lys518 or Lys524), which are all important for general DNA binding.
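Sequence identity values such as the 16.5% quoted above depend on how they are normalized, and the text does not state which convention was used. The sketch below computes identity over the gap-free columns of a pairwise alignment, one common convention; other tools divide by the full alignment length or by the length of the shorter sequence, which can give a different number for the same alignment. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("the alignment has no gap-free columns")
    identical = sum(1 for a, b in columns if a.upper() == b.upper())
    return 100.0 * identical / len(columns)


# Toy alignment: 4 identical residues out of 5 gap-free columns.
print(percent_identity("ACDE-G", "ACFEHG"))  # 80.0
```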
We therefore decided to determine the structure of the JBP3 J-DBD domain to understand the structural determinants of its limited affinity and specificity toward J-DNA. Surprisingly, we were unable to determine the structure of the JBP3 J-DBD by molecular replacement. We instead determined it using a massive combination of small fragments and density modification, as implemented in ARCIMBOLDO_LITE. 91 The main difference between the JBP1 and JBP3 J-DBD structures is the placement of the N-terminal region and the C-terminal helix (α5) of the helical bouquet fold that we described previously.

… The determined structure, covering 133 of the 134 residues of the mature protein, forms a distorted β-roll-like architecture containing two α-helices and nine β-strands (Figure 10A). The β-roll part of the structure is formed by two β-sheets: one comprising β-strands 1-4, which is connected to a second sheet comprising β-strands 5-9 via a disulphide bond between residues C31 and C132 that appears to adopt two alternative conformations. Additionally, a disulphide bond C90-C118 links the 19-residue loop between β-strands 6 and 7 to β-strand 8, suggesting that correct positioning of this loop is relevant for Bd0675 function. All cysteine residues are conserved in predatory homologs.
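Disulphide bonds such as C31-C132 and C90-C118 can be checked geometrically in a structure or a model: an S-S bond is about 2.05 Å long, so cysteine pairs whose SG atoms lie much farther apart are not bonded. The sketch below flags SG pairs within an assumed, deliberately permissive 2.5 Å cutoff. The function name, the cutoff, and the coordinates are hypothetical and are not taken from the deposited structure.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

# An S-S bond is about 2.05 Å long; 2.5 Å is an assumed, permissive cutoff.
SS_CUTOFF = 2.5


def find_disulfides(sg_coords: Dict[int, Coord], cutoff: float = SS_CUTOFF) -> List[Tuple[int, int, float]]:
    """Return (residue_i, residue_j, distance) for cysteine SG pairs within cutoff (Å)."""
    bonds = []
    # Compare every pair of cysteines once, in residue-number order.
    for (i, sg_i), (j, sg_j) in combinations(sorted(sg_coords.items()), 2):
        d = math.dist(sg_i, sg_j)
        if d <= cutoff:
            bonds.append((i, j, d))
    return bonds


# Hypothetical SG coordinates keyed by residue number.
sg = {31: (0.0, 0.0, 0.0), 132: (2.04, 0.0, 0.0), 90: (10.0, 0.0, 0.0), 118: (10.0, 2.03, 0.0)}
print([(i, j) for i, j, _ in find_disulfides(sg)])  # [(31, 132), (90, 118)]
```

A distance check alone cannot tell whether a bond is formed in a model; it only shows whether the geometry is compatible with one.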
An electrostatic surface potential map shows that Bd0675 has a hand-like shape with a potential binding cleft, which is mainly negatively charged … (Figure 11A). Even the extended N- and C-terminal regions with irregular secondary structure were predicted accurately, with more than 96% of residues correctly aligned with the experimental structure.
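Agreement between a predicted and an experimental structure, as in the alignment described above, is often summarized by a root-mean-square deviation (RMSD) over matched atoms. The minimal sketch below assumes the two atom lists are already matched one-to-one and superposed; it performs no fitting, whereas a real comparison would first superpose the structures (for example, with the Kabsch algorithm). CASP metrics such as RMS_all are computed by the assessors' own software, so values from this sketch are not directly comparable. The function name and coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], target: Sequence[Coord]) -> float:
    """RMSD (Å) between matched, already superposed atom lists; no fitting is done."""
    if len(model) != len(target) or not model:
        raise ValueError("need two non-empty atom lists of equal length")
    # Mean of the squared atom-atom distances, then the square root.
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, target))
    return math.sqrt(squared / len(model))


# Two hypothetical atoms, each displaced by exactly 1 Å.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0)]))  # 1.0
```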
Side-chain rotamers were also predicted very accurately, with an RMS_all of 1.7 Å calculated on all atoms. Although the disulfide-bonded cysteines are juxtaposed in the predicted structure, the disulfide linkages themselves were not predicted. Other top-ranked models from the FEIG-R1 (GR# 314), FEIG-R2 (GR# 480), FEIG-S (GR# 013s), and Seder2020hard (GR# 428) groups also predicted the protein fold correctly, with GDT scores above 80 (Figure 11A). Tsp1 forms a dimer, and the dimeric interface was also …

… hydrophobic residues, especially leucine. 106 Based on the initial speculation that the hydrophobic residues of the individual helices would interdigitate like the teeth of a zipper, short coiled coils are also often termed leucine zippers, 107 although this eponymous hypothesis was refuted when the first crystal structures showed that the hydrophobic residues do not interdigitate at the interface but are instead arranged like the rungs of a ladder. In recent years, however, we have come across a family of coiled-coil proteins that essentially resembles the initially hypothesized zipper architecture, albeit with a decisive difference. This family is especially rich in histidines, which occur in a repetitive arrangement, and it is these histidines that interdigitate like the teeth of a zipper between the two antiparallel helices of a monomeric α-helical hairpin. 108 As seasoned coiled-coil researchers, we set out to further characterize and delineate this unexpected new coiled-coil variant.
In sequence searches, we identified a wide range of such histidine zippers. All of them appeared to form hairpins of different types, which we confirmed by determining several crystal structures. Interestingly, many of them turned out to be homo-oligomers, in which a histidine-zipper interface can be found within the monomers (intra-chain), between the monomers (inter-chain), or both. We expected these to be potentially challenging targets for structure prediction … we can only speculate about the functional role of these proteins and hypothesize that they might function as scavengers of metal ions.
To our surprise, most groups and servers did a very good job of predicting this new variant of the coiled-coil fold. It is likely that several predictors benefited from the structure of the first representative of this fold, which we had published previously, from the fungus Serendipita indica (PDB: 5LOS). 105 This protein has 23% sequence identity to Tuna, 15% to Nitro, and 19% to Meio. However, it was not identified as a template by the CASP Prediction Center for any of the three targets, and sequence searches with …

The most important feature of all three targets, the correct orientation of the histidines to form the zipping interactions, was generally predicted very well in the top predictions, even in those from the best servers. According to the CASP14 evaluation formula, which we describe in a separate article in this special issue, 111 …

In both lineages, HBc consists of a predominantly α-helical, N-terminal assembly domain (Figure 13) that forms the capsid, and an unstructured, arginine-rich C-terminal domain (CTD) that projects into the capsid interior and fine-tunes the charge balance with the genome.
Only the ordered assembly domain of HBc has been amenable to structure determination, and huHBc has been studied for decades. [115][116][117] The assembly domain of huHBc forms hammer-shaped dimers that assemble into capsids with protruding spikes, 118 and these spikes contact the envelope in viruses and virus-like particles. 119,120 Each monomer contributes two long helices (α3 and α4), connected by a short loop, to the intra-dimer interface of the spikes (Figure 13A). The inter-dimer contacts are mediated by a hand-like region that follows the helical hairpin in the spike and precedes the CTD. The sequences at the inter-dimer contacts are conserved among Hepadnaviridae, which is not the case for the intra-dimer contacts or the protruding part of the spikes.
In contrast to huHBc, DHBc is much larger, with an extension domain of approximately 40 residues that maps to the loop region of the spikes. To understand the structural importance of this extension domain, we determined the structure of DHBc in capsids by electron cryomicroscopy. 114 As in huHBc, the core of the spike is formed by a four-helix bundle with two helices from each monomer (Figure 13). These helices are longer than in huHBc, with a dif… (Figure 13).
In conclusion, many predictions recapitulated key features of the DHBc fold but failed to predict the changes in the oligomerization interfaces that deviate from huHBc. The X-ray crystal structure of A. pompejana ASCC1 was determined to 1.4 Å resolution with one molecule in the asymmetric unit (Figure 14A …

While previous SAXS studies that directly measure flexibility 148 suggested that, in general, X-ray structures were too rigid, 149 computational predictions were uncovering the greater flexibility of the solution structures. 150,151 Indeed, several repair proteins have been shown to be functionally flexible, 129,152 and our X-ray structure revealed a simple loop connecting the two domains, consistent with substantial interdomain flexibility.

Yet the clear consensus of the highest-ranked prediction models on the relative orientation of the two domains suggests to us that the ASCC1 domains are not flexible relative to each other but that their arrangement is rigidly encoded in the sequence. Perhaps ASCC1 activity is strictly controlled, and this rigidity plays a role in the regulatory mechanism.
The prediction models and their implications will therefore be tested by SAXS and mutational analyses, which ultimately need to be integrated with testing and structural imaging in cells, which provide the most relevant environment. 153,154 Furthermore, emerging cancer biology data show that it is important to understand the structure of the nucleic acid as well as that of the damage response proteins. 155 Thus, the potential structural rigidity of ASCC1 suggests that its activity may favor specific RNA structures or serve to sculpt RNA for …

CONCLUSIONS
This article describes the structural and functional aspects of the selected CASP14 targets. The authors of the structures highlighted the most interesting target features that were reproduced in the models, and also discussed the drawbacks of the predictions. The overall ability to predict three-dimensional structures of proteins has improved remarkably, and many difficult targets were modeled with impressive accuracy.
When modeling monomeric targets, AlphaFold2 systematically outperformed the other methods, with other groups following as runners-up for some targets, and the authors suggested that the top models could be used to confidently infer functional sites of the proteins. For example, for target T1057, the top two predictions would allow correct assignment of the catalytic residues of the active site and their environment.
There is, however, room for improvement in modeling loops. It also remains challenging to accurately model multimeric protein complexes. In some cases, the limiting factor could be the lack of accurate structures for the individual components (e.g., targets H1036 and H1065). In other cases, predictions of the individual components were highly accurate, yet the methods failed to reproduce the relative orientations observed in the oligomeric states.
Examples include the incorrect oligomerization interface of the DHBc spike (T1099) and the large deviations of the ring assembly for the phage T5 tail tip complex, where no model was able to reproduce the inter-ring distances and diameter (H1060 and T1061). We also observed that the model conformations for several targets, for example, T1054, T1068, and T1101, differed from the experimentally determined structures. As the authors pointed out, these conformations may represent alternative biologically relevant states and could help in understanding the structural dynamics of the targets.
The outcomes of this critical assessment have paved the way for greater synergy between computational and experimental approaches to protein structure determination. As described in another article of this issue, several of the CASP14 targets were solved with the aid of the models, or the models helped improve structure accuracy. 7 This synergy could be particularly helpful for capturing conformations that may elude experimental structure determination, particularly in membrane proteins, 156 or as a strategy for molecular replacement phasing, which has already been shown to be beneficial. 157 In conclusion, we have shown that for the targets described here, the most critical structural features were accurately reproduced by the models. The experimentalists now foresee the models guiding further studies of biologically relevant properties of proteins, including the spatial orientations of structural elements and their dynamics. As the performance of computational methods has increased, so has confidence in the scientific value of the results they produce.