Target highlights in CASP13: Experimental target structures through the eyes of their authors

Abstract The functional and biological significance of selected CASP13 targets is described by the authors of the structures. The structural biologists discuss the most interesting structural features of the target proteins and assess whether these features were correctly reproduced in the predictions submitted to the CASP13 experiment.

The results of the comprehensive numerical evaluation of CASP13 models are available at the Prediction Center website (http://www.predictioncenter.org). The detailed assessment of the models by the assessors is provided elsewhere in this issue.

[...] involving an E1 Ub-activating enzyme, E2 Ub-conjugating enzyme, and an E3 Ub-protein ligase. The diverse distribution of E3s, of which >600 are known to exist in the human genome, affords the ubiquitome pervasive substrate specificity. As such, Ub-modified targets are ultimately destined for a myriad of molecular outcomes depending on the Ub chain length and linkage type. 1 Many proteins of the E3 superfamily incorporate the highly abundant WD40-repeat (WDR) substrate-recruiting domain as a functional protein-nucleic acid and protein-protein interaction (PPI) module. As a vital component of many multiprotein complexes, the WDR domain is unsurprisingly central to a range of cellular processes, including checkpoint signaling, protein trafficking and degradation, DNA replication, and DNA damage repair (DDR). RFWD3 (RING finger and WD repeat domain-containing protein 3) is a WDR-containing E3 originally identified as an ATM/ATR substrate involved in DDR. 2,3 Evidence has shown that the WDR domain is primarily responsible for the functional interactions that allow RFWD3 to maintain genomic stability. 4,5 Furthermore, heritable mutations, particularly an Ile639Lys point mutation within the WDR domain, may lead to the rare genomic instability disease known as Fanconi anemia (FA), 6 thereby implicating RFWD3 as a potential FA-associated gene (alias: FANCW). 4-6 Until recently, the biochemical characterization of RFWD3 has lacked complementary high-resolution structural information. Here, we discuss the 1.8 Å resolution X-ray crystal structure of the C-terminal WDR domain of human RFWD3 (Table 1, Target: T0954; PDB: 6CVZ). WDR domains exhibit a β-propeller architecture typically composed of seven WD repeats (propeller blades).
Each repeat is a four-stranded antiparallel β-sheet of 40 to 60 residues in length. Other features, though not conserved, may exist within the repeats, such as a DH(S/T)W hydrogen-bonded tetrad, or GH and WD dipeptides. Due to low sequence homology, predicting the presence of WDR domains with sequence analysis alone is difficult and results in an underrepresentation within the proteome. 7 Some of the common protein sequence analysis databases predict RFWD3 to contain three distinct WDRs, while the more specialized WDSP database 7 suggests the presence of six 8; however, our structure reveals seven. Stabilizing the fold are hydrophobic interactions between adjacent repeats, along with a velcro closure between the first and last repeats. Additionally, there is an exposed disulfide bond on the top surface linking repeats five and six (Cys638 to Cys696, Figure 1A). It is yet to be determined whether this is important for conformational stability or a potential site of redox-regulated activity. Multiple sequence alignment suggests these cysteine residues are not well conserved. Additionally, only a single WD dipeptide is present (WD 730), located at the C-terminus of the third strand in repeat six. It should also be noted that a coordinated magnesium ion in the structural model is an artifact of the crystallization condition with no biological relevance (Figure 1A). [...]

Note: Columns indicate target ID, PDB ID, experimental method, resolution, stoichiometry, size, and CASP13 assessment results. For each target, the accuracy of the best model 1 is provided both at the level of individual protomers (best GDT-TS and corresponding lDDT score) and full assembly (best QS-score).

How well was CASP13 able to predict and model the structure of this important target? Overall, the modeling efforts were successful, with 56 predictions (out of 86 total) providing GDT-TS scores >50. Group A7D was the most accurate structural predictor, with a GDT-TS score of 72.0 and an RMSD of 1.87 Å across 2066 atom pairs. Superposition with the crystal structure reveals that the overall topology (Figure 1A), including the large disordered loops (Figure 1B), was well reproduced [...]

[...] domain with an LGA score of 58.7 (Figure 2B). 19 The top scoring model (T0970TS112_1-D1, GDT-TS = 67.94) has a similar LGA score of 56.1 (Figure 2C) and faithfully recapitulates some important aspects of the true overall fold, although it misses some key features such as an extended C-terminal β-hairpin (star) and a short β-strand pair (triangle) that is conserved between GFRP and KCTD8. However, the extended C-terminal β-hairpin was correctly predicted in the sixth best overall scoring model (T0970TS043_1-D1, GDT-TS = 63.23), and the short β-strand pair was correctly predicted in the second best scoring model (T0970TS149_1-D1, GDT-TS = 66.18).
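The GDT-TS values quoted here can be made concrete with a minimal sketch. It assumes the model and reference Cα coordinates are already residue-matched and superposed; the official LGA-based score instead searches over many superpositions, so this simplification can only underestimate it. The function name and the toy coordinates are illustrative assumptions, not CASP data.

```python
import math

def gdt_ts(model_ca, ref_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT-TS (0-100) for pre-superposed, residue-matched C-alpha atoms.

    Assumption: a single fixed superposition is used; the real GDT-TS
    maximizes the fraction under each cutoff over many superpositions.
    """
    if len(model_ca) != len(ref_ca) or not ref_ca:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Per-residue C-alpha deviation in angstroms.
    dists = [math.dist(m, r) for m, r in zip(model_ca, ref_ca)]
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy example (hypothetical coordinates): deviations of 0.5, 1.5, 3.0 and 10.0 A.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
score = gdt_ts(model, reference)  # fractions 1/4, 2/4, 3/4, 3/4 -> 56.25
```

Because the score averages four cutoffs, a model can exceed 50 while still misplacing long loops, which is consistent with the per-feature comparisons the authors make.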
Overall, several interesting points arise from analysis of the predicted structures. First, the fact that the GFRP template itself has a slightly higher LGA score than the best solutions raises the question of whether the excellent fit of this template could have been predicted and hence incorporated to generate improved constraints on solutions.
Second, although standard high-throughput multi-construct design techniques were used to generate the construct that produced the crystal structure of KCTD8, the essentially correct prediction of the H1 domain in this case suggests that current structure prediction techniques could potentially assist in the process of designing expression constructs.

Figure 2 caption (fragment): [...] Figure 2A) and the crystal structure of human KCTD8 H1 domain (PDB: 6G57, cyan). C, Structure superposition of the crystal structure of human KCTD8 H1 domain (depicted as in Figure 2B) and the top scoring model from CASP13 (T0970TS112_1-D1, magenta). Two key differences between model and reference are highlighted: a star highlights the region of the C-terminal β-hairpin structure and a triangle highlights the short β-strand pair.

[...] essential for reproduction, genetic diversity, and evolution, with errors in meiosis leading to human infertility, miscarriage, and germ cell cancers. 20 At the center of this process is the establishment of homologous chromosome pairs through their physical tethering by the synaptonemal complex, and the resultant formation of genetic crossovers. 21,22 Achieving this requires an ornate choreography in which meiotic chromosomes are rapidly moved around the nucleus to enable the identification and establishment of homologous pairs through meiotic recombination. 23 These meiotic prophase movements are driven by microtubule forces that are transmitted across the nuclear envelope via the LINC complex, and directed to the telomeric ends of meiotic chromosomes. 24 In mammals, the meiotic telomere complex (formed by MAJIN, TERB1, and TERB2) physically tethers meiotic chromosome telomere ends both to the inner nuclear membrane and the LINC complex, 25 thereby permitting microtubule-driven chromosome end movements within the plane of the nuclear envelope. The molecular architecture of the meiotic telomere complex is defined by a core MAJIN-TERB2 complex that connects its key functionalities.

MAJIN mediates inner nuclear membrane attachment through a transmembrane helix, while TERB2 binds to TERB1, which interacts with shelterin component TRF1 to recruit telomeric DNA, and is also thought to bind to the LINC complex. [25][26][27] Previous genetic studies in mice demonstrated that individual disruption of MAJIN, TERB1, or TERB2 leads to impaired telomere attachment, failure of chromosome movements and infertility. [25][26][27][28] We thus initiated structural studies to understand the molecular basis of this essential process of mammalian meiosis.
The crystal structure of the MAJIN-TERB2 core complex (Table 1, Target: H0980; PDB ID: 6GNX) revealed a 2:2 heterotetramer in which two TERB2 chains wrap around a globular MAJIN dimer (Figure 3A). 29 Each MAJIN protomer adopts a β-grasp fold, in which a β-sheet grasps around a core α-helix (Figure 3A). The structural architecture of the β-grasp fold consists of a five-stranded β(2)α-β(3) assemblage with a two-stranded β-sheet insertion, which is seemingly unique to MAJIN-TERB2. The MAJIN dimerization interface is stabilized through aromatic and proline interactions (Figure [...]

[...] demonstrates that the topology of the fold was predicted with an impressive level of accuracy (Figure 3C). The core α-helix was predicted with a local Cα RMSD of ~1 Å and interacts with the grasping β-sheet through largely native contacts; the β-strands are similarly predicted correctly, although the angulation between α-helix and β-sheet deviates slightly from the native structure (Figure 3C). The main divergent regions of the model are the MAJIN N-termini and the two-stranded β-sheet insertion. The MAJIN N-termini lack secondary structure and form surface hydrophobic contacts with the remainder of the structure (Figure 3C).
While the β-sheet insertion correctly links between strands of the grasping β-sheet, its conformation and orientation differ from the crystal structure, although it is possible that the conformation of this region is stabilized by the crystal lattice (Figure 3C). Importantly, the model shows similar electrostatic properties along the MAJIN DNA binding surface (Figure 3D).
In the category of oligomeric modeling, there were 73 predictions, of which none correctly modeled the MAJIN-TERB2 2:2 complex. A number of models accurately predicted the core of the MAJIN β-grasp fold but failed to predict the MAJIN dimer interface, which involves amino acids Pro64, Phe73, and Tyr75 (Figure 3E, left). In some cases, the overall MAJIN dimers show superficial similarity with the crystal structure, but with incorrect β-grasp topologies that place residues Pro64, Phe73, and Tyr75 far from the interface (eg, TS068_5; Figure 3E, right). In other cases, the interface shows no resemblance to the crystal structure (eg, TS135_1; Figure 3E, middle). Modeling of TERB2 was consistently aberrant: it was typically predicted to adopt a small globular fold that binds to MAJIN, in stark contrast to its extended conformation in the crystal structure, in which it wraps around a MAJIN protomer through a series of surface hydrophobic and β-sheet interactions.
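One way to make the interface comparison above concrete is a distance-based contact check. The sketch below flags residues of one chain that have any atom within a cutoff of the partner chain. The 5 Å cutoff, the function name, and the toy coordinates are assumptions chosen for illustration; they are not the criterion used by the CASP assessors or by the authors.

```python
import math

def interface_residues(chain_a, chain_b, cutoff=5.0):
    """Return residue labels of chain_a with any atom within `cutoff` angstroms of chain_b.

    chain_a, chain_b: dict mapping a residue label to a list of (x, y, z) atom
    coordinates. The 5 A default is an assumed, commonly used contact threshold.
    """
    partner_atoms = [atom for atoms in chain_b.values() for atom in atoms]
    contacts = set()
    for label, atoms in chain_a.items():
        if any(math.dist(a, b) <= cutoff for a in atoms for b in partner_atoms):
            contacts.add(label)
    return contacts

# Hypothetical coordinates: two residues near the partner chain, one far away.
toy_a = {"Pro64": [(0.0, 0.0, 0.0)], "Phe73": [(1.0, 0.0, 0.0)], "Gly10": [(30.0, 0.0, 0.0)]}
toy_b = {"Tyr75": [(4.0, 0.0, 0.0)]}
found = interface_residues(toy_a, toy_b)
```

Running the same check on a model and on the crystal structure, then comparing the two residue sets, would show directly whether Pro64, Phe73, and Tyr75 are placed at the modeled interface.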
Components of a constitutive complex, such as MAJIN-TERB2, likely undergo a coordinated folding process in vivo that results in their codependence for stability. This likely highlights an important challenge in modeling: in such cases, it is inappropriate to predict oligomers through modeling of prefolded protomers, and cofolding of multiple chains in silico is required instead.
2.4 | Crystal structure of LP1413, an unusual single-stranded DNA binding protein (CASP: T0958, PDB: 6BTC)
Provided by Ignacio Mir-Sanchis and Phoebe A. Rice
We named this protein LP1413 as it is a little protein (96 amino acids) annotated as containing DUF (domain of unknown function) 1413. 30 We were interested in its structure and function as part of our ongoing project to understand the SCC family of mobile genomic islands, many of which carry methicillin resistance. Insertion of these elements into the Staphylococcus aureus chromosome creates MRSA (methicillin-resistant S. aureus) strains. We have defined the set of core conserved genes carried by these highly mosaic mobile elements, and are working to determine their functions. 31 LP1413 is encoded in the same operon as a helicase, Cch, that has sequence homology to replication initiator proteins from a different family of mobile elements, the SaPIs, and that has structural homology to MCM helicases. 31 We detected no enzymatic activities in purified LP1413 but found that it binds single-stranded DNA with high affinity. The protein was monomeric in solution 30 (Table 1, Target: T0958).
We thought that LP1413 might be an interesting CASP target because our crystal structure shows it to be a winged helix-turn-helix domain (Figure 4A), but it was not annotated as such in sequence databases. Also, the structure has two unusual features: a β-bulge in strand 2, and an unusually long turn between helix 3 and strand 2 that hosts conserved prolines and helps create a small hydrophobic pocket (Figure 4B). In the crystal, M1 of an adjacent monomer is inserted into this pocket. However, we found that an M1G change caused almost no change in affinity or cooperativity in binding single-stranded DNA, so the natural ligand for this pocket remains unknown.
Among the highest ranked models according to GDT-TS, all predicted the correct overall fold except one (T0958TS124_1-D1, ranked third with GDT-TS = 74.03), in which the order of β-strands 2 and 3 was reversed. Overall, the models diverged most in the placement of the shortest helix, helix 2, and the turn between helices 2 and 3. Except for the poorly ordered N- and C-termini, which were not included in the prediction contest, the backbone atoms of that turn had the highest backbone B-factors in the model, and it was the one region where the two copies in the asymmetric unit diverged slightly.
This suggests, not surprisingly, that flexibility correlates qualitatively with difficulty in prediction.
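The qualitative link between flexibility and prediction difficulty could be checked with a rank correlation between per-residue B-factors and per-residue model deviation. The following is a minimal pure-Python sketch of Spearman's coefficient; the function names and the five data points are invented for illustration and are not taken from the T0958 structure or models.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-residue values: crystallographic B-factors (A^2) and
# C-alpha deviations (A) of a model from the crystal structure.
b_factors = [20.0, 25.0, 60.0, 30.0, 22.0]
deviations = [0.5, 0.9, 4.0, 1.2, 0.6]
rho = spearman(b_factors, deviations)
```

A rank-based measure is used here because the claim in the text is only qualitative; it does not assume that deviation grows linearly with the B-factor.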
Only 6 of the top 40 models correctly predicted the β-bulge at Val68. In terms of overall GDT-TS scores, they were near both the top and the bottom: ranks 1, 2, 5, 28, 30, and 39. The top two scoring models also contained the best predictions for the conformation of the helix 3-strand 2 turn. These two models, both from the Laufer [...]

[...] 32 This phenomenon was first discovered in E. coli isolate EC93 and was termed "contact-dependent growth inhibition" or CDI. 33 CDI is mediated by the CdiB and CdiA two-partner secretion proteins, which form a complex on the cell surface. 33 CdiA is a filamentous protein that extends several hundred angstroms to interact with receptors on the surface of susceptible target bacteria. Upon binding its receptor, CdiA undergoes a series of conformational changes that ultimately deliver its C-terminal toxin domain (CdiA-CT) into the target cell. 34 To protect themselves from self-intoxication, CDI+ cells also produce a CdiI immunity protein that binds the CdiA-CT to neutralize toxin activity. Though characterized most extensively in E. coli, CDI systems are broadly distributed throughout Gram-negative bacteria, including pathogens, 35 and have been implicated in cooperative behaviors, such as biofilm formation, persistence, and virulence. 36-38 CdiA effectors carry extraordinarily diverse CdiA-CT regions, indicating that the systems deploy many distinct toxins. Similarly, CdiI sequences are also highly variable, and each immunity protein only provides protection against its cognate toxin.
Thus, CDI toxin-immunity protein polymorphism underlies an important mechanism of self/nonself discrimination in bacteria. [...]

[...] C10-CoA ligand bound into a long narrow tunnel that runs deep into the protein beneath the bound FAD molecule, similarly to that described for other ACAD structures. 46-53 Interestingly, only one molecule of C10-CoA per dimer could be confidently placed into the electron density maps, which corresponded to the conformation of Trp428 beneath the re-face of the FAD molecule. In our structures, Trp428 appears to gate accessibility for the acyl moiety (Figure 6C).
Analysis of Bd2924 did not reveal any conventional cdG binding sites in the structure, nor was a cdG complex obtainable. Further biophysical binding experiments were conducted, which suggested weak nonspecific binding between cdG and Bd2924 (data not shown).
The tetrameric assembly of divergent ACADs appears to block the proposed docking site of electron transferring flavoprotein (ETF). 54,55 Interestingly, we were unable to detect any dehydrogenation activity; likewise, no significant dehydrogenase activity has been reported for other divergent ACADs. 49,51-56 Moreover, the structure of Bd2924 reveals that the chemistry of the conserved active site residue Glu429 may be altered by participating in a hydrogen bond with Asn171. Typically, a hydrophobic residue such as phenylalanine is found at the position of Asn171 in "conventional" catalytically active ACADs.
Bd2924 was included in CASP13 as target T0961 and also selected for CAPRI experiments. In general, models predicted the overall fold of Bd2924 to a high standard and included most of the main features identified from our crystal structure (Figure 6B). This may not be entirely surprising, considering the highly conserved fold within the ACAD superfamily and the presence of highly similar homologs (AidB and ACDH-11) in the PDB that could act as templates. Notably, the models were able to correctly predict the structural features of divergent ACADs and the tetrameric assembly. The models also predicted the large groove in the C-terminal domain, which was a noteworthy feature of the Bd2924 structure. A loop region (residues 191-202) showed the least similarity to our experimental structure and the most variability among the top models. This is likely due to the loop being a relative insertion in the primary sequence of Bd2924 in comparison to templates AidB and ACDH-11. Models also deviated around the ligand binding and active site regions (Figure 6D). For instance, the modeled conformation of Trp428 obstructs the depth of the substrate-binding tunnel, which would lead to incorrect predictions about the length of fatty acyl chains that can be accommodated. Furthermore, the strictly conserved catalytic glutamate residue, Glu429, is modeled in various conformations. Our crystal structure highlighted a potentially important hydrogen bond between Glu429 and Asn171, which may explain the lack of observable catalytic activity. However, none of the top models reproduces the hydrogen-bonding interaction captured in the crystal structure. A major drawback of the models is the lack of precision regarding the interaction of Bd2924 with its cofactor FAD, which is an integral part of the structure of ACADs. 53

2.7 | The receptor-binding tip (gp37₃-gp38) from the Salmonella phage S16 long tail fiber (H0953, PDB ID: [...]

[...] for phages.
To this end, phage S16 is special as it can infect many Salmonella strains, suggesting that either S16 recognizes a wide assortment of cell surface substrates, or S16 targets a highly conserved receptor of Salmonella. S16 is a relative of the well-studied phage T4. Both phages are equipped with baseplate-attached long tail fibers (LTFs) that mediate receptor-binding through their distal tips. 64-68 We recently exploited the Salmonella-specific binding of the S16 LTF as a tool for rapid, ELISA-like detection of Salmonella contaminants in food. 69 The T4 and S16 LTFs are similar to each other except for the structure of their distal tip, which interacts with the host cell surface during host recognition. The distal tip of the T4 LTF is formed by the C-terminal domain of gene product 37 (gp37), whereas the S16 LTF carries an additional protein, gp38, that caps gp37. Gene 38 is present in the T4 genome, but the amino acid sequence of T4 gp38 is very different from that of S16 gp38 and its function is to assist folding and assembly of the T4 LTF. Ironically, the prototypical and better studied T4 phage, in which gp38 does not participate in host recognition, is a less common representative of T-even phages, most of which appear to carry S16-like LTFs. Unsurprisingly, the structure and function of gp38 and its homologs (commonly named "adhesins" 70) have been of great interest ever since they were discovered as the determinants of phage host range. 71 Thus, by solving the crystal structure of a distal part of gp37 connected to gp38 from phage S16, we aimed to advance our understanding of this important family of adhesins (Figure 7B). 68 Gp37 is a homotrimeric β-helix. [...]

Figure 6 caption (fragment): [...] Access to the binding tunnel appears to be gated by the conformation of W428 (orange), which adopts two conformations: one that allows access (C10-CoA present) and one that blocks access (C10-CoA absent). D, Comparison of the active site between the Bd2924 crystal structure and the three best models (same coloring scheme as in Figure 6B) [...]

[...] The highest ranked model for the whole gp37-gp38 multimer was TS086_1 by the BAKER group (Figure 7F) with a QS-score of 0.37.
Despite failing to predict gp38 and its attachment to gp37 correctly, this model very accurately determined the composition of the gp37 β-helix, including the N-terminal triangular and C-terminal interdigitated domains. As a single target, gp37 (T0953s1-D1) was similarly predicted well, with the top two models by groups A7D and BAKER (GDT-TS of 54.48 and 48.88, respectively) shown in Figure 7D. Visual inspection showed correct prediction of the triangular domain; however, only the BAKER group correctly determined the continuation of the gp37 chain to form the interdigitated domain. Interestingly, the next eight best predictions also correctly modeled the triangular domain but, similar to A7D, incorrectly assumed that the C-terminal part folds back on itself. Possibly, the trimeric nature of the protein was not taken into account for many of these predictions.

Figure 7 caption (fragment): The adhesin tip of the Salmonella phage S16 long tail fiber. A, Transmission electron micrograph of phage S16 with arrows (1) pointing to the approximate location of gp38 at the tip of the LTF and (2) pointing to the baseplate. B, Cartoon representation of the LTF distal tip complex of homotrimeric gp37 β-helix (cyan, magenta, pink) attached to a single gp38 adhesin (gray) with the structurally unique "polyglycine sandwich" domain rainbow colored (blue to red). Gp38 connects to gp37 through hydrophobic interactions, involving three highly conserved tryptophan residues on the apex of each α-helix of the gp38 attachment domain (yellow sticks) that occupy three symmetry-equivalent hydrophobic pockets on the gp37 base. [...]
To assist the predictions, SAXS and SANS envelopes as well as protein cross-linking data of the complex were provided to the competition (S/A/X0953). The molecular envelopes generated by SAXS and SANS reproduced the shape of the gp37-gp38 crystal structure very well; however, it is unclear whether and how these data were used by the CASP participants, as the composition of the predicted models did not make biological sense or present folds similar to the crystal structure. In fact, models obtained during the regular prediction round without using these envelopes better represented the crystal structure. Due to a lack of potential cross-linking reactive residues (Lys, Asp, and Glu) within the gp37-gp38 interface, protein cross-linking was unfortunately not of assistance with oligomeric predictions.

Provided by Harshul Arora Veraszto and Marcus D. Hartmann
The AROM complex is a homodimeric pentafunctional fusion enzyme in the shikimate pathway in fungi and protists. 73 This pathway is a seven-step biosynthetic route to chorismate, the central precursor for aromatic amino acids and other aromatic compounds, 74 [...]

[...] (Table 1, Target: T0999). As we did not expect a fast breakthrough in crystallization experiments, we started to equip ourselves with experimental restraints for in silico structure modeling.
To this aim, we collected small angle X-ray scattering (SAXS) data and, in collaboration with Alexander Leitner from ETH Zürich, cross-linking mass-spectrometry (XL-MS) data, which we aimed to combine for a rigid-body modeling and refinement approach based on the known structures of the individual enzymatic domains (see also 81). However, [...]

Although not used for initial structure determination, the SAXS and XL-MS data turned out to be very insightful in investigating the conformational landscape of the AROM complex (to be published), which motivated us to provide both our SAXS and XL-MS data to CASP participants. Obviously, the prediction of the individual domains was a trivial task, and was mastered very well by most of the participating groups (average best GDT-TS over D2-D5 domains = 80.39). When it came to the prediction of the whole assembly, however, the situation was quite [...]

[...] Contractile injection systems (CISs) such as contractile bacteriophage tails, the Type VI secretion system (T6SS), R-pyocins, and tailocins are multiprotein injection devices sharing a syringe-like architecture. 82-84 They are assembled from three major building blocks: a long rigid inner tube sharpened by a needle-like tip, a contractile helical sheath surrounding the tube, and a baseplate that anchors the system to the target cell membrane, rearranges, and triggers sheath contraction to expel the tube out of the sheath and puncture the target membrane.
The two targets provided to CASP were derived from the high-resolution cryo-electron microscopy (cryo-EM) structures of the antifeeding prophage (AFP) from the soil bacterium S. entomophila, whose pathogenicity to the New Zealand pasture pest Costelytra giveni is largely due to AFP, which injects its insecticidal toxin into the C. giveni larvae. 85 The targets represent the two opposite extremities of the AFP tailocin in its metastable extended state: the cap ([...] Target H1022).
Figure 9 caption (fragment): The apical cap of the AFP particle in extended state. A,B, Experimental structure of the H1021 target and the five best CASP prediction models according to QS score. This hexameric complex is composed of one layer of the sheath protein Afp2 (blue) surrounding the tube protein Afp1 (dark green) and capped by Afp16 (shades of magenta). The N- and C-terminal arms of Afp2 are shown in light green and red, respectively. The α-helix of Afp2 interacting with the tube is shown in orange. [...]

Accordingly, the predicted models of the Afp1-Afp2-Afp16 complex show good accuracy for the Afp1-Afp2 sub-complex (Figure 9A,B).
The conserved interface formed between the sheath and the tube (Afp2 α-helix-Afp1 β-sheets) was correctly modeled by the top predictions sorted by QS score (Figure 9C). The conserved bilobe fold of Afp2, described for other CIS sheath proteins, was accurately modeled in the monomer predictions (Figure 9E). In the complex, intra-Afp2 interactions via the extended N- and C-terminal arms are present in three out of five top models, although the monomer predictions for Afp2 showed a C-terminal extension folded back onto the upper lobe of Afp2 (Figure 9E).
The tube protein Afp1 has a fold similar to that of the T4 (gp19, 67) [...] the T6SS Hcp, 88 [...]

[...] Zhang. We also noticed that the overall fold and the locations of the two proposed catalytic residues of AxlA are successfully predicted by most top-ranked models. However, not a single predicted model is able to reproduce the experimentally determined loop-helix-loop motif (residues 398-425), which appears to be structurally essential for AxlA to form the +1 site for the backbone glucose as well as to recognize galactose and xylose sugars on the adjacent branch of the xyloglucan fragment being acted on (Figure 11B,C). This is possibly due to the low sequence and structural conservation in this region in structurally characterized [...] Figure 11C).
A close examination of the quaternary assembly of AxlA using EPPIC 96 identified it as a possible homotetramer formed from two subunits in the asymmetric unit and two adjacent subunits related by crystallographic symmetry. The above-mentioned loop region with poor structural predictions is also involved in tetrameric assembly (Figure 11D). [...]

[...] T0969). The structure is composed of three β-sheets consisting of seven (β3-β6, β9, β12-β13), four (β7-β8, β10, β11), and two (β1-β2) β-strands, respectively, nine α-helices including a "broken" one, and three α-helical turns (Figure 12A). The molecule is divided into two lobes with a deep cleft in between: the first, larger lobe contains almost all secondary structure elements found in the protein, while the second, smaller lobe is mostly unstructured and contains four disulfide bridges (Cys140-Cys191, Cys162-Cys227, Cys171-Cys467, Cys384-Cys463). The deep cleft between the two lobes contains the substrate binding site, that is, a catalytic triad Ser216-His465-Asp462 similar to that found in serine proteases 98 (Figure 12B). The walls of [...]

[...] As assessing the functional relevance of models is difficult for CASP assessors to address on a large scale, we hope that this study will inspire future CASP assessors to emphasize the relevant aspects of models that inform our understanding of protein function.