Cryo‐EM targets in CASP14

Abstract Structures of seven CASP14 targets were determined using the cryo-electron microscopy (cryo-EM) technique at resolutions between 2.1 and 3.8 Å. We provide an evaluation of the submitted models versus the experimental data (cryo-EM density maps) and the experimental reference structures built into the maps. The accuracy of models is measured in terms of coordinate-to-density and coordinate-to-coordinate fit. A-posteriori refinement of the most accurate models in their corresponding cryo-EM density resulted in structures that are close to the reference structure, including some regions with a better fit to the density. Regions that were found to be less "refinable" correlate well with regions of high diversity between the CASP models and low goodness-of-fit to density in the reference structure.

determined by cryo-EM, 1 while in CASP14, 13% (yielding 22% of evaluation units). In percentage terms, this is more than the share of cryo-EM structures in the whole PDB (currently 4%), although this share is rising, with nearly 20% of structures submitted in 2021 coming from cryo-EM. 2 An adequate representation of cryo-EM structures in CASP is important for several reasons. First, cryo-EM targets differ from other CASP targets in size and architectural complexity, and therefore a disproportionate share of them may introduce bias in the evaluation. Second, reference structures from cryo-EM studies often have higher coordinate uncertainty due to lower or nonuniform resolution and, as such, may represent multiple conformations in one target. For these reasons, CASP organizers thought it useful to conduct a separate evaluation of cryo-EM targets with the emphasis on model fit to the experimental data per se and not to the coordinates derived from these data (reference structure).
Here, we assess the fit of the submitted models to the cryo-EM density maps, and compare the best-fit models (with and without refinement in the density) to the corresponding reference structures provided by the experimentalists. We also compare the performance of CASP14 tertiary structure prediction methods on all targets versus the cryo-EM targets in the traditional CASP way (vs the reference structure) to ensure that there are no abnormalities due to specifics of the structure determination approach.

| Participants and predictions
Modeling of cryo-EM targets was a part of the general CASP14 modeling experiment. Models of cryo-EM targets were generated the same way as models of other CASP targets, that is, based solely on sequence. 142 groups submitted 513 models on cryo-EM targets representing multimeric complexes, and 3576 models on their subunits.
As is customary in CASP, for evaluation, targets are split into subunits and domains. With such a procedure, CASP14 cryo-EM entries yielded 13 single-domain evaluation units (EUs), four multi-domain EUs, and six multimeric EUs.

FIGURE 1 CASP14 cryo-EM targets. The target code, description, and provider of the reference map and structure are stated next to each target

| Minimum accuracy of models for evaluation
Evaluation of models versus maps makes sense only if models are of high accuracy enabling sensible fitting in the density. Here we define "high accuracy" models as those scoring in excess of 70 LDDT and GDT_TS for monomers, and 70 LDDT for multimers versus the reference experimental structure. This cutoff was selected as a trade-off between the accuracy of models and the number of targets and models suitable for evaluation. 8 Table 1 provides the number of models satisfying this criterion for all cryo-EM evaluation units.
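The selection rule above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: the `ModelScore` record, its field names, and the example values are hypothetical, and "in excess of 70" is read here as a strict inequality on the 0-100 scale.

```python
from dataclasses import dataclass
from typing import List, Optional

CUTOFF = 70.0  # threshold quoted in the text, on the 0-100 scale


@dataclass
class ModelScore:
    """Scores of one model versus the reference structure (hypothetical record)."""
    model_id: str
    lddt: float
    gdt_ts: Optional[float] = None  # not used for multimers
    is_multimer: bool = False


def is_high_accuracy(model: ModelScore, cutoff: float = CUTOFF) -> bool:
    """Monomers must exceed the cutoff on both LDDT and GDT_TS; multimers on LDDT only."""
    if model.lddt <= cutoff:
        return False
    if model.is_multimer:
        return True
    return model.gdt_ts is not None and model.gdt_ts > cutoff


def select_high_accuracy(models: List[ModelScore]) -> List[ModelScore]:
    """Keep only the models that pass the accuracy cutoff."""
    return [m for m in models if is_high_accuracy(m)]


# Illustrative values only, not taken from Table 1.
example_models = [
    ModelScore("monomer_a", lddt=82.0, gdt_ts=85.0),
    ModelScore("monomer_b", lddt=75.0, gdt_ts=50.4),  # LDDT passes, GDT_TS fails
    ModelScore("multimer_c", lddt=71.5, is_multimer=True),
    ModelScore("multimer_d", lddt=64.0, is_multimer=True),
]
selected_ids = [m.model_id for m in select_high_accuracy(example_models)]
```

Requiring both scores for monomers means that a model with a good local score (LDDT) but a poor global superposition score (GDT_TS) is excluded from map-based evaluation.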

| Evaluation measures
The models submitted for each cryo-EM target were evaluated for their goodness-of-fit in the experimental cryo-EM density map (model-to-map goodness-of-fit) with nine evaluation measures. The overall goodness-of-fit was quantified using the TEMPy 2.0 9 cross-correlation coefficient (CCC) and Mutual Information (MI) score 10,11 ; PHENIX's 12 real-space correlation coefficients (CCvolume, CCmask, and CCpeaks), each probing different aspects of model-to-map fit 13 ; and the Atom Inclusion score. 14 The local (per-residue) goodness-of-fit is evaluated with PHENIX's CCbox measure, 13 TEMPy's SMOC score, 10 The cross-correlation coefficients are computed between the experimental map and model-derived maps produced to a specified resolution limit on the same voxel grid, integrated either over the full map or over selected masked regions. The TEMPy CCC and PHENIX CCbox [0;1] coefficients quantify real-space cross-correlation between the entire target map and the map calculated from the model coordinates.
The two coefficients are highly correlated, but not identical, owing to slightly different approaches in computing the scores. Both

The paper also summarizes the results of model evaluation versus cryo-EM reference structures (that is, models generated by the experimentalists using the cryo-EM map). This analysis serves the purpose of ensuring that there are no irregularities in ranking participating groups on cryo-EM targets compared to all targets. 29,30 Evaluation measures and principles for ranking participating groups are described in our CASP13 evaluation paper. 1 Four measures are used in this type of analysis: a rigid-body structure superposition measure GDT_TS, 17,18 and three superposition-free measures, LDDT, 18 CADaa, 19
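The idea of a real-space cross-correlation between two maps on the same voxel grid can be illustrated as follows. This is a generic, mean-subtracted (Pearson-type) formulation written for illustration; it is not the TEMPy or PHENIX implementation, and the function name, the flattened-list map representation, and the optional boolean mask are assumptions of this sketch.

```python
import math
from typing import Optional, Sequence


def cross_correlation(exp_map: Sequence[float],
                      model_map: Sequence[float],
                      mask: Optional[Sequence[bool]] = None) -> float:
    """Correlation between two flattened maps sampled on the same voxel grid.

    If a mask is given, only voxels where the mask is True contribute,
    mimicking a score integrated over a selected region.
    """
    if len(exp_map) != len(model_map):
        raise ValueError("maps must be sampled on the same grid")
    if mask is None:
        mask = [True] * len(exp_map)
    pairs = [(e, m) for e, m, keep in zip(exp_map, model_map, mask) if keep]
    if not pairs:
        raise ValueError("mask selects no voxels")
    n = len(pairs)
    mean_e = sum(e for e, _ in pairs) / n
    mean_m = sum(m for _, m in pairs) / n
    cov = sum((e - mean_e) * (m - mean_m) for e, m in pairs)
    var_e = sum((e - mean_e) ** 2 for e, _ in pairs)
    var_m = sum((m - mean_m) ** 2 for _, m in pairs)
    denom = math.sqrt(var_e * var_m)
    # A flat map carries no signal to correlate with; report 0 in that case.
    return cov / denom if denom > 0 else 0.0


# Toy one-dimensional "maps": the last voxel disagrees strongly.
experimental = [0.0, 1.0, 2.0, 10.0]
simulated = [0.0, 1.0, 2.0, -5.0]
cc_full = cross_correlation(experimental, simulated)
cc_masked = cross_correlation(experimental, simulated,
                              mask=[True, True, True, False])
```

Restricting the sum to a mask shows why full-map and masked coefficients can differ for the same model: voxels outside the region of interest no longer influence the score.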

| Model refinement in map
We refined each atomic model in the density map using a Gaussian Mixture Model to represent the protein structure. 24 We compute a responsibility map, which is an intensity-weighted map for each atom, based on its position and the positions of all other atoms. This gives the new expected (mean) position of each atom, based on the intensity of each voxel in the original map and the weight of each voxel in the responsibility map:
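One way to read the update described above is as a single expectation-maximization step of an isotropic Gaussian mixture, in which each atom moves to the intensity-weighted mean of the voxels it is responsible for. The sketch below is written under that assumption and is not the in-house TEMPy implementation: the function names, the shared `sigma`, and the assumption of non-negative intensities are illustrative choices, and the force-field restraints used in the actual refinement are omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _gaussian_weight(voxel: Point, atom: Point, sigma: float) -> float:
    """Unnormalized isotropic Gaussian centred on the atom, evaluated at the voxel."""
    sq_dist = sum((v - a) ** 2 for v, a in zip(voxel, atom))
    return math.exp(-sq_dist / (2.0 * sigma * sigma))


def gmm_update(atoms: Sequence[Point],
               voxels: Sequence[Point],
               intensities: Sequence[float],
               sigma: float = 1.0) -> List[Point]:
    """One EM-style update of atom positions (assumes intensities >= 0)."""
    # E-step: responsibility of each atom for each voxel, normalized over atoms.
    responsibilities = []
    for voxel in voxels:
        weights = [_gaussian_weight(voxel, atom, sigma) for atom in atoms]
        total = sum(weights)
        responsibilities.append(
            [w / total if total > 0 else 0.0 for w in weights])

    # M-step: new position = mean of voxel coordinates weighted by
    # responsibility times map intensity.
    new_atoms = []
    for i, atom in enumerate(atoms):
        w = [responsibilities[v][i] * intensities[v]
             for v in range(len(voxels))]
        norm = sum(w)
        if norm <= 0:
            new_atoms.append(tuple(atom))  # no density support: leave in place
            continue
        new_atoms.append(tuple(
            sum(wv * voxel[d] for wv, voxel in zip(w, voxels)) / norm
            for d in range(len(atom))))
    return new_atoms


# A single atom is pulled onto the only voxel that carries density.
moved = gmm_update(atoms=[(0.0, 0.0, 0.0)],
                   voxels=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
                   intensities=[0.0, 1.0, 0.0])
```

Normalizing the responsibilities over all atoms is what makes each update depend on the positions of the other atoms: a voxel close to two atoms is shared between them rather than pulling both with full weight.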

| Map segmentation
To gauge whether a model accurately reflects the intensity generated by the target of interest, the maps were masked using a procedure in which the fitted model was used to scale the voxel intensities: voxels further from the model were scaled lower, depending on their distance to the target model and to other models, using a Gaussian distribution for each atom. As a result, the intensity of voxels closer to non-target chains was scaled down.
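The masking idea can be illustrated with a simple soft-weighting scheme. The exact scaling function used in the study is not given in the text, so the formula below is an assumption chosen only to reproduce the qualitative behaviour described: weights near 1 close to target atoms, weights falling toward 0 far from the model, and reduced weights near atoms of other chains. All function names and the `sigma` value are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _peak_gaussian(voxel: Point, atoms: Sequence[Point], sigma: float) -> float:
    """Largest per-atom Gaussian value at this voxel (0.0 if there are no atoms)."""
    best = 0.0
    for atom in atoms:
        sq_dist = sum((v - a) ** 2 for v, a in zip(voxel, atom))
        best = max(best, math.exp(-sq_dist / (2.0 * sigma * sigma)))
    return best


def mask_weight(voxel: Point,
                target_atoms: Sequence[Point],
                other_atoms: Sequence[Point],
                sigma: float = 2.0) -> float:
    """Assumed scaling factor in [0, 1] for one voxel."""
    t = _peak_gaussian(voxel, target_atoms, sigma)
    o = _peak_gaussian(voxel, other_atoms, sigma)
    if t + o == 0.0:
        return 0.0  # far from every atom
    # Proximity to the target (t) times the target's share of local density.
    return t * t / (t + o)


def mask_map(voxels: Sequence[Point],
             intensities: Sequence[float],
             target_atoms: Sequence[Point],
             other_atoms: Sequence[Point],
             sigma: float = 2.0) -> List[float]:
    """Scale each voxel intensity by its mask weight."""
    return [i * mask_weight(v, target_atoms, other_atoms, sigma)
            for v, i in zip(voxels, intensities)]


# One target atom at the origin, one atom of another chain 10 units away.
grid = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 100.0, 0.0)]
masked = mask_map(grid, [1.0, 1.0, 1.0, 1.0],
                  target_atoms=[(0.0, 0.0, 0.0)],
                  other_atoms=[(10.0, 0.0, 0.0)])
```

A soft weight of this kind avoids the sharp edges of a binary mask, which matters when the masked map is later used in correlation-based scores.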

| Evaluation versus reference structure
To compare the performance of participants on cryo-EM targets, we apply the ranking procedure described in our previous evaluation paper. 1 Figure 2 Figure S1).

| Evaluation of model-to-map fit
Evaluating the goodness-of-fit of CASP models to the experimental cryo-EM density maps makes sense only for targets where high-accuracy models are available. Thus, we ran evaluations only on accurate models (see Methods) of the targets marked blue in Table 1. The main aim of this analysis was to check whether CASP models, which were built without knowledge of the density maps, could be further refined into the density so that they reach the quality and goodness-of-fit of the models provided by experimentalists (reference structure).
To examine this, we applied an automated refinement protocol (Figure 3) to the high-accuracy models of 12 EUs from four targets, and compared the goodness-of-fit to the map of the refined models and the reference ones. We used our in-house real-space refinement implementation in TEMPy (with OpenMM 26 ) with the AMBER14 25 force field and five macro-cycles (see Section 2). We then assessed the refined models and compared them to the original models.
To understand the relation between the scores, we compared all of them (prior to refinement) by computing all-against-all correlation matrices across all targets. Unsurprisingly, most scores exhibit a high degree of correlation (calculated only on targets with more than 10 high-accuracy models), with the exception of EMRinger scores, as seen previously 1 (Figure S2).
Following refinement, the global improvement of the models relative to their corresponding reference structure is shown in Figure 4.
The overall quality-of-fit to the map has improved significantly

| Local improvement of models compared to reference structure
To quantify the differences in goodness-of-fit of CASP models (refined and unrefined) versus the reference structure, we calculated the Pearson correlation coefficient between the per-residue SD of their SMOC scores and the reference SMOC score. We found that the SD is anticorrelated with the reference SMOC score, while the mean of the SMOC in the refined models is correlated with the reference SMOC score (mean Pearson correlation coefficient across all targets is −0.44 and 0.76, respectively). This result indicates that regions of lower quality-of-fit in the reference structure tend to exhibit higher variability among CASP models (prior to refinement). Figure 5 illustrates improvement in coordinates-to-map fit for  Figure S3). The graphs show that the fit improved substantially during the refinement, reaching the accuracy of the reference in many cases.
The regions with high SD tend to be less refinable.
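The per-residue analysis in this section can be sketched as follows: compute the mean and SD of SMOC at each residue position across an ensemble of models, then correlate each profile with the reference SMOC profile. The sketch assumes that all models cover the same residues in the same order and uses the population SD; both are assumptions, and the numbers are invented for illustration rather than taken from any CASP14 target.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant profile has undefined correlation")
    return cov / (sx * sy)


def per_residue_stats(
        smoc_by_model: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Mean and population SD of SMOC at each residue across models."""
    columns = list(zip(*smoc_by_model))  # one tuple of scores per residue
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


# Invented SMOC profiles over four residues (illustrative only).
reference_smoc = [0.9, 0.8, 0.5, 0.3]
model_smoc = [
    [0.9, 0.8, 0.5, 0.2],
    [0.9, 0.7, 0.3, 0.5],
    [0.8, 0.8, 0.6, 0.1],
]
mean_profile, sd_profile = per_residue_stats(model_smoc)
r_sd = pearson(sd_profile, reference_smoc)      # spread vs reference fit
r_mean = pearson(mean_profile, reference_smoc)  # average fit vs reference fit
```

In this toy ensemble the models disagree most where the reference fits the map worst, so the SD profile is negatively correlated with the reference while the mean profile is positively correlated, which is the pattern reported above.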

| Local improvement of specific elements
To understand the improvement in CASP models following density-based refinement, we examined specific cases where such models

FIGURE 4 (A) Single domain refinement: Reference structure (blue), best model (orange), best model after refinement (red), against the map, contoured to only show high-intensity voxels (gray). Bright colors indicate that the structure is outside the density. (B) Average CCC (TEMPy) before and after refinements, across targets, compared to the reference structure. The refinements significantly improve the average CCC for all targets. Target name is on the left axis and resolution is on the right

The predicted model for T1047 (Figure 6C) is another example of SSE movement. The top predicted model for this target (AlphaFold2) is not in the original list of structures we selected for refinement due to its low GDT_TS (50.4) (although LDDT was above the cutoff, at 75).
However, in this case, the fold is partly correct with the SSEs slightly rotated and therefore we decided to test if the accuracy of the structure can be improved with refinement.

| Domain orientation: AR9 RNA polymerase (T1096-D1-D2)
In this example, both domains of the T1096 subunit of the RNA polymerase were correctly predicted by AlphaFold2 (with very high accuracy: GDT_TS of 83.63 and 78.80) (Figure 6D). However, the linker between the two domains was predicted incorrectly, resulting in a wrong relative orientation of the domains. In this case, refining the model in the density map easily fixed the problem.

| DISCUSSION
A sizeable portion of CASP14 targets (22% of EUs) was determined with cryo-EM. The accuracy of the submitted models for cryo-EM targets is equivalent to that for X-ray targets. Not surprisingly, the ranking of the participating groups on cryo-EM targets is consistent with that on all CASP14 targets, with the AlphaFold2 group topping the rankings by a large margin over other participants. As cryo-EM structures tend to differ from X-ray or NMR structures in their size or complexity of quaternary structure, it is interesting to look at predic-

FIGURE 5 SMOC score for the reference structure (blue), high-accuracy CASP models (orange) and the same models after refinement (red) for three selected targets. The transparent lines represent SMOC for individual models and the average of the SMOC scores is shown in a thick line while SD is shown in a dotted line before (orange) and after (red) refinement

The anticorrelation observed between the SD of the SMOC scores in the unrefined models and the SMOC scores of the reference structures is likely due to the intrinsic dynamic properties of some regions, which are captured to some extent by both the cryo-EM experiment and the ensemble of prediction models represented in CASP. These regions may exhibit higher flexibility (from either disorder or alternative conformers), resulting in locally lower resolution in the map and leaving the density in the region poorly resolved; this would further explain the difficulty in refining those regions, and may suggest that it is better to describe these regions with multiple conformers rather than one. 10 Potentially, CASP models could be used to estimate zones of increased difficulty, both experimentally and computationally, by looking at the local divergence in an ensemble of structures generated by different prediction methods.
The work presented here shows that sequence-based prediction with subsequent refinement can now rival the quality of reference models. The correlation between the reference structure quality and the variability in predicted structures provides a new avenue to identify regions of uncertainty in modeling approaches. We see cryo-EM structures becoming an important player in future CASP experiments, potentially helping the development of better prediction methods for protein dynamics and assembly.

ACKNOWLEDGMENTS
We thank Dr. Sony Malhotra for helpful discussions. Andriy

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1002/prot.26216.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Left: SMOC scores for the reference structure (blue), best predicted structure before refinement (orange) and after (red). The gray and green shaded areas indicate regions of lower score in the predicted structure, highlighting a locally worse fit. Right: Representation of the structures within the map. The regions surrounded by a blue and black oval correspond to the shaded areas in the left diagram. (C) Structure of the reference L/P flagellar ring structure subunit T1047 (blue), best predicted model (orange) and after refinement (red). (D) Structures of the reference structure T1096 (blue), and best predicted model before refinement (orange) in the map