RNA target highlights in CASP15: Evaluation of predicted models by structure providers

The first RNA category of the Critical Assessment of Techniques for Structure Prediction competition was only made possible because of the scientists who provided experimental structures to challenge the predictors. In this article, these scientists offer a unique and valuable analysis of both the successes and areas for improvement in the predicted models. All 10 RNA‐only targets yielded predictions topologically similar to experimentally determined structures. For one target, experimentalists were able to phase their x‐ray diffraction data by molecular replacement, showing a potential application of structure predictions for RNA structural biologists. Recommended areas for improvement include enhancing the accuracy of local interaction predictions and giving greater consideration to experimental conditions such as multimerization, structure determination method, and time along folding pathways. The prediction of RNA–protein complexes remains the most significant challenge. Finally, given the intrinsic flexibility of many RNAs, we propose the consideration of ensemble models.


1 | INTRODUCTION
Experimental structural biologists are integral to the success of the Critical Assessment of Techniques for Structure Prediction (CASP) and are increasingly benefitting from the predictive capabilities enabled by experiments like CASP. Complementing the RNA-puzzles efforts for double-blind RNA three-dimensional (3D) structure prediction,[1][2][3][4] in this first RNA category of CASP (CASP15, 2022),[5] 10 RNA-only targets and 2 RNA-protein complexes were suggested as modeling targets by 6 structure determination groups from 4 countries. All targets were released for prediction from May to July 2022. Among these, four targets were solved by x-ray crystallography and eight by cryogenic electron microscopy (cryo-EM).
This article follows the tradition of protein CASP target highlight articles,[6][7][8][9][10][11] where each section provides the accounts of the structure providers and their insights into the accuracy of the models submitted. All target providers were invited to contribute to this paper, with five groups accepting the invitation. Groups that provided multiple targets were asked to limit their description to a minimal selection of targets with unique insights to reduce redundancy. This resulted in five sections highlighting nine of the targets (Table 1).
The numerical evaluation of CASP15 RNA models is available at the Prediction Center website (https://predictioncenter.org/casp15/results.cgi?tr_type=rna). The detailed evaluation of these predicted models, including direct comparisons and refinement to x-ray and cryo-EM data, is provided elsewhere in this issue.[12,13]
2 | RESULTS
2.1 | Human and chimpanzee cytoplasmic polyadenylation element-binding protein 3 ribozymes (CASP: R1107 and R1108, PDB: 7QR4 and 7QR3) provided by Benoît Masquida
Cytoplasmic polyadenylation element-binding protein 3 (CPEB3 protein) binds cytoplasmic polyadenylation elements (CPEs) of messenger RNAs (mRNAs) to regulate poly-A tail extension and translation.[14,15] It plays a role in memory acquisition and maintenance requiring tight posttranscriptional regulation in mammals. One regulatory mechanism intervenes at the posttranslational level and comprises SUMOylation of a lysine residue from an F-actin-binding region embedded in the N-terminal prion domain.
SUMOylation prevents binding of the protein to actin filaments and contributes to localization of the protein in P-bodies together with the stalled tissue-dependent mRNA targets.[18][19] The CPEB3 gene encodes a ribozyme conserved in mammals, embedded in the second intron of the pre-mRNA.[20] This ribozyme is very similar to the Hepatitis delta virus (HDV) ribozyme, although its characteristic slow cleavage activity both in vitro and in vivo allows a subtle coupling with splicing. Slowing down catalytic activity using antisense oligonucleotides prevents the formation of a catalytic structure and results in increasing the cellular levels of both CPEB3 mRNA and protein.[21]
TABLE 1 CASP15 RNA targets included in this study. Table note: In addition, there are many templates for the U1A-protein-binding loop.
RNA constructs of the human and chimpanzee CPEB3 ribozymes, modified by insertion of a U1A protein-binding motif in place of the wild-type P4, were co-crystallized in the presence of the U1A protein to foster crystal-packing contacts[22] (Table 1). A30 of the human ribozyme is changed to G30 in the chimpanzee homologue. The difference between these structures is in the region of the mutation (P1, J1/2), where the human homologue has C7 bulged out. The crystal structures show an overall organization consistent with that of the HDV ribozyme wherein helix P1 stacks onto helix P4 on one side and helix P2 stacks on helix P3 on the other (Figure 1A-C).[23] Ribozyme dimers were obtained in both cases, which was communicated to the predictors, although the dimer was non-crystallographic for the chimpanzee RNA.
The dimerization occurs through the L3 loops of two molecules like a handshake. L3 contains the two residues (U21, C22) involved in formation of the characteristic HDV-ribozyme-like double-nested pseudoknot with two residues in J1/4 (G37 and U38). A palindromic sequence stretch in L3, 5′-A(23)CGU-3′, makes dimerization possible and hence prevents formation of a competent catalytic pocket.[26] The predicted models for the CPEB3 ribozyme are all monomers.
FIGURE 1 (A) Secondary structure of the HDV ribozyme as deduced from the crystal structure (PDB: 1DRZ).[81] Among the characteristic structural elements, P1 forms a nested double pseudoknot together with P1.1 and P2.
Models with root-mean-square deviation (RMSD) around 5 Å correctly predict the main secondary structure elements (P1, P2, P3, P4) as well as their relative positions. In the 5 Å RMSD range, the main discrepancies correspond to the conformations of residues belonging to non-helical regions. Although the U1A loop is well predicted, perhaps because known structures could be used as a template, the L3 loop departs significantly from the observed anticodon-like conformation.
Instead, a U-turn occurs between U21 and C22 (compare Figures 1D and 1E), which is involved in the P1.1 pseudoknot in the template HDV structures but not in the CPEB3 ribozyme structures. This modeling error is most probably due to treating the CPEB3 ribozyme target as a monomer instead of a dimer, although the dimeric state was an experimental condition known to the predictors. The oversight of the dimerization state may account for the lack of predicted models with RMSD values better than 4.52 Å for the human ribozyme and 5.48 Å for the chimpanzee.
Another region of the human ribozyme that was wrongly predicted, independently of the dimerization, is the J1/2 stretch. In the best model R1107TS232_1, generated by AIchemy_RNA2 (Figure 1E), P1 is closed by a sugar edge-Hoogsteen A8-A30 pair and the nucleotides upstream and downstream from A8 stack on each other, adopting a helical conformation to conduct the strand to the inlet of P2. However, in the crystal structure, the C7 residue is expelled into solvent and the sugar edge of the contiguous A residue interacts with the Watson-Crick (WC) edge of A30 in P1. The situation for the chimpanzee ribozyme is different since a G-C pair is formed at the tip of P1, which was easier to identify.
Looking at models with worse accuracy, the increase of the RMSD values up to 10 Å is associated with misfolding of the ribozyme, including strand crossing and topological differences compared to the crystal structures (Figure 1F). In model R1107TS054_3, generated by the UltraFold server, the connection between P4 and P2 is made on the deep groove side of P3 instead of on its shallow groove side, which scrambles the catalytic site. This conformation would result from a different folding process since the single strand J4/2 ends up on the other side of P3. For models in this range of RMSD values, accuracy of loop modeling is also worse. For example, the U1A region is modeled as a simple loop in which a U-turn adequately reverses the backbone direction and closes the loop, but without attention to individual nucleotide conformations or to the fact that this loop is bound to the U1A protein.
Beyond RMSD values of 10 Å, aberrant secondary structure elements appear. For example, in model R1107TS229_1 from Yang_Server, the two strands forming P3 are split and reorganized around a three-way junction connecting P2, the L3 loop, and a P4 element which presents a four base pair extension encompassing the second strand of P3 and the residues from J4/2. This results in a profound reorganization of P4 caused by the interaction with the second strand of P1 (green base-paired region in Figure 1G). This leads to the misfolding of the U1A protein-binding site.
To summarize, for models that achieved below 5 Å RMSD, conformational discrepancies are mostly observed for residues belonging to loops (Figure 1E). Around 10 Å RMSD values, additional strand crossing events are observed, leading to topological differences resulting from different folding pathways (Figure 1F). Finally, when close to 20 Å RMSD, shuffling of the strands constitutive of individual helices may generate spurious secondary structure elements, resulting in the loss of similarity between models and reference structures (Figure 1G).
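The RMSD values quoted in this section compare a model with the reference coordinates after superposition. As a minimal sketch of that kind of calculation, the following uses the Kabsch algorithm on two matched coordinate arrays. The function name `kabsch_rmsd` and the choice of which atoms to compare are assumptions made for illustration; the exact procedure used by the CASP assessors is not described in this article.

```python
import numpy as np


def kabsch_rmsd(model, reference):
    """RMSD after optimal rigid-body superposition of two matched (N, 3) arrays.

    Hypothetical helper for illustration: atom selection and pairing are
    assumed to have been done beforehand.
    """
    P = np.asarray(model, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays with matching atoms")

    # Remove translation by centring both coordinate sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)

    # Kabsch: the SVD of the covariance matrix yields the optimal rotation.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation,
    # not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate the model onto the reference and measure the residual deviation.
    P_rot = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))
```

Because the superposition is global, a low RMSD can still hide local errors such as the L3 loop and J1/2 discrepancies described above.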

2.2 | Small prequeuosine 1 riboswitch (CASP: R1117, PDB: 8FZA) provided by Griffin M. Schroeder and Joseph E. Wedekind
Riboswitches are gene-regulatory elements usually located in the 5′ untranslated region of bacterial mRNA.[27] Riboswitches regulate downstream genes by use of an aptamer domain that senses a cellular metabolite with high specificity.[28] Metabolite binding triggers conformational changes in a nearby, gene-regulatory expression platform that induce transcription termination or translation initiation.[29,30] The cognate ligand is usually a cofactor or metabolite intermediate related to the downstream gene, allowing the riboswitch to maintain bacterial homeostasis through feedback loops.[28] Importantly, dysregulation of riboswitches has been shown to decrease bacterial fitness, making riboswitches attractive drug targets.[31]
Of the over 55 classes of validated riboswitches,[32] one of the best studied is the prequeuosine 1 (preQ1) sensing family. One[33] or two[34] preQ1 metabolites bind per aptamer domain, which adopts a distinct architecture that falls into one of three folding classes. Of these classes, the class I riboswitch is the most widely distributed among bacteria and is the most prevalent preQ1 riboswitch in the biosphere.[33] This class can be divided into three subgroups known as types I-III (preQ1-I_I to preQ1-I_III). Although each subgroup is predicted to fold into an H-type pseudoknot,[33] we previously demonstrated that types I and II show different aptamer-to-preQ1 binding stoichiometries despite sharing a common global fold.[34] At present, little is known about the type III subtype, which is found almost exclusively in proteobacteria.[33] Accordingly, we determined the co-crystal structure of a preQ1-I_III (class I type III) riboswitch (PDB: 8FZA, Table 1) to ascertain how preQ1 recognition leads to gene regulation by this clinically relevant riboswitch subclass.
To obtain diffraction-quality crystals, a poorly conserved turn between helix P1 and loop L3 (Figure 2A) was modified to yield a small 30-mer construct ideal for structure prediction, which was submitted to CASP15. Consistent with the covariation model[33] (Figure 2A), the crystal structure revealed a highly compact H-type pseudoknot (Figure 2B) featuring two helical regions, P1 and P2, […] II preQ1 riboswitches.[34][35][36][37][38] This observation underscores the need for more experimentally derived templates.
Metabolite binding is an area of functional interest, and ligand binding stabilizes the structure by promoting coaxial helical stacking.[35][36][37][38] Specificity base[39] Cyt14 uses cis WC pairing to engage preQ1 (Figure 2C). The minor-groove edge of the metabolite is read by Uri6 and Ade27, while the methylamine donates hydrogen bonds to Gua5 and the backbone of Cyt12. The Chen model did not attempt to predict the mode of preQ1 binding.
Rather, their model predicts that metabolite-interacting nucleobases are unpaired, although they are oriented similarly to the bound-state crystal structure (Figure 2C). However, Gua5, Uri6, Cyt14, and Ade27 of the Chen model pack more closely in the core while the Cyt12 phosphate bulges outward. Thus, the orientation of these nucleobases in the Chen model prevents hydrogen bond contacts to preQ1 (Figure 2C). Curiously, the Chen apo model does not predict nucleobase incursion into the preQ1 binding pocket, although this effect was observed previously, along with L2 loop unstacking, in apo-state cocrystal structures of a related Thermoanaerobacter tengcongensis (Tte) preQ1-I_II riboswitch.[35,40] Hence, Chen's apo-state prediction actually resembles a bound state, possibly due to bias from bound-state templates in which the ligand was removed. Nonetheless, gross details of the fold were predicted correctly.
The co-crystal structure of the aforementioned Tte preQ1-I_II riboswitch further revealed that the binding pocket ceiling forms a base quartet.[35,36] We found previously that this quartet plays a key role in the preQ1-free to bound-state interconversion.[40] In our co-crystal structure of the preQ1-I_III riboswitch herein, we observe a similar pocket comprising a Cyt8•Ade28-Uri11•Ade13 quartet (Figure 2D). […] structure, only the interaction between N6 of Ade13 and O2 of Uri11 is present (Figure 2D). Although the covariation model predicts a WC pair between Uri11 and Ade28[33] (Figure 2A), the Chen model predicts that Uri11 is twisted downward into the preQ1 pocket and shifted toward Cyt8 (Figure 2D), where it cannot hydrogen bond with Ade28.
Similarly, Cyt8 adopts a dramatically different orientation that pivots the nucleobase upward and away from the planar rings that compose the pocket ceiling, thereby precluding formation of the Uri11-Ade28•Cyt8 triple (Figure 2D). Atop the pocket ceiling, the next two Shine-Dalgarno sequence (SDS) nucleotides, Gua29 and Gua30, form WC interactions in the co-crystal structure consistent with covariation predictions[33] (Figure 2A,E). This interaction is correct in the Chen model, although Cyt9 shows substantial propeller twist (Figure 2E). Overall, the structural basis of gene regulation in the Chen model is largely consistent with predictions from the covariation model.[33] However, the preQ1 pocket and ceiling differ in important ways from the experimental coordinates.
The fourth Chen model (R1117TS287_4) is the most accurate RNA prediction in the CASP15 competition according to the Global Distance Test-Total Score (GDT-TS) and RMSD (Table 1). This laudable achievement may be due to the similarity of its global fold to known preQ1-I riboswitch structures[34][35][36][37][38][40] and the small target […]
Recent improvements to the automated design software for RNA origami (ROAD) have allowed us to rapidly generate many unique new design patterns and easily incorporate RNA aptamers into the designs.[43] Keen to validate the fidelity of our designer RNA from in silico to in vitro, we pursued structural determination of our cotranscriptionally folded and natively purified RNA using cryo-EM.[44] During this process, we encountered numerous deviations between our designed structures and our experimentally determined structures, notably in the twist, bend, and topological arrangement of helices.
In our opinion, synthetic nucleotide sequences represent a particularly interesting class of targets for structural prediction contests because they have very little sequence similarity to known RNA structures, forcing predictions to rely first on the principles of RNA folding rather than comparative modeling. Furthermore, the motifs that are incorporated from known structures and can be generated by comparative modeling (e.g., kissing loops [KLs] and aptamers) are often different in the context of a larger RNA and in solution than they are in isolation in a crystal structure.[45]
Our design process includes validation of our sequences by comparing the predicted secondary structures from Vienna RNA[46] and NUPACK[47,48] to our designs. This, coupled with the almost entirely base-paired nature of our designs, means that prediction of the correct base pairing arrangement (i.e., secondary structure) should be trivial and the real challenge lies in correctly predicting the topological arrangement of helical elements and subtle deviations from ideal A-form helices.
Here, we compare the best model from the top five groups (lowest global RMSD to our model) to two of our target submissions. Not surprisingly, the base pairing was almost always correctly predicted.
The most frequent deviations from our model were typically the result of missing pseudoknot interactions (i.e., long-range tertiary interactions) or incorrect modeling of the four-way junctions. However, we were thrilled to see that at least two groups consistently modeled our targets in silico with excellent agreement to our experimental models.
In some cases, their predictions were closer to the empirical structure than our initial design.
Our simplest target was a 238-nucleotide RNA (Figure 3A) comprising 3 helical domains connected by 2 four-way junctions and a paranemic crossover (PX) (CASP: R1128, PDB: 8BTZ).[49] Out of the best models from the top five prediction groups, only two accurately modeled the topology (Figure 3B,C), two failed to find the PX (Figure 3D,E), and one modeled the PX but failed to coaxially stack the 5′- and 3′-end helices into a single continuous helix (Figure 3F). To accommodate this tight packing of three crossovers, it appears that at least one of the helices must adopt a slight bend. In our cryo-EM reconstruction this is helix 2. The two best CASP predictions show a more noticeable bending of helices 1 and 3 (Figure 3C) or just helix 3 (Figure 3B). It is our opinion that these alternate bends are likely sampled as part of the dynamic range of conformations adopted by the RNA in solution.
Our largest target was a 720-nucleotide sequence (CASP: R1138, PDB: 7PTK, 7PTL) that was designed to form a hexagonal arrangement of 6 parallel helices, connected by 10 four-way junctions and 5 internal KLs. The RNA structure was found to have an unusually stable folding intermediate (PDB: 7PTK and Figure 3G) that persists in solution for several hours after transcription. This early conformation has the final "latching" helix lying across the other helices; we followed the transition from this early state to a matured state using small-angle x-ray scattering and determined the half-life of the early structure to be ~10 h, after which it rearranges into a more compact structure with the latch helix more parallel to the rest of the bundle, but still not completely parallel as we designed it (PDB: 7PTL and Figure 3H).[44] Predictors were told that there were two alternative cotranscriptional structures from different time points after transcription, but only made one set of submissions.
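To give a feel for what a half-life of about 10 h implies for sample composition, the sketch below estimates the fraction of RNA still in the early conformation at a given time after transcription. It assumes simple single-exponential (first-order) kinetics, which this section does not state, and the function name `fraction_early_state` is hypothetical.

```python
def fraction_early_state(t_hours, half_life_hours=10.0):
    """Fraction of molecules still in the early conformation after t_hours.

    Assumes first-order decay of the early state; the default half-life is
    the approximate ~10 h value quoted in the text.
    """
    if t_hours < 0 or half_life_hours <= 0:
        raise ValueError("time must be non-negative and half-life positive")
    return 0.5 ** (t_hours / half_life_hours)


# Illustrative time points only; these are not measurements from the study.
for t in (0, 5, 10, 20, 40):
    print(f"{t:>2} h: {fraction_early_state(t):.3f} in early state")
```

Under this assumption a sample imaged within a few hours of transcription would still be dominated by the early state, which is consistent with the two deposited structures coming from different time points.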
Among the top five predictions for the bundle, one group did not model any of the KLs (Figure 3M) and another group found four out of five KLs, seemingly missing the fifth KL by not accounting for the curvature induced by the crossover seams that allows the latch helix to make the final KL and by having incorrect helical stacking across the four-way junctions in the 5′ half of the bundle (Figure 3L). Two groups modeled all KLs and predicted the curvature from the crossover seams almost exactly as we designed in silico (Figure 3J,K). Most excitingly for us, the AIchemy_RNA2 group produced a model that more closely matches the empirical structure of the mature conformation than our initial design did (Figure 3I)! None of the groups predicted the early conformation, perhaps because groups opted to predict the structure closer to the equilibrium folding state, the mature conformation, which is more typical of past structure prediction challenges.
As a final comment, in our designs we frequently used the HIV DIS KL, based on the crystal structures from Ennifar and Dumas.[50] From our highest resolution cryo-EM map, we were able to determine that the KL is more compact than in the crystal structure, resulting in a twist defect that is compounded by the number of KLs incorporated.
The main difference between our structure and the crystal structure is that the unpaired adenines in our model stack within the helix, while the adenines from the crystal structure are bulged out. Although we designed our structures to have straight helices throughout, this twist defect resulted in an inherent strain throughout our origami and bending of the helices to accommodate this. It appears that the CASP predictors also used this crystal structure to seed their predictions as most of the KLs in the predictions are bulged out. Perhaps consequently, the predictions have much straighter helices than we observe in our empirical structures.
In conclusion, the secondary structures of our synthetic sequences were successfully predicted by most participants. Tertiary structure proved more challenging, especially with long-range pseudoknots, four-way junctions, and motifs that differ in our cryo-EM maps compared to crystal structures. However, at least one group was consistently able to accurately predict the approximate 3D structure. Finally, the inherent complicating factor with RNA is flexibility, which presents challenges not only to prediction but to assessment of predictions. Each structure we submit as a target is the result of averaging thousands of slightly different conformations of the same overall structure using cryo-EM single particle analysis methods. Although the CASP submissions were scored against a single PDB model built into our best resolved reconstruction, we know from 3D variability analysis of our cryo-EM data sets that there exists a dynamic range of conformations.[44,45] We anticipate that incorporation of structural dynamics and co-transcriptional folding pathways will be the next major hurdle for RNA structure prediction.
Coronaviruses have a highly structured 5′ region with several "stem loop" (SL) elements; SL5 was predicted to fold into a four-way junction in most SARS-related betacoronaviruses.[51,52][55][56][57] The tertiary organization of this junction and the degree of its structural conservation are unknown. In fact, previous computational modeling of the SARS-CoV-2 SL5 suggested that this domain might not have a well-defined tertiary structure.[57] We were pleased to resolve defined tertiary structures for SARS-CoV-2 (R1149) and BtCoV-HKU5 (R1156) SL5 domains by cryo-EM. These were well suited for evaluating 3D structure prediction because these RNA folds can be simplified to a handful of elements while also introducing conformational heterogeneity that the RNA and macromolecule modeling communities are increasingly interested in.
From multidimensional chemical mapping,[58] medium-resolution cryo-EM maps,[59] and heterogeneity analysis,[60] we obtained one map for the SARS-CoV-2 SL5 domain and four maps for the BtCoV-HKU5 SL5 domain, after data analysis suggested flexibility in SL5a (Table 1).
We generated 10 models for each map (10 and 40 models total, respectively) to represent our experimental uncertainty, due to their medium-resolution nature. Overall, we were pleasantly surprised to find that some predicted models were superimposable on experimental models and achieved reasonable accuracy by global metrics such as GDT-TS, including submissions from GeneSilico (TS128) and DeepFoldRNA (TS110) highlighted here (Figure 4F,H). For BtCoV-HKU5, we were glad to have conveyed an experimental structure ensemble, because the top model from GeneSilico, R1156TS128_5, was an excellent fit to an intermediate conformation but would not have been an excellent fit to our highest resolution map, which captured the highest bend angle of SL5a (Figure 4H). Keeping the resolution and flexibility in mind, we enumerated features and investigated how well the top predictions modeled these features as well as why some models predicted these features but did not score well globally. […] HKU5 (56% and 33%, respectively) correctly stacked the bases at the junction (Figure 4A,C). These observations suggest that predicting coaxial stacking is still a challenge, despite past literature reporting higher accuracies.[61] There is a significant difference in junction stacking prediction accuracy, suggesting prediction may be more challenging for the BtCoV-HKU5 SL5 domain (χ² = 7.8, p = .005).
Interestingly, the top model by GDT-TS for SARS-CoV-2, DeepFoldRNA's R1149TS110_2, exhibited incorrect base pairing and stacking at the junction, indicating that correctly predicting these features is not a prerequisite to obtaining overall topology (Figure 4G).
The third observation for both SL5 domains was that (3) the pairs of coaxially stacked helices were at a ~90° angle with antiparallel strands. This proved to be the most challenging task, even with a lenient angle criterion of −90° ± 30°. Only seven models for SARS-CoV-2 and two models for BtCoV-HKU5 passed this criterion (21% and 13%, respectively, of submissions that passed our previous two criteria). For SARS-CoV-2, the predicted models exhibited a wide range of angles with both parallel and antiparallel conformations proposed and no clear preferred orientation among the models (Figure 4B). For BtCoV-HKU5, we also saw a wide range of angles proposed, with a slight preference for models closer to the parallel orientation than the experimental models (Figure 4D). R1156TS287_5 from the Chen group is an example of a model that was accurate except for the angle between the helices; it is in an antiparallel orientation, causing this model to be topologically inaccurate (Figure 4H).
The prediction of junction angles seems to be a challenge, but models like GeneSilico's R1149TS128_1 and R1156TS128_5 show that it is possible (Figure 4F,H).
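The angle criterion can be illustrated with a small sketch that measures a signed angle between two helical axes and checks whether it falls within −90° ± 30°. The sign convention here (rotation from the first axis to the second, viewed along a vector connecting the two stacked helix pairs) and the function names are assumptions made for illustration; the section does not specify how the axes or the sign were defined.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    norm = math.sqrt(_dot(a, a))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return tuple(x / norm for x in a)


def _reject(a, c):
    # Component of a perpendicular to the unit vector c.
    scale = _dot(a, c)
    return tuple(x - scale * y for x, y in zip(a, c))


def signed_helix_angle(axis1, axis2, connector):
    """Signed angle in degrees from axis1 to axis2, viewed along connector.

    Both axes are first projected onto the plane perpendicular to connector
    (an assumed convention, not taken from the article).
    """
    c = _unit(connector)
    a = _unit(_reject(axis1, c))
    b = _unit(_reject(axis2, c))
    return math.degrees(math.atan2(_dot(_cross(a, b), c), _dot(a, b)))


def passes_angle_criterion(angle_deg, target=-90.0, tolerance=30.0):
    """True if angle_deg lies within target ± tolerance, handling wrap-around."""
    diff = (angle_deg - target + 180.0) % 360.0 - 180.0
    return abs(diff) <= tolerance
```

Using a signed angle, rather than the plain angle between two lines, is what lets a criterion like this separate the two opposite relative orientations of the stacked helix pairs.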
Finally, we observed additional features in the BtCoV-HKU5 domain: (4) the apical loop of SL5c interacts with the internal loop of SL5a, and (5) SL5a bends at the internal loop with a continuous angular range spanning ~30-80°. Among the 15 models that exhibited correct junction stacking and base pairing, only 4 modeled an interaction between SL5a and SL5c (defined as these 2 regions being within 3.5 Å), with only 1, GeneSilico's R1156TS128_5, correctly modeling the junction orientation (Figure 4C). Although another top model by GDT-TS, R1156TS119_3 from the Kihara lab, did not predict this interaction, it was able to obtain the correct helical orientation and a bend in SL5a (Figure 4H,J). In contrast, R1156TS287_5 from the Chen group was able to model the SL5a-SL5c interaction and a bend in SL5a, yet modeled a very different helical orientation outside the range of conformations captured by cryo-EM (Figure 4H,J). In general, all of the models predicting an interaction between SL5a and SL5c also predicted a bend in SL5a, with many falling within the experimental bend range (Figure 4E). Interestingly, a model for the SARS-CoV-2 SL5 domain, R1149TS035_5 from Manifold-E, included an SL5a-SL5c interaction and a helical bend in SL5a that was not observed by cryo-EM for that domain (Figure 4F). These models have motivated us to further investigate the relationship between the SL5a-SL5c interaction and the SL5a bend as well as improvements in experimental resolution for more precise experimental description of this interaction.
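The 3.5 Å definition of an SL5a-SL5c interaction can be read as a minimum-distance test between the two regions. The sketch below implements that reading for two lists of atomic coordinates. Which atoms were compared (all atoms, heavy atoms only, or selected atoms) is not stated in this section, so the atom selection and the function names are assumptions.

```python
import math


def min_distance(region_a, region_b):
    """Smallest distance (in Å) between any atom of region_a and region_b.

    Each region is a sequence of (x, y, z) coordinates; atom selection is
    assumed to have been done by the caller.
    """
    if not region_a or not region_b:
        raise ValueError("both regions must contain at least one atom")
    return min(math.dist(p, q) for p in region_a for q in region_b)


def regions_in_contact(region_a, region_b, cutoff=3.5):
    """True if the two regions come within cutoff Å of each other."""
    return min_distance(region_a, region_b) <= cutoff
```

The brute-force pairwise loop is adequate for two small loop regions; a spatial index would only matter when scanning whole structures.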
In summary, the prediction community successfully predicted the global topology, junction geometry, and other interactions of the two coronavirus SL5 domain targets. However, the wide range and uniform distribution of helical orientations in the models, which was not observed experimentally (Figure 4B,D), suggests that selecting accurate models may be difficult. Despite the generally impressive performance of certain groups such as AIchemy_RNA2 (TS232) and GeneSilico (TS128), groups submitted a variety of topologies for both SL5 targets and the best predictions were not consistently ranked as model 1 of their five submissions. There is significant potential for improvement in both experimental determination and prediction, particularly for increasing accuracy and detail at junctions and other tertiary interactions that are critical for drug discovery efforts.[62,63] A final challenge is predicting the ensemble of conformations, for example, to predict whether the range of SL5a bend angles is small, as in the SARS-CoV-2 domain, or larger, as in the BtCoV-HKU5 domain.
2.5 | RNA-protein complex of RsmZ and RsmA (CASP: R1189 and R1190, PDB: 7YR7 and 7YR6) provided by Bingnan Luo, Janusz M. Bujnicki, and Zhaoming Su
Pseudomonas aeruginosa is an opportunistic pathogen that infects hospitalized immunocompromised patients with a high mortality rate.[64] The acute and chronic virulence of P. aeruginosa can be regulated by types III and VI secretion systems,[65,66] biofilm formation,[67] and quorum sensing, a cell density-based intercellular communication network.[68] […] [70][71][72][73] RsmZ is a small noncoding RNA that can bind to RsmA and modulate RsmA regulation.[74] Previous studies showed that RsmA can form a homodimer to recognize two separate GGA-binding sites,[75][76][77][78] but the molecular mechanism of the full-length RsmZ sequestration of RsmA and regulation of P. aeruginosa virulence remains unknown.
We obtained the RsmZ-A complex by incubation of in vitro transcribed full-length RsmZ with recombinantly expressed RsmA with a molar ratio of 1:4. The cryo-EM structure of RsmZ in complex with three RsmA homodimers (RsmZ-A3) at 3.80 Å resolution (PDB: 7YR7, CASP: R1189, Table 1) showed that RsmZ comprises six consecutive SLs (SL1-SL5 and a terminator stem loop SLter) with six GGA-binding sites in the loop regions of SL1-SL5 and a single-stranded junction (J2/3) grouped into three pairs: SL1 and SL5, SL2 and SL3, J2/3 and SL4.[79] In addition, we observed another conformation of RsmZ in complex with two RsmA homodimers (RsmZ-A2) at 4.60 Å resolution (PDB: 7YR6, CASP: RT1190, Table 1), with the binding site between SL2 and SL3 unoccupied and a subtle change of 6.5° in SLter.[79]
In CASP15, RNA and ribonucleoprotein (RNP) molecules were introduced into structural prediction and assessment for the first time, with both RNA and protein sequences and the binding stoichiometry provided. The best RMSD among all predictions was better than 10 Å for each of the 12 RNA targets, except for the RNA targets from the RsmZ-A2 and RsmZ-A3 complexes (R1189 and R1190; Figure 5A). All […] Instead, we assessed the prediction results by aligning on either the stacking SL1-SL2 (nts 1-35, Figure 5F) or the longest SLter (nts 86-118, Figure 5G). In both assessments, we observed drastic deviations in the rest of the RNA structure.
A previous study on the RsmZ-E complex structure from P. fluorescens, based on nuclear magnetic resonance and electron paramagnetic resonance, revealed a similar 3D architecture of a protein-binding site consisting of SL2 and SL3 compared to our RsmZ-A complex structures from P. aeruginosa.[79]

3 | CONCLUSIONS
Here, we provide insight into the functional and structural relevance of nine of the RNA targets of CASP15 from the perspective of the scientists who determined the experimental tertiary structures. These analyses complement the CASP assessors' comments[12] with function-focused analysis, deeper focus on structural regions of importance, and comments on the utility of the current predictions for practical application in RNA structure research.
There were a few groups that consistently impressed. AIchemy_RNA2 was highlighted for their models of the CPEB3 ribozyme and RNA origami nanostructures; the Chen group submitted a very accurate model for the type III preQ1 riboswitch; the GeneSilico group achieved especially high accuracy for the coronavirus SL5 structures; and while all groups were challenged by the RNA-protein complexes, several groups predicted accurate secondary structure. Topologically similar structures were obtained for all but the RNA-protein complexes. We will now summarize challenges from the experimentalists' perspectives to stimulate further advances.
Despite room for improvement, particularly relative to the accuracies now enjoyed in the protein world, RNA predictions and experiments may be synergistic at this early stage. For example, the models can be used directly in experimental structure determination. In the case of the preQ1 riboswitch, the models allowed the structure to be determined from experimental x-ray data by MR. Elsewhere in this issue, the utility of models for MR and refinement into cryo-EM maps for all targets is discussed in more detail. 12,13 Furthermore, the range of models submitted sparked questions about the limitations of traditional comparisons against one native structure. It is noted in both the RNA origami and the coronavirus sections that models not fitting the highest resolution experimental structure are not necessarily inaccurate, but may represent another state within the ensemble. As a future challenge, experimentalists, assessors, and predictors should emphasize analysis of RNA structures in solution or flash-frozen from solution, incorporating structural dynamics and other heterogeneities, such as folding pathways, that can now be captured by cryo-EM.
Overall, the RNA CASP15 experiment highlighted the utility of RNA tertiary structure predictors, but also the areas in which predictors must improve to support broader benefits. In its first iteration in CASP, the RNA structure community as a whole participated widely, including the six groups that provided a total of 12 new RNA structures in the short three-month prediction season. Increased participation of experimentalists and predictors will continue to improve RNA tertiary structure prediction and its practical applications.
1 and P2. (B) The secondary structure of the human and chimpanzee CPEB3 ribozymes as deduced from the crystal structure (C) shows that the P1.1 element (purple) is not formed and instead the ribozymes form dimers with a neighboring molecule (shaded gray for clarity). The residues are numbered according to the P4 wild-type sequence framed. (D-G) Comparison of representative models with the crystal structure of the human CPEB3 ribozyme. (D) In the crystal structure of the human CPEB3 ribozyme (PDB: 7QR4), 23 the distal location of the P1.1 forming elements (purple) is indicated by a symbol (j---j). (E) R1107TS232_1, AIchemy_RNA2, RMSD 4.52 Å; (F) R1107TS054_3, UltraFold, RMSD 8.13 Å; (G) R1107TS229_1, Yang_Server, RMSD 17.92 Å. CPEB3, cytoplasmic polyadenylation element-binding protein 3; HDV, Hepatitis delta virus; RMSD, root-mean squared deviation.
joined by three loop regions, L1-L3. Many groups, most notably the Chen, GeneSilico, and AIchemy_RNA2 groups, correctly predicted the global fold. However, we were struck by the fourth model generated by the Chen group (R1117TS287_4) because its all-atom RMSD with our experimental structure was 2.01 Å (Figure 2B), the lowest of all CASP15 RNA targets. Major areas of deviation include the sharp L1-P2 bend at Cyt8 located in the ceiling of the binding pocket (RMSD of 6.39 Å at atom O2), Cyt12 in loop L2 (RMSD of 5.67 Å at atom OP1) and the P1-L3 turn (RMSD of 7.68 Å at atom O2′ of Ade21), which was modified to promote crystallization. Both AIchemy_RNA2 (R1117TS232_1) and GeneSilico (R1117TS128_1) produced slightly poorer predictions based on global RMSD values of 2.27 and 2.43 Å. Like Chen, the latter two models showed difficulties predicting mainchain and base positions at Cyt8, Cyt12, and P1-L3. These pseudoknot loop and turn regions showed substantial conformational differences when comparing co-crystal structures of known types I and

Uri11 stacks atop preQ1, interacts with Ade13 through its sugar edge, and forms a WC pair with Ade28. The Hoogsteen edge of the latter base also interacts with Cyt8 (Figure 2D), which is notable because it is part of the Shine-Dalgarno sequence (SDS). Thus, our co-crystal structure provides insight into how preQ1 recognition leads to gene regulation through sequestration of the SDS in the pocket ceiling. By contrast, interactions in the pocket ceiling are sparse in the Chen model. Of the six hydrogen bonds observed in our co-crystal

FIGURE 2 Covariation model and comparison of the preQ1-I III riboswitch co-crystal structure to the best predicted CASP15 model. (A) Covariation model based on previous data. 33 (B) Global superposition of the experimental model (PDB: 8FZA, purple) with the top prediction model (R1117TS287_4, gold). (C) Close-up view of the preQ1 binding pocket. The metabolite (green) was derived from the co-crystal structure. (D) Close-up of the pocket ceiling. (E) The expression platform showing WC pairing of Gua29 and Gua30 of the Shine-Dalgarno sequence. CASP15, Critical Assessment of Techniques for Structure Prediction 15; preQ1-I III, prequeuosine 1 class 1 type III; WC, Watson-Crick.

size. Although details related to metabolite binding and gene regulation were somewhat obscured (Figure 2C-E), the predicted structure (Chen model R1117TS287_4) succeeded as a molecular replacement (MR) search model after minor modifications. Specifically, we removed residue 1 at the 5′-end of the P1 stem and residues 20-23 at the P1-to-L3 turn in the search model (Figure 2B). This search model yielded a translation-function Z-score of 8.4 and a log-likelihood gain of 168 in Phenix. 41 These modifications were obvious choices based on their lack of conservation in the covariation model 33 and were necessary for crystal packing. Overall, the Chen model and others represent valuable tools to predict the global folds of small RNAs and to facilitate their experimental structure determinations by MR.

2.3 | RNA origami (CASP: R1128 and R1138, PDB: 8BTZ, 7PTK, and 7PTL) provided by Ewan K. S. McRae and Ebbe S. Andersen

RNA origami are tertiary structures that are designed to fold during transcription. The RNA origami architecture utilizes coaxially stacked 4-way junctions and internal pseudoknots (kissing loops [KLs]) to create a network of helical components from a single strand of RNA. 42

For both the SARS-CoV-2 and BtCoV-HKU5 SL5 domains, we observed that (1) the junction was tight, that is, it had four closing base pairs without any unpaired bases in the junction; and (2) the outermost SL5-stem was stacked on SL5c, and SL5a on SL5b. Focusing on the region of interest, the junction, 73 SARS-CoV-2 models and 46 BtCoV-HKU5 models recovered correct base pairing at the junction (Figure 4A,C), an expected level of accuracy given information in the literature, but some groups modeled a looser junction with unpaired bases. The second challenge was deciding if the four stems were coaxially stacked at the junction, and if so, what the coaxial stacking pattern was. Of submissions predicting a tight junction, 43 of 73 models for SARS-CoV-2 and 15 of 46 models for BtCoV-

FIGURE 4 Categorization of all R1149 (SARS-CoV-2 SL5 domain) submitted models (A) and all R1156 (BtCoV-HKU5 SL5 domain) submitted models (C) by features they correctly predict. In the Venn diagram (not to scale), areas are labeled with the number of models that correctly predict the features whose circles overlap in that area: base-pairing (blue) and base-stacking (green) at the junction, angle between SL5-stem-SL5c and SL5a-SL5b (pink), and presence of SL5a-SL5c interaction (yellow, R1156 only). Angle between SL5-stem-SL5c and SL5a-SL5b for R1149 (B) and R1156 (D) colored by categories in (A) and (C) with the experimental structure range marked in pink; 0° is a parallel orientation, 180° is an antiparallel orientation, and the direction of rotation is defined from the view of (F) and (H) as moving SL5b clockwise. (E) For R1156 models, the bend angle of SL5a at the internal loop as measured by the angle among residues 24-27, 64-95, and residues 28-59. Three example models for R1149 (F-G) and R1156 (H-J) with SL5 stem in gray, SL5a in blue, SL5b in orange, and SL5c in red. (F,H) The predicted structures (dark) and the cryo-EM models (translucent) and GDT-TS score with rank over all models. (G,I) The predicted model's four-way junction with arrows showing 5′ to 3′ direction, and (J) the SL5a-SL5c interaction. Cryo-EM, cryogenic electron microscopy; GDT-TS, Global Distance Test-Total Score; SL, stem loop.
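The interhelical angle convention used in this figure (0° for parallel stacks, 180° for antiparallel stacks) can be made concrete with a small sketch. It assumes each coaxial stack is approximated by a straight axis running between the centroids of atoms at its two ends; how the axes were actually defined for the figure is not stated here, so `centroid`, `axis_vector`, and `interhelical_angle` are hypothetical helpers. Only the unsigned angle is computed; the clockwise sign convention in the caption is not reproduced.

```python
import math


def centroid(points):
    """Mean position of a non-empty list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def axis_vector(start_points, end_points):
    """Approximate a helix axis as the vector between its two end centroids."""
    start, end = centroid(start_points), centroid(end_points)
    return tuple(end[i] - start[i] for i in range(3))


def interhelical_angle(axis_a, axis_b):
    """Unsigned angle in degrees between two axis vectors (0 = parallel, 180 = antiparallel)."""
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    norm = math.sqrt(sum(a * a for a in axis_a)) * math.sqrt(sum(b * b for b in axis_b))
    if norm == 0.0:
        raise ValueError("axis vectors must have non-zero length")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# Toy axes (hypothetical coordinates): one stack along +z, one perpendicular to it.
stack_1 = axis_vector([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
                      [(0.0, 0.0, 10.0), (2.0, 0.0, 10.0)])
stack_2 = axis_vector([(0.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)])
angle = interhelical_angle(stack_1, stack_2)
```

With such a measure, a model can be classed as matching the experimental geometry when its angle falls inside the experimentally observed range, which is the kind of categorization shown in panels (B) and (D).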

All predictions of the RNP targets had RMSD worse than 15 Å. The top-ranked results by RMSD were generated by the Yang groups, with an RMSD of 16.3 Å compared to RsmZ-A3 (R1189TS229_3, R1189TS239_3, R1189TS439_3) and an RMSD of 16.0 Å compared to RsmZ-A2 (R1190TS229_3, R1190TS239_3, R1190TS439_3), respectively. The first difference that we noticed was in RNA secondary structure (Figure 5B,C). Although SL1, SL2, SL3, and SLter were accurately predicted, these predictions missed J2/3, SL4, and SL5. When comparing 3D architectures, it was very challenging to align the entire RsmZ RNA structure (Figure 5D,E).
Although global topologies were predicted, there was a desire for improved accuracy in prediction of local interactions. The CPEB3 ribozyme analysis emphasized improvement in loop conformation prediction. The case of the type III preQ1 riboswitch calls for increased accuracy in binding pocket prediction, which is vital for understanding gene regulation by this element. The coronavirus structures showed inaccuracies in junction geometries. Predicting changes in RNA tertiary structure that depend on the structure determination technique or conditions also remains a challenge. For example, the design and prediction of the RNA origami structures had systematic inaccuracies because they used a KL structure from x-ray crystallography that was in fact compacted in the solvated cryo-EM structure. With the largest RNA origami, no groups predicted a structure close to an early kinetically trapped state, showing a gap in predicting structures along folding pathways. Furthermore, in the case of the CPEB3 ribozymes, it was speculated that some of the modeling errors arose from not accounting for the dimeric state. A final challenge, which has clearly not been overcome, was modeling RNA-protein complexes. Although some predictions generated quite accurate secondary structure and protein-binding sites, the predicted 3D architectures of the RNA remain topologically different from the experimentally determined structure. Enabling the prediction of such complexes requires the combined expertise of protein, RNA, and multimer prediction groups as well as structure determination groups to provide the data that are currently lacking.

FIGURE 5 (A) Summary of the overall average RMSD of all predictions in each of the 12 RNA targets. (B) The experimental secondary structure of RsmZ with protein-binding site GGA marked in orange. (C) The secondary structure of RsmZ predicted by Yang_server with protein-binding site GGA marked in orange. (D) Cryo-EM model of RsmZ colored the same as the secondary structure. (E) Predicted model of RsmZ by Yang_server colored the same as the secondary structure. (F) Superposition of the cryo-EM (gray) and Yang_server predicted (green) RsmZ structures aligned on the stacked SL1-SL2. (G) Superposition of the cryo-EM (gray) and Yang_server predicted (green) RsmZ structures aligned on the longest SLter. (H) Secondary structure of top-ranked models by TM-score and GDT-TS of target R1189. (I) Superposition of all top-ranked models by TM-score and GDT-TS of target R1189 aligned on the protein-binding site SL2 and SL3, with cyan from Venclovas, orange from CoDock, magenta from Kiharalab_Server. (J) Superposition of the RsmZ cryo-EM structure (gray) and a representative RsmZ model predicted by RNApolis (blue) aligned on SL2 and SL3. Cryo-EM, cryogenic electron microscopy; RMSD, root-mean squared deviation; GDT-TS, Global Distance Test-Total Score; SL, stem loop.