Modeling of protein complexes in CASP14 with emphasis on the interaction interface prediction

Abstract The goal of CASP experiments is to monitor the progress in the protein structure prediction field. During the 14th CASP edition we aimed to test our capabilities of predicting structures of protein complexes. Our protocol for modeling protein assemblies included both template‐based modeling and free docking. Structural templates were identified using sensitive sequence‐based searches. If sequence‐based searches failed, we performed structure‐based template searches using selected CASP server models. In the absence of reliable templates we applied free docking starting from monomers generated by CASP servers. We evaluated and ranked models of protein complexes using an improved version of our protein structure quality assessment method, VoroMQA, taking into account both interaction interface and global structure scores. If reliable templates could be identified, generally accurate models of protein assemblies were generated with the exception of an antibody‐antigen interaction. The success of free docking mainly depended on the accuracy of initial subunit models and on the scoring of docking solutions. To put our overall results in perspective, we analyzed our performance in the context of other CASP groups. Although the subunits in our assembly models often were not of the top quality, these models had, overall, the best‐predicted intersubunit interfaces according to several accuracy measures. We attribute our relative success primarily to the emphasis on the interaction interface when modeling and scoring.


VoroMQA-dark method for structure quality assessment
VoroMQA-dark method for model accuracy estimation uses a feed-forward neural network (NN) trained to predict local (per-residue) CAD-score 1 values. The training and crossvalidation was done using 28377 models of 199 targets from CASP 8-13.
The targeted NN output for each residue is a vector of three local and semi-local CAD-score values:  CAD-score(depth=0), based on all the inter-residue contacts involving the central residue;  CAD-score(depth=1), based on all the inter-residue contacts involving at least one residue from the first layer of neighbors (the direct neighbors) of the central residue;  CAD-score(depth=2), based on all the inter-residue contacts involving at least one residue from the first two layers of neighbors (the direct neighbors and the neighbors of the direct neighbors) of the central residue.
The NN input vector consists of 299 values describing a protein residue and its environment. The first 20 values are used to describe the residue type using one-hot encoding. The remaining 279 input vector values describe the residue environment and are derived from the Voronoi tessellation-based inter-atom contact areas and the corresponding contact potential values -the same pseudo-energy potential values as in the original VoroMQA method (VoroMQA-light) 2 . The environment descriptors are summed contact areas, summed atomic volumes, and pseudo-energy values (both raw and normalized by the corresponding summed contact area or summed atomic volume). The descriptors are computed for three layers of residue neighborhoods using tessellation breadth-first search and accumulating convolution operations.
As the input vector is "pre-convoluted", convolutional layers were not used in the NN architecture. The final architecture was determined through an iterative five-fold crossvalidation process, monitoring the Pearson correlation of predicted and actual local scores. Data splitting for cross validation was performed on the level of CASP targets. The final NN architecture has only one fully-connected hidden layer of 24 neurons.
The cross-validation results showed that the CAD-score(depth=2) local scores were the most predictable, the average Pearson correlation coefficients between the predicted and the actual local scores were  0.81 for depth=2,  0.77 for depth=1,  0.65 for depth=0. However, surprisingly, the best estimate for the global CAD-score was the average predicted CAD-score (depth=0). The average Spearman correlation coefficients between the estimated and the actual global scores were  0.79 for depth 0,  0.77 for depth 1,  0.77 for depth 2. Therefore, the VoroMQA-dark global structure score is computed by averaging the predicted CAD-score (depth=0) values.

Using VoroMQA-dark for scoring and ranking complexes
For a multi-chain structure, VoroMQA-dark can produce two scores: 1. VoroMQA-dark global score, which is an average of the predicted local (per-residue) scores; 2. Inter-chain interface energy score, the same as in the VoroMQA-light method. The global score and the interface energy score are used to perform a tournament-based ranking of structural models 3,4 . For every model A in a given set of models we counted "wins'' and "draws". A win for model A is when another model B in the set is inferior to A according to both scores. A draw is when B is neither inferior nor superior to A, which happens when the two scores disagree. The models are ranked according to their numbers of wins and draws. Therefore, increasing a tolerance value for comparing global scores is a way to put more emphasis on the interface energy score. Based on the testing with the CASP13 data, different tolerance values were used depending on the structure type:  t=0.01 for homo-oligomers;  t=0.02 for hetero-oligomers. VoroMQA interface energy values were compared without any tolerance.
The new VoroMQA-dark-based model selection protocol, called "VoroMQA-select-new", did not perform better than the top 2 human groups overall. However, our new model selection protocol performed consistently better than the VoroMQA-light-based method we used in CASP13 4 ("VoroMQA-select-old"), as shown in Supplementary Figure S10. Supplementary Table S1. The quality evaluation of our best models for CASP14 targets. Best model is the model that has the largest sum of z-scores for ICS, IPS, lDDT and TMscore.

Supplementary Figures
Supplementary Figure S1. Comparative modeling workflow in CASP14. Methods developed in our laboratory are colored blue.
Supplementary Figure S3. Cumulative z-score (left side) and raw score (right side) values of the models designated as first. Targets were ordered by the maximum achieved ICS score.
Supplementary Figure S4. Examples of model pairs with large differences of lDDT-oligo zscores, but not of absolute lDDT-oligo values. The models on the left (produced by the "Venclovas" group), have lower lDDT-oligo z-score and absolute values than the models on the right (produced by other groups). Despite that, the models on the left feature highly accurate intersubunit interfaces as indicated by high ICS and TM-score values, whereas the models on the right have largely incorrect interfaces (ICS values are close to zero).
Supplementary Figure S5. Cumulative CAD-score-based z-score (left side) and raw score (right side) values of the models designated as first. Targets were ordered by the maximum achieved ICS score.
Supplementary Figure S6. Cumulative combined z-score values of the models designated as first. Targets were ordered by the maximum achieved ICS score.
Supplementary Figure S7. Cumulative combined z-score values of the models with the best scores. Targets were ordered by the maximum achieved ICS score.
Supplementary Figure S8. In our model (colored) of H1036 (grey, PDB: 6VN1), antibody (red) is bound to a different epitope of viral trimer (green). Structure superposition is done by aligning the first chain (green) using TM-align 5 .
Supplementary Figure S9. Models of different quality for two interfaces of T1099 (PDB: 6YGH), based on a target-template HHpred alignment with large gaps in the interface region. Target structures are grey, model structures are in different colors. Structure superposition was done by aligning the first chain (green) using TM-align 5 , sequence alignment was visualised using Jalview 6 .
Supplementary Figure S10. Cumulative z-score values achieved by several VoroMQAbased ranking methods applied to the full set of models from all the groups. Targets were ordered by the maximum achieved ICS score. "VoroMQA-select-new-tol0" is a variation of "VoroMQA-select-new" that does not use tolerances for full-structure scores.