Prediction of homoprotein and heteroprotein complexes by protein docking and template‐based modeling: A CASP‐CAPRI experiment

ABSTRACT We present the results for CAPRI Round 30, the first joint CASP‐CAPRI experiment, which brought together experts from the protein structure prediction and protein–protein docking communities. The Round comprised 25 targets from amongst those submitted for the CASP11 prediction experiment of 2014. The targets included mostly homodimers, a few homotetramers, and two heterodimers, and comprised protein chains that could readily be modeled using templates from the Protein Data Bank. On average 24 CAPRI groups and 7 CASP groups submitted docking predictions for each target, and 12 CAPRI groups per target participated in the CAPRI scoring experiment. In total more than 9500 models were assessed against the 3D structures of the corresponding target complexes. Results show that the prediction of homodimer assemblies by homology modeling techniques and docking calculations is quite successful for targets featuring large enough subunit interfaces to represent stable associations. Targets with ambiguous or inaccurate oligomeric state assignments, often featuring crystal contact‐sized interfaces, represented a confounding factor. For those, a much poorer prediction performance was achieved, while nonetheless often providing helpful clues on the correct oligomeric state of the protein. The prediction performance was very poor for genuine tetrameric targets, where the inaccuracy of the homology‐built subunit models and the smaller pair‐wise interfaces severely limited the ability to derive the correct assembly mode. Our analysis also shows that docking procedures tend to perform better than standard homology modeling techniques and that highly accurate models of the protein components are not always required to identify their association modes with acceptable accuracy. Proteins 2016; 84(Suppl 1):323–348. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.


Supplementary Material
- Table S1 - List of assessed interfaces
- Table S2 - Prediction results per target. The tables list, for every target, the performance of CAPRI Predictor and Scorer groups, and of CASP Predictor groups. The results of automatic servers (listed in all capital letters) are included in their respective predictor group. Results are listed as the number of submitted models of acceptable quality or better, with the number of higher than acceptable quality models listed after a slash, e.g. '9/8**' indicates that 9 models of acceptable quality or better were submitted, of which 8 are of medium quality. CAPRI groups were allowed to submit 10 models, CASP groups 5. Incorrect models are not listed.
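As an illustration of the notation used in Table S2, the following minimal sketch parses a single result cell such as '9/8**'. The function name `parse_cell` is hypothetical, and only the single-slash form described above is handled. Reading the number of stars as a quality level (with '***' meaning high quality) is an assumption based on the usual CAPRI star convention; only the '**' (medium) case is stated in the caption.

```python
import re

# Hypothetical pattern for one table cell: "<total>" or "<total>/<higher><stars>".
_CELL = re.compile(r"^(\d+)(?:/(\d+)(\*+))?$")

def parse_cell(cell):
    """Return (n_acceptable_or_better, n_higher_quality, n_stars) for one cell."""
    match = _CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised cell: {cell!r}")
    total = int(match.group(1))
    higher = int(match.group(2)) if match.group(2) else 0
    stars = len(match.group(3)) if match.group(3) else 0
    if higher > total:
        # The higher-quality models are a subset of the acceptable-or-better ones.
        raise ValueError(f"inconsistent counts in cell: {cell!r}")
    return total, higher, stars

print(parse_cell("9/8**"))
```

A cell with no slash, such as '4', is read as four models of acceptable quality and none of higher quality.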
- Table S1 - List of assessed interfaces. Each interface is identified by the CAPRI target number, with the digit after the dot giving the rank of the interface by interface area. The symmetry operations and interface areas are provided by the PISA tool. Interface areas are rounded to the nearest 10 Å².

DOCK/PIERR T69
Monomers were modeled using the LOOPP server, as well as by separately selecting a template with PSI-BLAST [7] and generating monomers with Modeller, keeping the model with the best DOPE score among the top 5 models. In both cases, LOOPP and Modeller, the template 1QLW was chosen. The monomers from LOOPP and those from Modeller were docked in two independent docking runs. The DOCK/PIERR server was used for docking, as in T68, above. The difference here is that, for final reranking, a combination of residue (PIE), atomic (PISA) and hydrogen bond (HB) potentials was used. All these potentials are available online (see footnote 8).
Nine of the top 10 submitted models came from the run of LOOPP-modeled monomers on DOCK/PIERR. The tenth model, taken from the run of Modeller-modeled monomers on DOCK/PIERR, was selected for its similarity to the homologous complex, 1QLW.
Most of the targets in Round 30 of CAPRI were homodimers and homotetramers, so the Round was a good opportunity to test our novel symmetry assembling docking method [1]. To do so, we imposed C2 symmetry constraints for all the homodimers and C4 and D2 symmetry constraints for all the homotetramers among the target complexes. Below, we present the new fast multi-resolution method for docking both symmetric and non-symmetric protein complexes that was used in Round 30 of CAPRI.
First, the structures of the individual subunits were taken from the stage two predictions of the CASP11 assessment experiment. More precisely, starting from the 150 available CASP 3D models of monomers, we predicted models of symmetric multimers using the novel symmetry docking method, which performs symmetry-induced protein docking using a shape-complementarity scoring function computed as spherical polar Fourier correlations [1]. Specifically, this method performs an exhaustive search over the available degrees of freedom (four for cyclic symmetries, six otherwise) for the given point group symmetry type. For the targets of Round 30 of CAPRI we imposed three types of symmetry: C2, C4, and D2. For the heterodimers, we used the standard Hex docking method [2].
For the input of the docking methods, we generated scaffolds of the initial monomer models by "cutting off" the side chains. More specifically, we mutated all residues except glycines to alanines. Compared to standard all-atom rigid-body docking methods, we expect the scaffold docking approach to produce binding poses that are less sensitive to the flexibility of the side chains. We clustered the solutions with a ligand-RMSD threshold of 8 Å using the RigidRMSD library [3]. Finally, we ranked the clusters by their best score and kept the 50 best clusters for the refinement stage. In total, for each target we proceeded to the refinement with 7,500 modeled structures of protein complexes (150 monomer models × 50 clusters).
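The clustering and ranking step described above can be sketched as a greedy procedure in which poses are visited from best to worst score and a pose opens a new cluster only if it lies farther than the threshold from every existing cluster representative. This is an illustrative sketch, not the RigidRMSD implementation: the names `cluster_by_best_score` and `rmsd` are hypothetical, and the code assumes that a higher score is better.

```python
def cluster_by_best_score(scores, rmsd, threshold=8.0, keep=50):
    """Greedy clustering of docking poses.

    scores    -- list of pose scores (assumption: higher is better)
    rmsd      -- callable (i, j) -> ligand RMSD between poses i and j, in angstroms
    threshold -- poses within this RMSD of a representative join its cluster
    keep      -- number of clusters (representatives) to retain

    Returns the indices of the cluster representatives, ranked by score.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    representatives = []
    for i in order:
        # A pose becomes a new representative only if it is far from all others.
        if all(rmsd(i, r) > threshold for r in representatives):
            representatives.append(i)
            if len(representatives) == keep:
                break
    return representatives

# Toy example: poses on a line, with |x_i - x_j| standing in for ligand RMSD.
positions = [0.0, 1.0, 20.0, 21.0, 50.0]
toy_scores = [5.0, 9.0, 3.0, 4.0, 1.0]
toy_rmsd = lambda i, j: abs(positions[i] - positions[j])
print(cluster_by_best_score(toy_scores, toy_rmsd))
```

Because each representative is the best-scoring member of its cluster, the returned list is already ordered by the best score per cluster.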
In the next step, we optimized each putative binding interface in the all-atom representation of a protein complex by means of a rigid-body first-order minimization scheme [4] as implemented in our software package SAMSON (footnote 9). Specifically, after each rigid-body minimization step we optimized the side chains, described by a rotameric representation, using the SCWRL4 package [5]. We computed the interactions between the subunits in a protein complex using the novel reference state-free knowledge-based scoring function KSENIA [6], which is smooth by construction and is thus very suitable for a gradient-based minimization protocol. Finally, we ranked the predictions by the value of the KSENIA potential of the optimized structure and selected the ten best candidates for submission.

Prediction of Protein-Protein Interactions by GALAXY in CAPRI Round 30
Recently developed features of the GALAXY protein modeling package, including template-based structure prediction, loop modeling, model refinement, and protein-protein docking, were used to predict the structures of protein complexes from amino acid sequences in CAPRI Round 30. Template-based complex structure prediction was applied to 22 homomer and 2 heteromer targets, and ab initio docking was carried out for 1 target (T88). In template-based prediction, up to 10 structural templates were first selected by using GalaxyGemini [1] for homomer targets and by using HHsearch [2] for heteromer targets. 3D models for the whole complex were then constructed from the templates by the model-building components of GalaxyTBM [3,4] after deleting predicted signal peptide segments. Unreliably modeled regions were then detected and re-modeled by using GalaxyLoop [5,6]. The resulting models were further refined using a new version of GalaxyRefine [7] modified for complex structure refinement. Symmetry restraints were applied to homomer targets during the loop modeling and refinement procedures. For ab initio docking of the two subunits of T88, we employed the GalaxyPPDock [8] method, which performs global optimization of a new hybrid protein interaction energy function composed of physics-based and knowledge-based energy components. The final models obtained by docking were also refined by GalaxyRefine. When the CAPRI criteria were used to evaluate the models for the 16 targets with PDB structures released as of Oct 31, 2014, 5 targets were predicted with medium quality and 4 targets with acceptable quality. Contributions of the interface loops modeled by GalaxyLoop to the fraction of native contacts, f(nat), turned out to be significant [+0.15 (T69), +0.19 (T80), +0.28 (T85), and +0.17 (T87)].
Refinement by GalaxyRefine improved ligand RMSD (from 6.15 Å to 5.96 Å), interface RMSD (from 3.17 Å to 3.14 Å), f(nat) (from 0.50 to 0.54), and MolProbity score (from 3.29 to 2.65) when averaged over the 9 targets predicted with at least acceptable quality. In the scoring round of CAPRI Round 30, models were scored, after local energy minimization, by the GalaxyRefine energy function, which consists of molecular mechanics energy terms, generalized Born solvation free energy, a distance- and orientation-dependent statistical potential, a statistical hydrogen bonding energy, and other terms. Medium and acceptable quality models were selected for 6 and 3 targets, respectively, out of the 16 evaluated targets.

Oligomer model selection using predicted protein interfaces
For the monomers, we used CASP 3D structure predictions stage2 server models (mostly from Zhang-Server or BAKER-ROSETTASERVER), except in a few targets where we also used models from the HADDOCK team (Txx P20). For each target, usually two monomer models were included, but occasionally one or three models were included instead. We then used M-ZDOCK [1] to build oligomers with symmetry constraints for nearly all targets, and ZDOCK [2] and homology modeling for two cases without symmetry. The oligomer models were clustered by a clustering program from HADDOCK [3], with a Cα RMSD cutoff of 5 Å (or 10 Å if too few clusters with at least 10 members were produced). The clusters starting from different monomer models were compared and combined if they overlapped. These combined clusters were given preference in the final selection. From each cluster, two models were retained: the cluster center and the model with the best ZDOCK score. All retained models were ranked according to their agreement with protein interface predictions by our meta-PPISP server [4]. The final 10 oligomer models were energy minimized by Chiron [5] or AMBER [6] before submission.

Docking the Round 30 CASP/CAPRI Targets Using CASP Consensus Structures and Symmetry-Constrained Polar Fourier Docking Correlations
Many of the targets in Round 30 of CAPRI were predicted by the CASP organisers to be C2 symmetric homodimers, and were initially presented to the CASP community as fold prediction targets. Because many good models were produced by several CASP predictor groups (i.e. were reported as being within 3 Å RMS of the crystal structure by the CASP assessors), we decided to base our docking predictions on only those models. In other words, for each target that we attempted to dock, we downloaded and examined the best "stage2" monomer models from the CASP prediction web site (footnote 10). However, due to time constraints, we attempted to dock only targets T79, T80, T81, T82, T84, T85, T86, T88, T89, and T90.
For each target attempted, we used our "Kpax" protein structure alignment program [1] to select a representative, or "consensus", structure from the provided "stage2" models. We first calculated all-versus-all structure alignments of the CASP models to produce a square matrix of normalised Kpax similarity scores. We then selected the structure from this matrix which had the greatest structural similarity (i.e. row-wise total similarity score) to all other structures, and we treated that structure as a single consensus structure for docking.
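The consensus selection described above reduces to picking the row of the similarity matrix with the largest total. The following is a minimal sketch under stated assumptions: the function name `pick_consensus` is hypothetical, the matrix is taken as already computed (no Kpax call is shown), and the self-similarity on the diagonal is excluded, which does not change the choice when all diagonal entries are equal.

```python
def pick_consensus(similarity):
    """Return the index of the model most similar, in total, to all others.

    similarity -- square matrix (list of lists) of normalised pairwise scores
    """
    totals = [sum(row) - row[i] for i, row in enumerate(similarity)]
    return max(range(len(totals)), key=totals.__getitem__)

# Toy 3 x 3 matrix: model 1 is the closest to both other models.
toy_matrix = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.6],
    [0.3, 0.6, 1.0],
]
print(pick_consensus(toy_matrix))
```

In case of a tie, `max` returns the first index with the largest total.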
In order to check for possible structural homologues, we then used our KBDOCK web server [2] to search for Pfam families which involved the given target domain(s) and which had examples of structural inter- or intra-chain homodimers in the PDB. This procedure found candidate templates for T80, T82, T84, T85, T86, and T88, which were then used to orient the target monomers by least-squares superposition. We also used our novel "SAM" (Symmetry Assembler) docking program (manuscript in preparation) to perform four-dimensional FFT docking searches with built-in C2 symmetry constraints. More specifically, SAM performs a brute-force correlation search over the available degrees of freedom for the given point group symmetry type. For example, for C2 symmetry, SAM searches over one translational and three rotational degrees of freedom. SAM uses essentially the same polar FFT code as Hex [3], but with knowledge of the target symmetry embedded directly into the correlation equations. Thus, every candidate docking orientation produced by SAM is guaranteed to have precisely the desired symmetry. The list of solutions produced by SAM was then clustered using a similarity threshold of 6 Å RMSD in order to remove near-duplicate solutions, and the top-scoring members of each cluster were assessed by eye to eliminate solutions which seemed improbable. If one or more possible homology templates were found by KBDOCK, these were taken as our first predictions, and the remaining (from 7 to 9) predictions were taken from the SAM clusters. For targets T79, T87, T89 and T90, all 10 predictions were calculated by SAM.
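To illustrate why a symmetry-constrained search yields exactly symmetric dimers, the sketch below builds the second subunit of a C2 dimer by rotating the first by 180° about a chosen axis. It is not SAM code: the function name `c2_partner` and the explicit axis parameters are assumptions made for the example, and no FFT correlation is involved.

```python
import math

def c2_partner(coords, axis_point, axis_dir):
    """Rotate coordinates by 180 degrees about the axis through axis_point.

    A 180-degree rotation maps each point p to 2 * foot - p, where foot is the
    orthogonal projection of p onto the rotation axis.
    """
    norm = math.sqrt(sum(c * c for c in axis_dir))
    u = tuple(c / norm for c in axis_dir)
    partner = []
    for p in coords:
        d = tuple(p[k] - axis_point[k] for k in range(3))
        along = sum(d[k] * u[k] for k in range(3))
        foot = tuple(axis_point[k] + along * u[k] for k in range(3))
        partner.append(tuple(2.0 * foot[k] - p[k] for k in range(3)))
    return partner

monomer = [(1.0, 2.0, 3.0)]
mate = c2_partner(monomer, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(mate)
```

Applying the operation twice returns the original coordinates, which is the defining property of a two-fold axis.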

A protocol for Docking Homo-Multimers and Filtering Solutions According to Symmetry Operators
Monomer units in a homo-multimer complex are usually related to each other by one or more axes of symmetry; in a homodimer, for example, by a two-fold rotation axis. Symmetry can be accounted for in protein docking in two ways: either by restricting the initial search space, in which case the symmetry operators are embedded within the docking algorithm itself, or by applying a post-docking filter. Here we use the latter methodology.

Methods
For all docking runs, for both the automatic and manual submissions, we used our publicly available (footnote 11) flexible docking server, SwarmDock [1,2]. This server does not take account of any potential symmetry between the protein structures to be docked and can only consider two proteins at a time, one receptor and one ligand. For the automatic predictions, the protein monomers were built with our fully automatic, publicly available (footnote 12) protein structure prediction server, 3D-JIGSAW, which employs a genetic algorithm to recombine models built from numerous templates [3]. For our manual submissions, we used the best input structure from the CASP set of 150 models, ranked according to our automatic CASP11 quality assessment server, OccuScore; this automatic server calculates DDFIRE [4] and TM scores [5] for each structure, then orders the structures by DDFIRE value and clusters them hierarchically using TM scores (TM threshold of 0.5).
Where close homologs to the complex to be modeled exist, the modeled monomer could be superimposed onto the monomer units of the homologs, and potential residue contacts across the symmetry axis of each interface could thereby be obtained. For such cases, the restrained docking mode of the SwarmDock server was used. However, such constraints may be considered to be weak and only partially restrict the full search space. In all other cases full blind docking was performed. For tetramers, a two-step docking protocol was used: the highest ranked dimer was chosen from the first docking run and this solution was subsequently docked against itself to obtain a tetramer. For a trimer, a receptor-ligand pair was first docked, the solutions were scored according to whether the two monomers were related by a three-fold axis, and the top ranked solution was then docked against another copy of the monomer.
The solutions were ranked and clustered as described in [1,2]. In addition, a new filter was created based on the expected symmetry operators relating monomers (C2 for dimers, C3 for trimers and C4 or D2 for tetramers). Additional information used to choose the best 10 structures included binding energies, cluster sizes and the distribution of contacts. Interestingly, for manual submissions, inspection of the final ranked list of solutions sometimes indicated that a particularly favorable binding energy or cluster size could override the lack of symmetry between monomer units. For such cases, non-symmetric homo-multimer complexes were submitted.
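One simple way to express a post-docking symmetry filter of the kind described above is to test whether the rotation relating two adjacent monomers has the angle expected for the point group (360°/n for a Cn axis). The sketch below is an assumption-laden illustration, not the filter actually used: the function names and the 10° tolerance are hypothetical, and it checks only the rotation angle, whereas a fuller filter would also verify that there is no translation along the rotation axis.

```python
import math

def rotation_angle_deg(R):
    """Rotation angle, in degrees, of a 3 x 3 rotation matrix given as nested lists."""
    trace = R[0][0] + R[1][1] + R[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_angle))

def matches_cyclic_symmetry(R, n, tol_deg=10.0):
    """True if R rotates by 360/n degrees, within tol_deg (hypothetical tolerance)."""
    return abs(rotation_angle_deg(R) - 360.0 / n) <= tol_deg

# Rotation by 180 degrees about the z axis, as expected for a C2 dimer.
two_fold = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(matches_cyclic_symmetry(two_fold, 2), matches_cyclic_symmetry(identity, 2))
```

The rotation matrix would come from superposing one monomer of a docked solution onto the other; that superposition step is not shown here.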

Molecular modeling of the monomers
Modeling templates were identified by sequence alignment using BLAST [1]. The alignments were manually adjusted by shifting inserts/deletions in secondary structure elements to the ends of such elements or to predicted loop regions. Conservation and similarity of residues in expected dimerization interfaces was taken into consideration in the alignment.

Model building
Homology models were constructed using Modeller [2]. We did not use models from the CASP 3D structure predictions.

Oligomer construction
Modeling of homodimers was based on the dimerization mode of the template, provided that significant conservation and similarity of interface residues were detected. In some cases several dimerization modes were considered. Models were constructed by superposing the modeled monomers onto the template and were energy minimized using Discovery Studio (Dassault Systèmes BIOVIA, Discovery Studio Modeling Environment, San Diego: Dassault Systèmes, 2015).

Docking
In cases when the modeling template was not a dimer, we docked the molecules using the geometric-electrostatic-hydrophobic version of MolFit [3,4,5], followed by a propensity and solvation based post-scan filter [6].

Negi
Surendra S. Negi *
Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, 301 University Boulevard, Galveston, TX 77555-0857, USA
* E-mail: ssnegi@utmb.edu
Since structural information for the target proteins in CAPRI Round 30 was not available in the Protein Data Bank [1], the accuracy of the modeled protein complexes depended on: a) the selection of the template structures, and b) the model structure of the target protein. In our approach, all the target sequences were submitted to the HHPRED web server [2] to find the best homologous template structures. The sequence alignment between the target sequence and the template structure was manually checked to adjust the gap positions. The ROBETTA web server [3] and the MODELLER program [4] were used to generate the preliminary model structures for the target proteins T68-T72 and T73-T94, respectively. Further, the molecular docking programs HEX [5] and S-DOCK were used to generate an initial set of docked protein complexes. Using default parameters, a set of 100 docked structures was generated and the RMSD values between docked structures were calculated using the ProFit program [6]. The docked structures with RMSD values less than 5 Å were clustered using the Cytoscape program [7]. Finally, a representative structure from each of the top ten clusters was deposited to the CAPRI website.
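Because Cytoscape is a network tool, one plausible reading of the clustering step above is that docked structures are nodes, pairs with RMSD below 5 Å are edges, and clusters are the connected components of the resulting graph. The sketch below implements that reading with a small union-find structure. It is an assumption, not the documented procedure: the function name `connected_clusters` is hypothetical, and ordering the clusters by size is also an assumption, since the text does not say how the top ten clusters were chosen.

```python
def connected_clusters(n, rmsd, cutoff=5.0):
    """Group n structures into connected components of the RMSD < cutoff graph.

    rmsd -- callable (i, j) -> RMSD between structures i and j, in angstroms
    Returns a list of clusters (lists of indices), largest first.
    """
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if rmsd(i, j) < cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)

# Toy example: structures on a line, with |x_i - x_j| standing in for RMSD.
toy_positions = [0.0, 3.0, 6.0, 20.0, 22.0, 50.0]
toy_rmsd = lambda i, j: abs(toy_positions[i] - toy_positions[j])
print(connected_clusters(len(toy_positions), toy_rmsd))
```

Note that structures 0 and 2 in the toy example end up in the same cluster although they are 6 Å apart, because both are linked to structure 1; this chaining is a known property of connected-component clustering.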

Weng
Zhiping Weng *, Thom Vreven, Brian G. Pierce
For the structures of the component proteins, we used the 3D models provided by CASP, specifically those from the Zhang server and the Baker-Rosetta server. We visually inspected the two sets of five structures from these servers, identifying consensus folds, and selected structures with minimally exposed flexible termini that could obscure potential binding sites. Docking was performed using one or two of the structures from these sets.
Our complex structure prediction was started with literature searches for biological information that could be used to restrict the docking search space. We generally found less information versus previous CAPRI rounds. For a few cases we found homologous proteins in the Protein Data Bank that were either listed as a homodimer, or had two copies in the asymmetric unit that could possibly represent a homodimer. In such cases we superimposed the component protein onto the PDB template and included it as one of our ten predictions.
We used our ZDOCK series of programs to generate the complex structures [1,2,3,4,5,6,7]. Homodimers and trimers were docked using our algorithm M-ZDOCK that was designed for symmetric multimers [6] imposing C 2 or C 3 symmetry as appropriate for the given target. Homotetramers were docked using M-ZDOCK, both in a single step with C 4 symmetry and in a two-step procedure as a dimer of dimers for D 2 symmetry. Heterodimers were docked using ZDOCK version 3.0.2 [5,7]. The docking predictions were then pruned using our distance-based clustering algorithm [8,9] and final predictions were selected based on a combination of ZDOCK score, prediction density, and manual inspection. For the scoring rounds we applied the IRAD reranking function [10], and used a combination of IRAD score and manual inspection for selecting the final ten predictions.

Guerois
Jinchao Yu, Françoise Ochsenbein and Raphaël Guerois *
Institute for Integrative Biology of the Cell (I2BC), Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Centre National de la Recherche Scientifique (CNRS), Université Paris-Saclay, CEA-Saclay, F-91191 Gif-sur-Yvette, France
* E-mail: Raphael.Guerois@cea.fr
The structures of the individual subunits were obtained from template-based modeling using the hits obtained by HHsearch [1] as reference structures. When several templates were available, up to five models were generated from every single template but not from multiple templates. Overall, for 80% of the cases, we directly used BAKER-Rosetta [2] and/or Zhang-server [3] models. When we did not use the models provided by the CASP servers, the SWISS-MODEL server [4] was used for sequence identities above 30% and Rosetta3.5 [2] for lower sequence identities. Models generated with Rosetta were generally constrained during the relax step so that they did not diverge much from the initial Cα coordinates. Loops were sampled using the Rosetta kic module [5] with specific constraints to sample conformations of interest. When possible, we used comparative modeling to generate the models of the assemblies. The different orientations observed in the multimeric assemblies of different members of a superfamily were used as initial seeds. In 13 out of 25 cases, we also used template-free docking with ZDOCK [6] and M-ZDOCK (for symmetry constraints) [7] to generate rigid-body initial models. Convergence between template-based and template-free approaches strengthened the confidence in a specific orientation, which was then sampled further. Downstream of template-free rigid-body docking, the most likely decoys were identified using a composite score built from InterEvScore [8], SOAP-PP [9] and ZRANK [10]. Rosetta3.5 was used to relax the selected models under symmetry constraints [11]. The fcc program was used for clustering [12].

HADDOCK in CASP-CAPRI ROUND 30
In this CAPRI round, the HADDOCK group generated models for each target using a combination of homology modeling/threading and docking/refinement with the HADDOCK web server [1] (submissions were only made for the server category of CAPRI). Most of our predictions started from a model built with MODELLER (v9.12) [2] using HHpred [3] for template search, or from a model built with I-TASSER [4], or from models generated using both methods. The target/template alignment was optimized, if necessary, either with NEEDLE [5] in case of high sequence identity, or using manually curated multiple sequence alignments. In one particular case, T77, we created a model of the monomer using a flexible multibody docking approach [6] due to the disposition of the individual domains. Typically, an ensemble of the ten best monomer models was used as the starting point for HADDOCK. Since all targets were homo-oligomeric structures, non-crystallographic symmetry and regular symmetry restraints were used [6] in combination with center-of-mass restraints (ab initio mode of HADDOCK). In cases where homologous complexes were identified, Cα-Cα distance restraints were applied, derived either manually or by PS-HOMPPI [7]. A number of targets were also directly modeled as a multimer in MODELLER and only refined in explicit solvent in HADDOCK. The models for submission were distributed over the clusters ranked by HADDOCK. A summary of the modelling methodology and restraints used is provided in the following supplementary

Docking
Each monomer structure retained was initially docked to itself as a rigid body with the web servers ClusPro [5] (symmetry mode, for dimers and trimers) and M-ZDOCK [6] (for tetramers with C4 symmetry). Tetramers with D2 symmetry were constructed by docking selected first-round C2 dimers to themselves, again under C2 constraints. Symmetry was thus enforced at the initial stage of rigid-body docking with both servers. All cluster centers retained from the initial stage were further refined using our own refinement approach [7] with MM-PBSA energy models and the consideration of both backbone and side-chain flexibility. In addition, symmetry was maintained in the refinement stage by sampling only the corresponding subspace of rigid-body motions. In other words, the 5D space of rigid-body motions (a Riemannian manifold of S² × SO(3)) after removing the center-to-center distance was again approximated by a Euclidean space with tailored exponential coordinates [8,9], but only a subset of variables based on these exponential coordinates was needed to respect an oligomer's symmetry during sampling. Sloppy modes to which binding energy was insensitive were again removed while optimizing the resulting conformational space [7].

Scoring
The lowest-energy structural sample was chosen as the final, refined model for each starting oligomer structure. All refined models were again ranked in increasing order of binding energy [7]. For a few targets whose templates exist as oligomers, the corresponding oligomer templates were sometimes used to promote models in the ranked lists if they were not already top-ranked.

How was the structure of the individual subunits modeled?
We used CASP models for unbound proteins.

Which modeling software was used?
The software package Simulation of Diffusional Association (SDA; footnote 15) was used to perform protein-protein docking. The selected complexes were further refined using all-atom molecular dynamics in Amber 12.

Was any of the available CASP 3D structure predictions used?
For both targets, we used monomer structures that were released from CASP as starting points. All released models were assessed using the QMEAN server (footnote 16) and the three highest scoring were selected. For target 68 these were: QUARK TS1, FALCON EnvFold TS4 and FALCON TOPO TS2. The loop regions of the two termini were cut (first 8 residues of the N-terminus and last 9 residues of the C-terminus). For target 69, they were: Zhang-server TS1, QUARK TS1 and QUARK TS3, with the complete models.
Which docking and scoring methods were applied; were symmetry constraints imposed? Was information on homologs used?
SDA performs rigid-body Brownian dynamics simulations using precomputed interaction grids. For the docking simulations described here, we used grids describing the electrostatic, electrostatic desolvation and hydrophobic desolvation interactions between interacting proteins.
The docking method in SDA models bimolecular diffusional association and records predicted encounter complexes when a predefined reaction criterion is satisfied. In this case, the reaction criterion was defined such that the centers of geometry of the interacting proteins are closer than the sum of the distances from the centers of each protein to their furthest surface atoms, plus 10 Å. The 2000 highest scoring recorded complexes, as scored by the SDA forcefield, were then clustered into 10 clusters using an average-linkage clustering algorithm. These clusters were then ranked by the number of complexes in the cluster, and the cluster representatives from the top five clusters were chosen for further refinement.
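The reaction criterion described above can be written down directly from atomic coordinates. The following is a minimal sketch, not SDA code: the function names are hypothetical, coordinates are plain (x, y, z) tuples in Å, and the furthest atom from the center of geometry is used as the furthest surface atom.

```python
import math

def centre_of_geometry(coords):
    """Unweighted mean position of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(atom[k] for atom in coords) / n for k in range(3))

def max_radius(coords, centre):
    """Distance from the centre to the furthest atom."""
    return max(math.dist(atom, centre) for atom in coords)

def is_encounter(coords_a, coords_b, margin=10.0):
    """True if the centre-to-centre distance is below r_max(A) + r_max(B) + margin."""
    ca = centre_of_geometry(coords_a)
    cb = centre_of_geometry(coords_b)
    limit = max_radius(coords_a, ca) + max_radius(coords_b, cb) + margin
    return math.dist(ca, cb) < limit

# Toy two-atom "proteins" placed along the x axis.
protein_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
near_b = [(10.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
far_b = [(30.0, 0.0, 0.0), (32.0, 0.0, 0.0)]
print(is_encounter(protein_a, near_b), is_encounter(protein_a, far_b))
```

In the toy example each protein has a maximum radius of 1 Å, so the limit is 12 Å: the pair 10 Å apart satisfies the criterion and the pair 30 Å apart does not.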
The Amber ff99SB forcefield [1] was used and the solvent environment was described using the modified OBC Generalized-Born model [2]. The complexes were subjected to 1000 steps of minimization, followed by 500 ps of molecular dynamics.
The similarity to the homologous model was used to decide the rank of complexes for submission in the case of both targets.

Vakser
Ivan Anishchenko 1, Petras J. Kundrotas
The number of experimentally determined protein structures and their complexes accounts for only a fraction of the known protein "universe". Thus, structural modeling of protein-protein interactions (docking) largely has to rely on modeled structures of the individual proteins [1]. This round of CASP was important for testing our ability to do that by utilizing docking methods within the CAPRI assessment framework.

Methods
For the template-based docking we used the protocol similar to the one developed previously in our lab [2,3]. The procedure performs spatial rearrangement of 3D structures of the two target proteins (treated as rigid bodies) to match either the entire monomers of the co-crystallized complexes (from the full-structure template library) or their interfaces only (from the interface template library). Structural alignment of the proteins was performed by TM-align [4]. The free docking was performed by our GRAMM program implementing the FFT approach [5,6]. To accommodate the structural inaccuracies of the modeled monomers, the program was run at lower resolution.

Results
Monomer models were chosen from the 150 server models provided by the CASP organizers after visual inspection (to exclude poorly packed structures with long misfolded termini, loops, etc.) and clustering of the models based on their structural similarity. For targets 68-70, the top five models generated by the stand-alone I-TASSER package (Yang Zhang lab) were selected. Dimeric complexes were built by full and partial structure alignment techniques using generic and/or specialized template libraries previously generated in our lab. For targets 85, 89, 92, and 93, additional templates (previously filtered out from our libraries due to structural deficiencies) with sequence similarity to the target proteins (detected by BLAST) were extracted from the PDB. All full and partial structure alignment models were evaluated in terms of the TM-score, the fraction of interface residue contacts shared between the target and the template, F_cont, and the relative volume of clashes, V. Models satisfying the condition (TM-score > 0.6 OR F_cont > 0.1) AND (V < 0.05) were clustered based on structural similarity, and ten representative models from the most populated clusters were chosen for the final minimization. Tetrameric targets (70, 71, 73, 74, 78, and 81) were built from the dimers by free docking using the GRAMM program in the low-resolution mode.
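The model acceptance condition quoted above is a plain boolean test on three numbers. As a minimal sketch (the function and argument names are hypothetical; the thresholds are those given in the text):

```python
def passes_filter(tm_score, f_cont, v_clash):
    """(TM-score > 0.6 OR F_cont > 0.1) AND (V < 0.05), as stated in the text."""
    return (tm_score > 0.6 or f_cont > 0.1) and v_clash < 0.05

print(passes_filter(0.72, 0.02, 0.01))  # good TM-score, few clashes
print(passes_filter(0.40, 0.25, 0.01))  # low TM-score rescued by shared contacts
print(passes_filter(0.72, 0.25, 0.08))  # rejected: too many clashes
```

A model therefore needs either global structural similarity to the template or a conserved interface, and in both cases must be nearly free of clashes.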

Availability
The docking procedures and libraries used in this round are available from our Dockground resource for protein recognition studies (footnote 17).

Methodologies for constructing and selecting three-dimensional protein complex models for CAPRI round 30
We used the following four steps to predict protein complexes for the CAPRI R30 targets, as in CASP11. i) The FORTE series [1], including DELTA-FORTE, which is our new profile-profile alignment method empowered by NCBI's Conserved Domain Database (CDD) (unpublished), was run against the PDB and SCOP libraries for each target to obtain target-template alignments. In addition, we have recently derived a novel amino acid substitution matrix, MIQS [2]. This matrix was also used as an auxiliary in searching for templates and sampling alignments. ii) Based on the alignments of the top 100 proteins, we built 3D models with MODELLER and MOE (Molecular Operating Environment). For building 3D models, we employed multiple templates when we were able to use structural information from the same family in the PDB. iii) We sorted our models by the structural quality scores calculated by the Verify3D program. To guide model selection more precisely, we used scores averaged over all models based on a given alignment, instead of a score for each model. The averaged score is effective in enhancing prediction accuracy, especially for easy targets, according to our preliminary results. These procedures were mostly executed on an individual subunit basis. iv) We then examined the oligomeric states of the top candidates, sorted by their structural quality scores, to predict three-dimensional protein complex models. In many cases we could see a similar tendency in oligomeric states among the top candidates for each target. Otherwise, we calculated structural quality scores of protein complex models to select a proper one. We did not impose symmetry constraints to build/select protein complex models, with a few exceptions.
The goal here was to use docking to model the quaternary structure of the proposed targets.
Since the 3D structures of the individual subunits needed to be modelled, we used as input structures for docking a set of models formed by the top five CASP submissions (when available) from the ZHANG, ROSETTA and QUARK servers. For the heterodimer (T89) we used for each subunit only the best model from these servers according to Z-DOPE score. For docking, different protocols were applied to these 15 starting models depending on the oligomeric state of the target. For all dimeric targets, FTDock (with electrostatics and 0.7 Å grid resolution) and ZDOCK 2.1 were used to generate 10,000 and 2,000 rigid-body docking poses, respectively, forming homo- or hetero-dimers depending on the case. These docking solutions were merged and scored using the pyDock binding energy. No symmetry restraints were used for these targets. For the homotetrameric targets, SymmDock and CombDock were applied to generate symmetric and non-symmetric docking poses, respectively. The total binding energy of all possible interfaces was then calculated using pyDock, and the final models were selected so that the symmetric and non-symmetric solutions were evenly represented. For the dimer of heterodimers (T81), SymmDock was used to generate symmetric homodimeric docking poses for each of the two different subunits, while FTDock and ZDOCK were applied to obtain heterodimeric docking orientations between the different subunits. All the complexes were finally scored with pyDock. The top 1000 docking poses of each pool were combined and filtered to remove solutions without spatial compatibility or lacking symmetry between heterodimers. The final docking solutions were scored according to the total complex binding energy calculated on all interfaces using pyDock. For all the targets, redundant predictions were eliminated, and the final 10 selected docking poses were minimized to reduce the number of interatomic clashes using AMBER10 with the AMBER parm99 force field.
No restraints on potential interface residues based on biological information were used for any of the targets.

Template-based structure modeling of protein-protein interactions by global optimization
We have developed a protein-protein modeling system extended from our prediction platform of CASP protocols. In this CAPRI round, the modeling of monomer subunits is required prior to the construction of complex structures. The model1 structure of the nns server is used as the subunit to be assembled. The nns server applied the global optimization method of conformational space annealing (CSA) to three stages of optimization: multiple sequence-structure alignment, 3D chain building, and side-chain remodeling.
Templates for the complex under consideration were collected from the list of homo-multimeric templates used for the template-based modeling of the subunit. We performed structure alignment using TM-align to calculate the structural similarity between the nns model1 and each component in the complex template. Templates with TM-score > 0.5 were selected and clustered based on binding interface similarity. The binding interface is the set of residue-residue pairs that have at least one heavy atom pair within a distance of 5.0 Å. For each complex template, the interface residue pairs were identified, and these sets were used to construct a network in which the nodes are templates and the edges between nodes are weighted by the binding interface similarity. Clustering was performed by applying the modularity optimization software Mod-CSA to the network. Using all complex templates in a cluster, appropriate inter- and intramolecular atom pair distance information was collected to generate template-derived restraint energy terms. MODELLER was used to generate initial model structures of protein-protein complexes.
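The interface definition above (residue pairs with at least one heavy-atom pair within 5.0 Å) can be illustrated with a minimal sketch. The input layout (a dictionary of residue identifiers to heavy-atom coordinates) and the function names are hypothetical, and the Jaccard overlap used as the interface similarity is an assumption, since the text does not specify how the similarity between two interfaces was computed.

```python
import numpy as np

def interface_pairs(chain_a, chain_b, cutoff=5.0):
    """Return the set of (res_a, res_b) pairs with any heavy-atom pair within cutoff (in Å).

    chain_a, chain_b: dict mapping a residue id to an (n_atoms, 3) array-like
    of heavy-atom coordinates (hypothetical input layout).
    """
    coords_b = {res: np.asarray(xyz, dtype=float) for res, xyz in chain_b.items()}
    pairs = set()
    for res_a, xyz_a in chain_a.items():
        xyz_a = np.asarray(xyz_a, dtype=float)
        for res_b, xyz_b in coords_b.items():
            # All pairwise distances between the heavy atoms of the two residues
            diff = xyz_a[:, None, :] - xyz_b[None, :, :]
            if np.sqrt((diff ** 2).sum(axis=-1)).min() <= cutoff:
                pairs.add((res_a, res_b))
    return pairs

def interface_similarity(pairs_1, pairs_2):
    """Jaccard overlap of two interface pair sets (assumed similarity measure)."""
    union = pairs_1 | pairs_2
    if not union:
        return 0.0
    return len(pairs_1 & pairs_2) / len(union)
```

Comparing pair sets between templates presupposes that residues have been mapped to a common numbering, for example through the structure alignment to the subunit model.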
To refine complex structures, we applied CSA to optimize the protein-protein interaction potential that consists of stereochemistry energy, DFIRE, torsion-torsion interaction energy, repulsion energy of van der Waals interaction, DCOMPLEX and template-derived restraint energy. For the template-derived restraint energy described above, we used the Lorentzian-function-based penalty term. The inter-molecular portion of the template-derived restraint term and the DCOMPLEX energy were used to adjust the binding mode among subunits in the complex structure, and the other energy components were used to settle the subunit structure during the optimization process to represent the induced fit. We did not use any information of symmetry between asymmetric units.
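As an illustration of a Lorentzian-function-based restraint penalty, the sketch below uses one common bounded form in which the penalty is zero at the template-derived distance and saturates for large deviations. The exact functional form, the width `sigma` and the weight are assumptions and are not taken from the text.

```python
def lorentzian_penalty(d, d0, sigma=1.0, weight=1.0):
    """Bounded Lorentzian-type penalty (assumed form).

    Returns 0 when the model distance d equals the template distance d0 and
    approaches `weight` as |d - d0| grows, so outlier restraints cannot dominate.
    """
    x = (d - d0) / sigma
    return weight * x * x / (1.0 + x * x)

def restraint_energy(model_distances, template_distances, sigma=1.0):
    """Sum the penalty over paired model and template atom-pair distances."""
    return sum(
        lorentzian_penalty(d, d0, sigma)
        for d, d0 in zip(model_distances, template_distances)
    )
```

Unlike a harmonic term, a penalty of this kind stays bounded, which tolerates restraints that conflict between templates in the same cluster.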

Prediction of the probable biological units using the ClusPro server
Introduced in 2004, ClusPro was the first fully automated web-based server for protein-protein docking. The server performs rigid-body docking using the PIPER program and clusters the lowest energy structures. The models are ranked according to cluster size. In order to deliver results to the user within 24 hours of submission, the current implementation of ClusPro does not include refinement beyond minimizing the energy of structures to remove steric overlaps. In spite of this limitation, the server has almost 4800 registered users, and runs about 4000 jobs each month.
In the latest round of the CASP-CAPRI experiment we applied ClusPro to predict the structures of the probable biological units for the "easy" CASP targets.

Model preparation
Since ClusPro requires a three-dimensional structure as input, we built a "consensus" model for each target using the 150 server models provided by the CASP management committee. For each "easy" target most models had the same fold, with variations in loops and tails. Removal of the uncertain regions resulted in reliable "consensus" models that were used for docking.

Docking
Our docking approach consists of two steps. The first step is running PIPER, a docking program that performs a systematic search of complex conformations on a grid using the fast Fourier transform (FFT) correlation approach. The scoring function includes van der Waals interaction energy, an electrostatic energy term, and desolvation contributions calculated by a pairwise potential.
The second step of the algorithm is clustering the top 1000 structures generated by PIPER using pairwise RMSD as the distance measure. The radius used in clustering is defined in terms of Cα interface RMSD. For each docked conformation we select the residues of the ligand that have any atom within 10 Å of any receptor atom, and calculate the Cα RMSD for these residues from the same residues in all other 999 ligands. Thus, clustering 1000 docked conformations involves computing a 1000 × 1000 matrix of pairwise Cα RMSD values. Based on the number of structures that a ligand has within a (default) cluster radius of 9 Å RMSD, we select the largest cluster and rank its cluster center as number one. The members of this cluster are removed from the matrix, and we select the next largest cluster and rank its center as number two, and so on. After clustering with this hierarchical approach, the ranked complexes are subjected to a straightforward (300-step, fixed-backbone) van der Waals minimization using the CHARMM potential to remove potential side chain clashes. Unless requested otherwise by the user, ClusPro outputs the centers of the 10 largest clusters, which were submitted as predictions.
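The clustering step described above can be sketched as a greedy procedure on a precomputed RMSD matrix. This is a minimal illustration, not the ClusPro implementation: the function name and the tie-breaking rule (lowest index wins) are assumptions, and the interface Cα RMSD matrix is taken as given.

```python
import numpy as np

def greedy_cluster(rmsd, radius=9.0, n_clusters=10):
    """Greedy clustering of docked poses from a pairwise RMSD matrix.

    rmsd: (n, n) array-like, rmsd[i][j] = interface RMSD of pose j measured from pose i.
    Returns a list of (center_index, member_indices), largest cluster first.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    active = np.ones(len(rmsd), dtype=bool)
    clusters = []
    while active.any() and len(clusters) < n_clusters:
        # Neighbour relation restricted to poses not yet assigned to a cluster
        neighbours = (rmsd <= radius) & active[:, None] & active[None, :]
        counts = neighbours.sum(axis=1)
        center = int(np.argmax(counts))          # pose with the most neighbours
        members = np.flatnonzero(neighbours[center])
        clusters.append((center, members.tolist()))
        active[members] = False                  # remove the cluster from the matrix
    return clusters
```

For example, five poses forming one group of three and one group of two yield two clusters, with the center of the larger group ranked first.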

Protein-Protein Docking Using MIAX in the joint CASP-11-CAPRI Experiment
For prediction of the interaction targets in CASP-11 our group mainly used the MIAX system [1,2,3,4,5] for docking unbound structures. The structures used in the docking process were namely those that were predicted by our own group.
MIAX (Macromolecular Interaction Assessment Computer System) is a combination of several methodologies to dock rigid protein structures, with flexible and softening techniques to mimic real induced-fit processes during the docking operation. MIAX outputs several thousand decoys that are selected based upon a priori recognition of potential active sites on the surface of the proteins. We have developed a systematic methodology to recognize putative binding sites on the protein surfaces based on quantification of the hydrophobic potential. The system has been benchmarked on hundreds of PDB protein complexes, and the reliability of the predictions is more than 70%.
Since the structures to dock in the case of CASP-11 are deemed to be of low resolution, the surface-softening methodology in MIAX is well suited to averaging over positions that may not exactly match those of the crystal structure.
Since the number of decoys after the first docking process in MIAX exceeds 4000 structures, these are clustered, and clusters located close to the predicted binding areas are ranked high in this initial step.
To refine the final complex structures we used the namd2 molecular dynamics program to equilibrate and minimize the energy of the structures. Our group submitted candidates for all the CASP-11 targets.

Selection of Structures
For each consensus sequence, we identified the set of candidate structures with the ten lowest scores, ten lowest interface scores, and ten lowest elliptic scores. In some cases, we manually added more low-scoring structures distinct from the automated pool. We then manually filtered the resultant pool, generally comprising twenty to fifty unique structures. We removed any candidate complexes that had poorly folded secondary structure, very small interfaces, or asymmetry (for homomultimeric complexes) from the pool. We clustered by eye the remaining candidate structures, and for submission we selected ten structures from unique clusters, with the lowest-scoring structures by each scoring method represented by two to five structures each. If we did not find ten distinct clusters, we supplemented the submission set with low-scoring candidate structures from duplicate clusters.

Oliva
Edrisse Chermak 1, Luigi Cavallo 1 and Romina Oliva 2,*
1 King Abdullah University of Science and Technology, Saudi Arabia
2 University of Naples "Parthenope", Italy
* E-mail: romina.oliva@uniparthenope.it
We submitted scoring predictions for 22 out of the 25 assessed targets. For each target, CONSRANK [1,2] was run on the ensemble of uploaders' models, after they had been edited to assign the same chain identifier and number to corresponding amino acids. The automatic renumbering tool we developed for this purpose first extracts the FASTA sequences from the PDB files, then uses BLASTclust [3] to distinguish the different chains and aligns the sequences within each cluster with ClustalW [4]; finally, it rewrites the PDB files so that the sequences are consistently renumbered. The ten models ranked highest by CONSRANK that did not show a number of clashes above the CAPRI-defined threshold were selected for submission to the scoring session. CONSRANK is a consensus-based algorithm, which ranks models based on their ability to match the most conserved contacts in the ensemble they belong to. Specifically, given an ensemble of N models of the same complex, for each inter-residue contact the conservation rate, $CR_{kl}$, is defined as $CR_{kl} = nc_{kl}/N$, where $nc_{kl}$ is the total number of models where residues k and l are in contact. The conservation rate thus ranges between $CR_{kl} = 0$, if the contact between residues k and l is never observed, and $CR_{kl} = 1$, if the contact is observed in all the models. Once the conservation rates have been calculated, a score $S_i$ is first calculated for each model i from the conservation rates of its contacts, where $M_i$ is the total number of contacts in model i. Then a normalized score, $\acute{S}_i$, is calculated, and models are ranked according to their $\acute{S}_i$ value. Two residues are defined as being in contact if any pair of atoms belonging to the two residues is closer than a cut-off distance of 5 Å. The stability of the method with respect to cut-off distances in the range 4 to 10 Å has been previously demonstrated [5].
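The conservation rate used by CONSRANK (the fraction of the N models in which a given inter-residue contact is observed) can be sketched as follows. The function names and the representation of contacts as tuples are hypothetical. The per-model score shown here, the mean conservation rate over a model's contacts, is an assumption made for illustration; the normalization applied by the authors is not reproduced.

```python
from collections import Counter

def conservation_rates(models):
    """Conservation rate of each contact: number of models containing it divided by N.

    models: list of collections of contacts; each contact is a hashable
    (residue_k, residue_l) tuple (hypothetical representation).
    """
    n_models = len(models)
    counts = Counter(contact for contacts in models for contact in set(contacts))
    return {contact: count / n_models for contact, count in counts.items()}

def mean_conservation_scores(models):
    """Mean conservation rate over each model's contacts (assumed scoring form)."""
    rates = conservation_rates(models)
    scores = []
    for contacts in models:
        unique = set(contacts)
        scores.append(sum(rates[c] for c in unique) / len(unique) if unique else 0.0)
    return scores
```

Models whose contacts recur across the ensemble receive scores close to 1, while models built on rarely observed contacts score lower.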
The structures of individual subunits were modeled with the I-TASSER modeling package v.2 [1], installed on a local compute cluster. The models were inspected in PyMOL v.1.7 together with aligned template structures and trimmed where deemed necessary. None of the available CASP 3D structure predictions were used. GRAMM-X software [2] and Web server [3] were used for docking. Symmetry constraints as implemented in GRAMM-X were used for homo-polymer targets. Briefly, every low-energy candidate space transformation coming out of the DFFT grid search is checked to be symmetrical within a certain cutoff and, having passed that check, the candidate model is subjected to the GRAMM-X off-grid semi-local refinement step under symmetry constraints.

Tovchigrechko & GRAMM-X
Information on available homologs of the target complexes was not used.

GRAMM-X
GRAMM-X is a collaboration between Andrey Tovchigrechko and Ilya Vakser. It is available at: http://vakser.compbio.ku.edu/resources/gramm/grammx
Figure S1 - Evaluation protocol for higher-order oligomers. We evaluate the prediction quality of each interface of the higher-order oligomers individually. In order to do so, we extract all interfaces in the prediction and assess these against the target interfaces. For a tetrameric model as depicted here, consisting of chains A, B, C and D, we extract the dimeric pairs AB, AC, AD, BC, BD and CD. An N-model submission then effectively becomes a 6N-model submission, giving rise to 6N sets of assessment quantities f(nat), L-rms and I-rms. Of each 6 pairs corresponding to the original tetrameric submission, the best assessment quantities are selected and the original submission is annotated with these. First the highest f(nat) is selected; if these are equal, the lowest L-rms is taken, followed by the lowest I-rms if necessary.
Table 3 and the text for details). The plotted GDT-TS values are for predicted complexes for which the LGA-S and GDT-TS scores differ by less than 15 units (about 80% of the submitted models), to ascertain that the corresponding subunits had the correct residue numbering. Given that by and large the Round 30 CAPRI targets represented proteins that could be easily modeled by homology, the chances that models with high values for both GDT-TS and LGA-S scores correspond to different structural alignments are extremely low.
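The pairwise evaluation protocol of Figure S1 (enumerate all chain pairs, then keep the pair with the highest f(nat), breaking ties by lowest L-rms and then lowest I-rms) can be sketched as below. The function names and the layout of the assessment dictionary are hypothetical; the assessment quantities themselves are assumed to have been computed elsewhere.

```python
from itertools import combinations

def chain_pairs(chains):
    """All unordered chain pairs of an oligomer, e.g. 6 pairs for chains A, B, C, D."""
    return list(combinations(chains, 2))

def best_interface(assessments):
    """Pick the best-assessed pair of a higher-order oligomer model.

    assessments: dict mapping a chain pair to a tuple (fnat, l_rms, i_rms).
    Selection order: highest fnat, then lowest L-rms, then lowest I-rms.
    """
    return min(
        assessments.items(),
        key=lambda item: (-item[1][0], item[1][1], item[1][2]),
    )
```

The tuple key encodes the tie-breaking order directly, so a pair with an equal f(nat) but a lower L-rms is preferred before I-rms is consulted.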