Coarse‐grained and atomic resolution biomolecular docking with the ATTRACT approach

The ATTRACT protein-protein docking program has been employed to predict protein-protein complex structures in CAPRI rounds 38-45. For 11 out of 16 targets (~70%), solutions of acceptable or better quality were submitted. These include several cases of peptide-protein docking and the successful prediction of the geometry of carbohydrate-protein interactions. The option of combining rigid body minimization with simultaneous optimization in collective degrees of freedom based on elastic network modes was employed and systematically evaluated. Application to a large benchmark set indicates a modest improvement in docking performance compared to rigid docking. Possible further improvements of the docking approach, in particular at the scoring and flexible refinement steps, are discussed.


| INTRODUCTION
The interaction of proteins to form functional complexes is a fundamental property of all living systems. A major goal of structural biology is to obtain structural insight into all relevant protein-protein complexes of living systems. Although the structures of many protein-protein complexes have been determined experimentally, there is still a demand for docking methods to predict entirely new complex structures. In many cases, it is possible to find a homologous complex in the database of known complexes to generate a template-based model. 1 However, in particular for transient protein-protein interactions or low-affinity complexes, it is still difficult to determine the complex structure experimentally or based on a homologous template.
Within the blind evaluation of different methods and protocols for protein-protein docking by the CAPRI (Critical Assessment of Predicted Interactions) challenge, [2][3][4][5] we employed the ATTRACT protein-protein docking approach [6][7][8][9][10][11] for predicting the targets of CAPRI rounds 38-45. In ATTRACT, either an atomistic or a coarse-grained (CG) protein model can be employed. The ATTRACT CG protein model represents each amino acid of a protein by up to four pseudo centers and is intermediate between a residue-level and a full atomistic description. 11,12 In contrast to most protein-protein docking methods, the ATTRACT approach already includes conformational flexibility of the binding partners, in an approximate manner, during the early systematic stage of protein-protein docking. To speed up the docking calculations, the potential energy can be precalculated on a grid, and interactions are then calculated by interpolation from the nearest grid points. 6,7 A docking search consists of several Monte Carlo simulations or energy minimizations starting from thousands of start configurations. It is possible to perform docking with an arbitrary number of partner molecules.
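The grid-based speed-up mentioned above can be illustrated with a minimal sketch. It assumes a regular cubic grid and trilinear interpolation from the eight surrounding grid points; the interpolation scheme actually used in ATTRACT is not specified here, and the function names `build_grid` and `interpolate` are hypothetical.

```python
import numpy as np

def build_grid(potential, origin, spacing, shape):
    """Tabulate potential(x, y, z) once on a regular grid (hypothetical helper)."""
    origin = np.asarray(origin, dtype=float)
    axes = [origin[d] + spacing * np.arange(shape[d]) for d in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    return potential(gx, gy, gz)

def interpolate(grid, origin, spacing, point):
    """Trilinear interpolation from the eight grid points enclosing `point`."""
    rel = (np.asarray(point, dtype=float) - np.asarray(origin, dtype=float)) / spacing
    idx = np.floor(rel).astype(int)
    # Clamp so that idx + 1 never leaves the grid.
    idx = np.clip(idx, 0, np.array(grid.shape) - 2)
    fx, fy, fz = rel - idx
    i, j, k = idx
    value = 0.0
    for di, wx in ((0, 1.0 - fx), (1, fx)):
        for dj, wy in ((0, 1.0 - fy), (1, fy)):
            for dk, wz in ((0, 1.0 - fz), (1, fz)):
                value += wx * wy * wz * grid[i + di, j + dj, k + dk]
    return float(value)

# Toy usage: a linear field is reproduced exactly by trilinear interpolation.
field = lambda x, y, z: 2.0 * x + 3.0 * y - z + 1.0
grid = build_grid(field, origin=(0.0, 0.0, 0.0), spacing=0.5, shape=(5, 5, 5))
energy = interpolate(grid, (0.0, 0.0, 0.0), 0.5, (0.7, 1.1, 0.3))
```

The expensive pairwise sum is paid once when the grid is built; each later energy evaluation for a ligand pseudo atom then costs only eight table lookups, at the price of an interpolation error that grows with the grid spacing.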
The standard reduced CG representation provides a smooth protein surface representation containing a limited number of docking energy minima, which allows for more rapid and fully converged energy minimization compared to an atomistic model. Besides side chain flexibility and the representation of partners by an ensemble of conformations, it is possible to include global flexibility (e.g., domain-domain motion) by energy minimization along the directions of precalculated soft normal modes (simultaneous to minimization in rotational and translational degrees of freedom) for each partner structure. 10,13,14 In recent years, global peptide-protein docking, 9,15,16 protein-DNA docking, 17 and protein-RNA docking 18,19 have been implemented. In addition, experimental or bioinformatics data can be included directly during the search as restraints or to guide/bias the search. Thus, it is possible to account for low-resolution cryo-EM density 20 or small-angle X-ray scattering data. 21 In CAPRI rounds 38-45, we provided docking predictions for all targets and tested in several cases the inclusion of global flexibility during the docking searches. The possible benefit of systematic docking in translational and rotational degrees of freedom with simultaneous optimization in precalculated collective degrees of freedom was evaluated. Furthermore, refinement efforts at atomic resolution to improve CAPRI protein-protein docking target predictions are also reported.

| Protein-protein docking approach
The ATTRACT coarse-grained protein model represents the main chain by two pseudo atoms per residue (located at the backbone nitrogen and backbone oxygen atoms, respectively). The side chains of small amino acids (Ala, Asp, Asn, Cys, Ile, Leu, Pro, Ser, Thr, and Val) are represented by one pseudo atom corresponding to the geometric mean of side chain heavy atoms. 11,12 Larger and more flexible side chains are represented by two pseudo atoms to better account for the shape and dual chemical character of some side chains. Effective interactions between pseudo atoms are described by soft distance (r_ij)-dependent Lennard-Jones (LJ)-type potentials of the following form, 12

$$
V(r_{ij}) =
\begin{cases}
\varepsilon_{AB}\left[\left(\dfrac{R_{AB}}{r_{ij}}\right)^{8} - \left(\dfrac{R_{AB}}{r_{ij}}\right)^{6}\right] & \text{in case of attractive pair} \\[2ex]
-\varepsilon_{AB}\left[\left(\dfrac{R_{AB}}{r_{ij}}\right)^{8} - \left(\dfrac{R_{AB}}{r_{ij}}\right)^{6}\right] & \text{repulsive pair, } r_{ij} > r_{\min} \\[2ex]
2\,|e_{\min}| + \varepsilon_{AB}\left[\left(\dfrac{R_{AB}}{r_{ij}}\right)^{8} - \left(\dfrac{R_{AB}}{r_{ij}}\right)^{6}\right] & \text{repulsive pair, } r_{ij} \le r_{\min}
\end{cases}
$$

where R_AB and ε_AB are effective pairwise radii and attractive or repulsive Lennard-Jones parameters. At the distance r_min between two pseudo atoms, the standard LJ-type potential has the energy e_min. However, in case of the repulsive pair (LJ-type) potential (lower two lines in the above equation), the energy minimum is replaced by a saddle point and the potential is positive for all distances (always repulsive). Which pseudo atom pairs are attractive or repulsive has been determined by an optimization procedure based on native complexes and many decoy structures (details given in Reference 12). A Coulomb-type term accounts for electrostatic interactions between real charges (Lys, Arg, Glu, and Asp), damped by a distance-dependent dielectric constant (ε = 15r, with r measured in Å). For protein-peptide docking, we used the pepATTRACT approach. 15 The pepATTRACT approach uses the same CG model for an initial systematic search for putative peptide binding regions and the iATTRACT approach for refinement (see below).
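The attractive and repulsive pair potentials and the damped Coulomb term described above can be sketched in a few lines. The sketch assumes a soft 8-6 functional form (the exponents are an assumption made here, not taken from the text), for which the minimum lies at r_min = sqrt(4/3) R_AB with depth 27 ε_AB / 256. The function names and the unit conversion factor 332.0636 (kcal·Å·mol⁻¹·e⁻²) are likewise illustrative assumptions.

```python
import math

# For an assumed 8-6 potential, dV/dr = 0 gives r_min = sqrt(4/3) * R.
R_MIN_FACTOR = math.sqrt(4.0 / 3.0)

def lj86(r, radius, eps):
    """Standard soft LJ-type term eps * [(R/r)^8 - (R/r)^6]."""
    x = radius / r
    return eps * (x ** 8 - x ** 6)

def pair_energy(r, radius, eps, attractive):
    """Attractive pairs use the standard term; repulsive pairs use the
    always-positive variant whose minimum is replaced by a saddle point."""
    base = lj86(r, radius, eps)
    if attractive:
        return base
    r_min = R_MIN_FACTOR * radius
    e_min = lj86(r_min, radius, eps)  # negative for eps > 0
    if r > r_min:
        return -base                  # mirrored tail, positive beyond r_min
    return base - 2.0 * e_min         # shifted core, continuous at r_min

def coulomb(r, q1, q2, scale=332.0636, slope=15.0):
    """Coulomb term with distance-dependent dielectric eps(r) = slope * r."""
    return scale * q1 * q2 / (slope * r * r)

# Toy usage with hypothetical parameters (R = 4 Angstrom, eps = 1).
r_min = R_MIN_FACTOR * 4.0
well_depth = lj86(r_min, 4.0, 1.0)
```

Both branches of the repulsive potential meet at r_min with zero slope, which is what turns the minimum of the attractive form into a saddle point.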

| Inclusion of global flexibility during docking using normal modes
In the ATTRACT docking program, it is possible to perform energy minimization not only in rigid body degrees of freedom but simultaneously in any combination of (Cartesian) collective degrees of freedom. 14
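As a minimal sketch of what such a combined variable set looks like, the function below maps six rigid body variables plus one amplitude per collective mode onto Cartesian coordinates. The parametrization (rotation vector about the centroid, then translation) and the names `rotation_matrix` and `apply_dof` are assumptions chosen for illustration, not the ATTRACT implementation.

```python
import numpy as np

def rotation_matrix(rotvec):
    """Rodrigues formula for a rotation vector (unit axis times angle in radians)."""
    rotvec = np.asarray(rotvec, dtype=float)
    angle = np.linalg.norm(rotvec)
    if angle < 1e-12:
        return np.eye(3)
    k = rotvec / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def apply_dof(coords, modes, params):
    """coords: (N, 3) reference structure; modes: (M, N, 3) collective modes;
    params: 3 rotation + 3 translation + M mode amplitudes."""
    params = np.asarray(params, dtype=float)
    rotvec, shift, amps = params[:3], params[3:6], params[6:]
    # Deform along the collective directions first ...
    deformed = coords + np.tensordot(amps, modes, axes=1)
    # ... then rotate about the reference centroid and translate.
    center = coords.mean(axis=0)
    return (deformed - center) @ rotation_matrix(rotvec).T + center + shift

# Toy usage: two pseudo atoms and a single hypothetical mode.
coords = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
modes = np.zeros((1, 2, 3))
modes[0, 0, 1] = 1.0
moved = apply_dof(coords, modes, [0.0, 0.0, np.pi / 2, 0.0, 0.0, 0.0, 0.0])
```

An energy minimizer can then treat the full parameter vector as its variables, so that the rigid body placement and the mode amplitudes are optimized at the same time rather than in separate stages.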

| Refinement of docked protein-protein or peptide-protein complexes at atomic resolution
The iATTRACT flexible interface refinement is one standard refinement technique for docked complexes in ATTRACT. 24 It combines rigid body motions of the partners directly with full atomic resolution flexibility of the predicted interface regions. Briefly, the docked protein (or peptide) partners are converted into an all-atom model with the ATTRACT tool aareduce based on the OPLS force field. 25 Missing hydrogens are generated by PDB2PQR 26 and protonation states are determined by PropKa. 27 The atomistic refinement uses a physical force field based on the OPLS parameters 25 to calculate intermolecular nonbonded and electrostatic interactions between the protein partners. Residues forming contacts in the input structure are treated as flexible during a simultaneous potential energy minimization in rigid body degrees of freedom and interface flexibility. A structure-based force field is determined on-the-fly (for each complex) to evaluate intraprotein interactions for the flexible interface part. 24 The AMBER16 molecular dynamics (MD) package 28

| Molecular dynamics refinement of docked protein-carbohydrate complexes
The structures obtained by molecular docking were further analyzed by all-atom MD simulations using Amber16. Periodic boundary conditions in a truncated octahedron TIP3P water box with at least 6 Å distance from the solute to the periodic box border were used. Counter-ions were used to neutralize the system. ff99SB force field parameters were applied for the protein and GLYCAM06 parameters for the oligosaccharides. Prior to an MD production run, two energy-minimization steps were performed: first, 0.5 × 10^3 steepest descent cycles and 10^3 conjugate gradient cycles with harmonic force restraints on the solute (10 kcal·mol^-1·Å^-2), then 3 × 10^3 steepest descent cycles and 3 × 10^3 conjugate gradient cycles without restraints. After the minimization, the system was heated up to 300 K for

| RESULTS AND DISCUSSION

| Inclusion of ENM-based normal modes during protein-protein docking
The idea of using conformational relaxation of partner structures in soft normal modes to account for partner flexibility during docking dates back to Zacharias and Sklenar. 41 In particular, in the field of protein-protein docking, the methodology has been further developed in several studies 13,14,42 and implemented in docking approaches such as ATTRACT 14 and SwarmDock. 42 In the ATTRACT approach, it is possible to simultaneously energy-minimize partner structures in rotational, translational, and collective degrees of freedom during docking. For a benchmark set of complexes with known unbound and bound structures (excluding complexes with >1000 residues, see Table S1), we first checked how well a subset of the calculated softest ENM modes overlaps with the observed difference in the protein backbone of unbound vs bound structures (Figure S1). Using just the first five modes results in many cases in overlaps of 20% to 75%. Although the improvement in docking performance is only modest, normal mode flexibility was included for several of the CAPRI targets.
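The overlap check between soft ENM modes and the unbound-to-bound difference can be sketched as follows. This is an illustration under stated assumptions: a simple anisotropic network model with uniform springs and a hypothetical 12 Å cutoff stands in for the ENM, and the cumulative overlap (square root of the summed squared projections) stands in for the overlap measure, whose exact definition is not given in the text. The names `anm_modes` and `cumulative_overlap` are hypothetical.

```python
import numpy as np

def anm_modes(coords, n_modes=5, cutoff=12.0):
    """Softest non-trivial modes of a uniform-spring anisotropic network model.
    Returns an (n_modes, 3N) array whose rows are orthonormal mode vectors."""
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            diff = coords[j] - coords[i]
            dist2 = diff @ diff
            if dist2 > cutoff ** 2:
                continue
            block = -np.outer(diff, diff) / dist2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    # eigh returns eigenvalues in ascending order; the first six
    # (near-zero) eigenvectors are rigid body translations and rotations.
    _, vectors = np.linalg.eigh(hessian)
    return vectors[:, 6:6 + n_modes].T

def cumulative_overlap(modes, displacement):
    """Fraction of a (3N,) displacement vector captured by the given modes."""
    d = np.asarray(displacement, dtype=float)
    d = d / np.linalg.norm(d)
    return float(np.sqrt(np.sum((modes @ d) ** 2)))

# Toy usage on random pseudo atom positions (not a real protein).
rng = np.random.default_rng(0)
coords = rng.normal(size=(12, 3)) * 5.0
modes = anm_modes(coords, n_modes=5, cutoff=100.0)
overlap = cumulative_overlap(modes, modes[0] + modes[1])
```

In a real application, `displacement` would be the flattened backbone coordinate difference between the bound structure and the superimposed unbound structure; a value of 1 means the selected modes describe the conformational change completely.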

| CAPRI targets and predictions
In CAPRI rounds 38-45, predictions were submitted for targets 121-136. The targets included protein-protein but also protein-peptide and protein-carbohydrate complexes. Our prediction results

Notes: IRMSD is the root mean square deviation between prediction and experiment of protein backbone atoms within 10 Å of the protein-protein interface. Models not within the top 10 submitted complexes are given in parentheses. For a definition of acceptable (*), medium (**), and high (***) quality solutions, see Reference 5.

protein with the extracellular domain of its inhibitor NKR-P1. Initial complex structures for this target were generated based on a template and based on symmetry. The generated initial models were energy-minimized using ATTRACT and subsequently further refined at atomic resolution following the protocol described in section 2. We were able to predict two of the interacting protein surfaces in the complex (out of four possible) with several medium (**) and high quality (***) solutions, respectively (Table 1). Similar to all other predictors, we were unable to correctly predict the other two interfaces.
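The interface RMSD used to rate the predictions can be approximated with a short sketch. It assumes that backbone atom coordinates of both partners are available as matched arrays for the prediction and the experimental reference, selects atoms within 10 Å of the other partner in the reference, and superimposes them with the Kabsch algorithm. The official CAPRI evaluation defines the interface per residue and may differ in detail; the function names are hypothetical.

```python
import numpy as np

def interface_masks(ref_a, ref_b, cutoff=10.0):
    """Boolean masks of atoms in each partner within `cutoff` of the other."""
    dists = np.linalg.norm(ref_a[:, None, :] - ref_b[None, :, :], axis=2)
    return dists.min(axis=1) <= cutoff, dists.min(axis=0) <= cutoff

def kabsch_rmsd(mobile, target):
    """RMSD after optimal rigid superposition of matched (N, 3) coordinate sets."""
    p = mobile - mobile.mean(axis=0)
    q = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last singular direction if needed to avoid a reflection.
    if np.linalg.det(u @ vt) < 0.0:
        u[:, -1] *= -1.0
    rot = u @ vt
    return float(np.sqrt(np.mean(np.sum((p @ rot - q) ** 2, axis=1))))

def irmsd(pred_a, pred_b, ref_a, ref_b, cutoff=10.0):
    """Interface RMSD between a predicted and a reference two-partner complex."""
    mask_a, mask_b = interface_masks(ref_a, ref_b, cutoff)
    pred = np.vstack([pred_a[mask_a], pred_b[mask_b]])
    ref = np.vstack([ref_a[mask_a], ref_b[mask_b]])
    return kabsch_rmsd(pred, ref)

# Toy usage: a rigidly moved copy of the reference has an IRMSD of zero.
rng = np.random.default_rng(1)
ref_a = rng.normal(size=(20, 3)) * 3.0
ref_b = rng.normal(size=(20, 3)) * 3.0 + np.array([8.0, 0.0, 0.0])
c, s = np.cos(0.7), np.sin(0.7)
rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
shift = np.array([5.0, -2.0, 1.0])
perfect = irmsd(ref_a @ rz.T + shift, ref_b @ rz.T + shift, ref_a, ref_b)
displaced = irmsd(ref_a, ref_b + np.array([3.0, 0.0, 0.0]), ref_a, ref_b)
```

Because the superposition is restricted to interface atoms, the measure is insensitive to errors far from the binding site and reports how well the contact region itself is reproduced.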

| Round 41 (targets 126-130)
The targets of round 41 consisted entirely of protein domains in complex with carbohydrate chains. Although we recently designed a coarse-grained ATTRACT force field for protein-carbohydrate docking, no such force field was available at the time of the round 41 challenge. In the case of the AbnB protein (1,5-alpha-arabinanase catalytic domain), the structure of the 1,5-alpha-arabinanase catalytic mutant (AbnBE201A) complexed to arabinotriose (pdb3d5z, res. 1.9 Å) served as protein receptor template. The structure of the protein was energy-minimized in the absence and in the presence of arabinotriose using the Amber16 package (see section 2). The all-atom RMSD between the resulting two structures was 1.07 Å, which was predominantly accounted for by the side chains of residues involved in arabinotriose binding. Both structures were considered for the molecular docking simulation, allowing for conformational variance in the oligosaccharide binding site following the approach described in section 2. A Ca2+ ion was included as part of the receptor structure.
For the second receptor protein, AbnE, no experimental structure was available, and therefore we modeled its structure based on three homologous structures (pdb1eu8, res. 1.9 Å; pdb5ci5, 1.61 Å; pdb5f7v, 1.4 Å) using MODELLER. 48 Percent identity between all the sequences including the target protein was in the range of 20% to 30%. Similar to the AbnB protein target, two structures were selected for molecular docking simulation based on the best MODELLER scores. With our docking protocol, we achieved acceptable solutions for all targets and even medium quality for the target 128 case. Since the approximate binding geometry was in all cases correctly identified, further improvement is possible by an optimization of the force field and refinement procedure.

| Round 42 (targets 131, 132)
Targets 131 and 132 corresponded to the cell adhesion proteins HopQ type I and HopQ type II in complex with the human cell adhesion protein CEACAM1. 49 For CEACAM1 and HopQ type I, the unbound structures pdb2gk2 and pdb5lp2, respectively, were available. However, to create a suitable structure for docking, missing segments in the unbound HopQ I were generated using template-based modeling with MODELLER. 48 The HopQ type II partner structure was generated using template-based modeling with pdb5lp2 as template.
Unfortunately, the template-based modeling of HopQ II and of the missing loop segments in the case of HopQ I generated model structures that deviated from the bound structures, especially in the protein interface region (Figure 3). During systematic docking, the softest normal mode of each partner was included as a flexible degree of freedom. For both target 131 and target 132, no acceptable (or better) docking solution was among the top 10 models that we submitted. However, inspection of all 100 submitted models revealed several medium quality predictions (Table 1). These docking solutions were found with a reasonable score during docking that included normal mode flexibility, because this flexibility allowed structural relaxation at the interface (illustrated in Figure 3B).
Upon atomic resolution refinement, the solutions still showed some steric clashes and were therefore not included in the top 10 submitted models (Figure 3C). As described in section 2, our standard atomistic refinement allows only limited protein backbone motions and hence, in some cases, no full relaxation of residual strain.

We used pdb2vyc (an arginine decarboxylase) 53 as template to generate starting homology models. These were used as start structures for rigid body minimization using ATTRACT and subsequent atomistic refinement using our standard atomistic refinement protocol (see section 2). At least for the interfaces 1 and 2 but not for interface 2 (Table 1), we obtained acceptable quality for all top 10 models.

| CONCLUSION AND OUTLOOK
In the course of CAPRI rounds 38-45 with 16 targets, we were able to provide docking solutions of acceptable or better quality within the top 10 solutions for 11 cases. For three targets, our predictions were of high quality. This indicates the quality of our docking approaches and demonstrates their versatility in dealing with a variety of targets, ranging from large protein-protein complexes and peptide-protein complexes to the association of carbohydrates with protein receptors. The failures in some CAPRI docking cases can likely be attributed to inaccuracies in the homology modeling of the protein partners used for docking and to significant conformational changes upon association, but are also due to limitations of the scoring scheme. The ATTRACT docking approach includes the standard option to account for conformational changes during docking using energy minimization in precalculated soft harmonic modes of each partner based on an ENM. The CAPRI challenge has also triggered our efforts to systematically test the benefit of including such harmonic normal modes as additional variables on top of the rigid body degrees of freedom. For one CAPRI round, the inclusion allowed us to identify medium quality solutions among the top 100 (but not among the top 10). The systematic application to a large benchmark set yielded an overall modest improvement in the number of cases where at least an acceptable (or better) solution was found compared to rigid docking (~12% more cases upon inclusion of just the softest mode for each partner). This indicates that near-native docking solutions indeed benefit significantly in several cases, but that in several other cases incorrect docking solutions benefit even more than near-native solutions from the inclusion of global harmonic modes. Application of more advanced normal mode methods such as the nonlinear rigid block normal mode method 54 might help to improve the performance of the flexible docking approach. 
Furthermore, improvements of the scoring function or the combination with alternative scoring schemes may also enhance the flexible docking approach and will be the subject of future studies.

ACKNOWLEDGMENTS
We thank the organizers of the CAPRI challenge for this opportunity and the assessors for the hard work to evaluate the predictions. We thank all structural biologists who contributed target structures for the CAPRI