Development of low-resolution models for proteins is essential in protein structure prediction and protein design. To achieve this goal, we need an effective interaction potential. Starting with the seminal work of Tanaka and Scheraga (1976), there has been growing interest in obtaining reasonably accurate force fields. The accelerated growth of structural data on thousands of proteins in the Protein Data Bank (PDB; Berman et al. 2000) has been a source for obtaining residue–residue interaction potentials (Miyazawa and Jernigan 1985, 1999b; Godzik et al. 1995; Bahar and Jernigan 1996; Lee et al. 1999). Tanaka and Scheraga (1976) proposed that the frequencies of side-chain pairing can be used to determine the parameters of the interaction potential. Since then, with the exception of a few studies (e.g., Sippl 1990), most knowledge-based potentials have been obtained solely in terms of residue–residue contacts (Skolnick et al. 1997, 2000; Miyazawa and Jernigan 1999b; Tobi et al. 2000).

Sippl (1990) used a statistical method, known as the Boltzmann device, to obtain distance-dependent potentials of mean force. This method relies on the assumption that the known X-ray or NMR-resolved protein structures represent classical equilibrium states and that, therefore, the distribution of distances between two side chains should correspond to the equilibrium Boltzmann distribution. Other structural parameters, such as dihedral angles, can also be used in this treatment. The statistical potentials developed using this approach and other methods (Tobi et al. 2000; Tobi and Elber 2000; Meller et al. 2002) have likewise focused only on extracting and analyzing probability density functions that depend solely on distance.
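
The Boltzmann assumption described above can be illustrated with a minimal sketch. It is not the procedure of Sippl (1990), which involves a specific reference state and corrections for sparse data; it only shows the generic inversion U(r) = -kT ln[P_obs(r)/P_ref(r)] applied bin by bin to a distance histogram. The function name, the uniform reference histogram, the pseudocount, and the choice kT = 1 are all assumptions made for illustration, and the counts are invented toy numbers rather than PDB statistics.

```python
import math


def boltzmann_inversion(observed, reference, kT=1.0, pseudocount=1e-6):
    """Turn two distance histograms into a potential of mean force.

    observed  : counts of a residue pair per distance bin (hypothetical data)
    reference : counts expected in the absence of specific interactions
    kT        : energy unit; 1.0 is an arbitrary choice for this sketch
    pseudocount : small constant that avoids log(0) in empty bins
    """
    if len(observed) != len(reference):
        raise ValueError("histograms must have the same number of bins")
    n_bins = len(observed)
    obs_total = sum(observed) + pseudocount * n_bins
    ref_total = sum(reference) + pseudocount * n_bins
    potential = []
    for n_obs, n_ref in zip(observed, reference):
        p_obs = (n_obs + pseudocount) / obs_total
        p_ref = (n_ref + pseudocount) / ref_total
        # Bins populated more often than the reference get a negative energy.
        potential.append(-kT * math.log(p_obs / p_ref))
    return potential


# Toy histograms over four distance bins (illustrative numbers only).
observed_counts = [10, 50, 25, 15]
reference_counts = [25, 25, 25, 25]
potential = boltzmann_inversion(observed_counts, reference_counts)
```

In this toy case the second bin, which is populated twice as often as the reference, receives a favorable (negative) energy, whereas the underpopulated first bin receives an unfavorable one.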

Previous studies have shown that the relative orientation and packing of side chains in proteins are important determinants of the local (secondary structure) geometry as well as the three-dimensional (tertiary structure) topology (Bahar and Jernigan 1996; Bagci et al. 2002a,b). By analyzing various families of protein structures, we previously showed that certain orientational order parameters are prominent (Buchete et al. 2003). From a quantitative analysis of the statistical data on orientational distributions of side chains extracted from PDB structures, we determined orientation-dependent interactions. This approach is supported by the results of Bahar and Jernigan (1996), who proposed a backbone-dependent coordination system for studying the statistical distribution of residue–residue interactions in proteins. They showed that residue-specific coordination loci and packing characteristics can be extracted by statistical analysis of protein structures, and that knowledge-based orientational potentials can be constructed. However, with the backbone-dependent coordination system employed by Bahar and Jernigan (1996), it is difficult to include variations in the sizes of side chains and their rotational degrees of freedom. These effects are important, especially for large side chains that have several probable rotameric states with respect to the backbone. Therefore, we use a coordination system based on side-chain local reference frames (LRFs, see Materials and Methods) that are backbone independent for most amino acids. On the basis of earlier estimations (Buchete et al. 2003) and of statistical corrections for data sparsity, we can also use smaller orientational bin sizes than those used by Bahar and Jernigan (1996). In this study, we present a new method for extracting coarse-grained distance- and orientation-dependent residue–residue statistical potentials.
We first show that in many protein structures, there is a substantial number of contacts between side chains and the backbone. The near globularity of protein structures (i.e., they are nearly maximally compact) implies that such contacts should be prevalent. The need to include backbone interactions explicitly is also supported by the results of previous statistical derivations of backbone potentials that used virtual bond and torsion angles (Bahar et al. 1997) and secondary structure information (Miyazawa and Jernigan 1999a). To capture the large number of side-chain–backbone contacts in protein structures, we include an extra anisotropic backbone interaction center located at the peptide bond. Spherical harmonic analysis (SHA) and spherical harmonic synthesis (SHS) methods are used to express the orientation-dependent potentials in a continuous manner for short-, medium-, and long-range interactions. Multiple decoy sets (Samudrala and Levitt 2000) are used to assess the effectiveness of these potentials in recognizing the native folded states. The results show that the newly derived orientation-dependent potentials for side chains and the protein backbone are successful in the vast majority of cases.
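
For orientation, the general form of a spherical harmonic expansion is sketched below. The notation is an assumption made for illustration and may not match the one used later in this paper: U^{(k)}(\theta,\phi) stands for the orientation-dependent potential in the k-th distance range, c^{(k)}_{lm} for its expansion coefficients, and L_{max} for a truncation order that is not specified here. The first relation corresponds to the analysis step (SHA) and the second to the synthesis step (SHS).

```latex
% SHA: project the binned potential onto the spherical harmonics Y_{lm}
c^{(k)}_{lm} = \int_{0}^{2\pi}\!\!\int_{0}^{\pi}
    U^{(k)}(\theta,\phi)\, Y^{*}_{lm}(\theta,\phi)\, \sin\theta \, d\theta \, d\phi

% SHS: rebuild a continuous potential from a finite set of coefficients
U^{(k)}(\theta,\phi) \approx \sum_{l=0}^{L_{max}} \sum_{m=-l}^{l}
    c^{(k)}_{lm}\, Y_{lm}(\theta,\phi)
```

Truncating the sum at a finite L_{max} acts as a smoothing filter on the binned statistics, which is one way in which a continuous representation can be obtained from discrete orientational histograms.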