The solution of the protein folding problem requires an accurate potential that describes the interactions among different amino acid residues. The potential that would yield a complete understanding of the folding phenomena should be derived from the laws of physics. However, the use of such physics-based potentials (Brooks et al. 1983; Weiner et al. 1986; Jorgensen et al. 1996; Scott et al. 1999) for ab initio folding studies is limited by available computing power (Duan and Kollman 1998). Their application to the recognition of native structures among nonnative conformations (Moult 1997; Hao and Scheraga 1998; Lazaridis and Karplus 2000; Petrey and Honig 2000; Wallqvist et al. 2002), however, has yielded results comparable to those of knowledge-based statistical potentials, which extract interactions directly from known protein structures (Tanaka and Scheraga 1976). Knowledge-based statistical potentials are attractive because they are simple and easy to use. They can be categorized into distance-independent contact energies (Miyazawa and Jernigan 1985; DeBolt and Skolnick 1996; Zhang et al. 1997; Skolnick et al. 2000) and distance-dependent potentials (Hendlich et al. 1990; Sippl 1990; Jones et al. 1992; Samudrala and Moult 1998; Lu and Skolnick 2001). Both residue-level (Miyazawa and Jernigan 1985; Hendlich et al. 1990; Sippl 1990; Jones et al. 1992) and atomic-level (DeBolt and Skolnick 1996; Zhang et al. 1997; Samudrala and Moult 1998; Lu and Skolnick 2001) potentials have been developed and applied to fold recognition and assessment (Hendlich et al. 1990; Sippl 1990; Casari and Sippl 1992; Jones et al. 1992; Bryant and Lawrence 1993; Samudrala and Moult 1998; Miyazawa and Jernigan 1999; Lu and Skolnick 2001; Melo et al. 2002), structure prediction (Sun 1993; Simons et al. 1997; Skolnick et al. 1997; Lee et al. 1999; Tobi and Elber 2000; Vendruscolo et al. 2000; Pillardy et al. 2001), structure validation (Luthy et al. 1992; Sippl 1993; MacArthur et al. 1994; Rojnuckarin and Subramaniam 1999), docking and binding (Pellegrini et al. 1995; Wallqvist et al. 1995; Zhang et al. 1997), and mutation-induced changes in stability (Gilis and Rooman 1996, 1997; Zhang et al. 1997).

This work focuses on distance-dependent, residue-specific, all-atom, knowledge-based potentials. We make this choice because, in protein structure selection, all-atom potentials perform better than residue-based potentials (Samudrala and Moult 1998; Lu and Skolnick 2001), and distance-dependent potentials perform better than distance-independent ones (Melo et al. 2002). The derivation of a distance-dependent, pairwise, statistical potential *u*(*i,j,r*) starts from a common equation given by

$$
u(i,j,r) = -RT \ln \frac{N_{\mathrm{obs}}(i,j,r)}{N_{\mathrm{exp}}(i,j,r)},
$$

where *R* is the gas constant, *T* is the temperature, *N*_{obs}(*i,j,r*) is the observed number of atomic pairs (*i,j*) within a distance shell *r* − Δ*r*/2 to *r* + Δ*r*/2 in a database of folded structures, and *N*_{exp}(*i,j,r*) is the expected number of atomic pairs (*i,j*) in the same distance shell if there were no interactions between atoms (the reference state). Clearly, the method used to calculate *N*_{exp}(*i,j,r*) is what distinguishes one potential from another, because the method used to calculate *N*_{obs}(*i,j,r*) is the same (apart from minor differences in the database and binning procedures). Samudrala and Moult (1998) used a conditional probability function

$$
N_{\mathrm{exp}}(i,j,r) = \frac{N_{\mathrm{obs}}(i,j)}{N_{\mathrm{total}}}\, N_{\mathrm{obs}}(r),
$$

where *N*_{obs}(*r*) ≡ ∑_{i,j}*N*_{obs}(*i,j,r*), *N*_{obs}(*i,j*) ≡ ∑_{r}*N*_{obs}(*i,j,r*), and *N*_{total} ≡ ∑_{i,j,r}*N*_{obs}(*i,j,r*). Lu and Skolnick (2001) employed a quasi-chemical approximation:

$$
N_{\mathrm{exp}}(i,j,r) = \chi_i \chi_j\, N_{\mathrm{obs}}(r),
$$

where χ_{k} is the mole fraction of atom type *k*. The common approximation made by the above two potentials is that ∑_{i,j}*N*_{exp}(*i,j,r*) ≡ *N*_{obs}(*r*). This approximation has its origin in the “uniform density” reference state used by Sippl (1990) to derive the residue-based, distance-dependent potential. In this approximation, the total number of pairs in any given distance shell for a reference state is the same as that for folded proteins. In other words, the distance dependence of the pair probability distribution of the reference state is an averaged distribution over all residue or atomic pairs. This reference state is a noninteracting ideal-gas reference state only if the average interaction of all residue or atomic pairs is zero (i.e., attractive and repulsive interactions cancel each other). However, it is highly unlikely that attractive and repulsive interactions could cancel each other exactly. These missing residual interactions may well be important for an accurate potential.
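The two reference states discussed above, and the normalization they share, can be checked numerically. The following is a minimal sketch under stated assumptions, not the implementation used in any of the cited works: the count table `n_obs`, the two atom types `A` and `B`, the three distance bins, the mole fractions `chi`, the temperature, and all function names are hypothetical and chosen only for illustration. Pairs are treated as ordered, so that the sums over *i,j* run over all four entries.

```python
import math

R_GAS = 1.987e-3  # gas constant in kcal/(mol K)
TEMP = 300.0      # temperature in K (illustrative assumption)

# Hypothetical counts N_obs(i, j, r): ordered atom-type pair -> count per distance bin.
n_obs = {
    ("A", "A"): [2, 10, 30],
    ("A", "B"): [6, 12, 25],
    ("B", "A"): [6, 12, 25],
    ("B", "B"): [1, 8, 40],
}
chi = {"A": 0.6, "B": 0.4}  # hypothetical mole fractions; they must sum to 1


def counts_per_bin(counts):
    """Sum a pair-indexed table over all pairs (i, j), giving one value per bin."""
    n_bins = len(next(iter(counts.values())))
    return [sum(c[b] for c in counts.values()) for b in range(n_bins)]


def n_exp_conditional(counts):
    """Conditional-probability reference: N_obs(i,j) / N_total * N_obs(r)."""
    per_bin = counts_per_bin(counts)
    n_total = sum(per_bin)
    return {pair: [sum(c) / n_total * n_r for n_r in per_bin]
            for pair, c in counts.items()}


def n_exp_quasi_chemical(counts, fractions):
    """Quasi-chemical reference: chi_i * chi_j * N_obs(r)."""
    per_bin = counts_per_bin(counts)
    return {(i, j): [fractions[i] * fractions[j] * n_r for n_r in per_bin]
            for (i, j) in counts}


def potential(counts, expected):
    """u(i,j,r) = -RT ln(N_obs / N_exp); bins with a zero count are left as None."""
    return {
        pair: [-R_GAS * TEMP * math.log(obs / exp) if obs > 0 and exp > 0 else None
               for obs, exp in zip(c, expected[pair])]
        for pair, c in counts.items()
    }


exp_cond = n_exp_conditional(n_obs)
exp_qc = n_exp_quasi_chemical(n_obs, chi)
u_cond = potential(n_obs, exp_cond)
u_qc = potential(n_obs, exp_qc)

# Both reference states reproduce N_obs(r) once summed over all pairs (i, j).
for name, expected in (("conditional", exp_cond), ("quasi-chemical", exp_qc)):
    print(name, [round(s, 6) for s in counts_per_bin(expected)])
```

In this toy table, both choices of *N*_{exp}(*i,j,r*) sum to *N*_{obs}(*r*) in every bin, which is the shared approximation described above: the distance dependence of the reference state is taken from the folded structures themselves, so any net attraction or repulsion averaged over all pairs is absorbed into the reference.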

To explore the missing residual interactions, we establish a noninteracting reference state without using the above-mentioned assumption. This is done by using uniformly distributed noninteracting points in finite spheres. The reference state coupled with a simple distance scaling method is employed to derive an all-atom potential of mean force from 1011 known protein structures (Hobohm et al. 1992). It is shown that the new atomic potential is slightly more attractive than other knowledge-based all-atom potentials (Samudrala and Moult 1998; Lu and Skolnick 2001). This small residual interaction leads to an improved potential of mean force for structure selections from single and multiple decoy sets and for the prediction of the changes in the stabilities of 895 mutants.
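To illustrate why a finite sphere matters for a noninteracting reference state, the sketch below places uniformly distributed points in a sphere and counts pair distances by Monte Carlo sampling. It is only an illustration of the idea, not the procedure used in this work, whose details (including the distance scaling method) are not reproduced here. The sphere radius, bin width, number of points, random seed, and all function names are hypothetical. The comparison baseline is the count expected if the number of pairs in a shell simply grew with the shell volume, as it would at the same density without a boundary.

```python
import math
import random


def random_point_in_sphere(radius, rng):
    """Rejection-sample a point uniformly inside a sphere centered at the origin."""
    while True:
        p = tuple(rng.uniform(-radius, radius) for _ in range(3))
        if sum(c * c for c in p) <= radius * radius:
            return p


def pair_counts(points, bin_width, n_bins):
    """Histogram of pair distances; pairs beyond the last bin are ignored."""
    counts = [0] * n_bins
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            k = int(math.dist(points[a], points[b]) / bin_width)
            if k < n_bins:
                counts[k] += 1
    return counts


def unbounded_counts(n_points, radius, bin_width, n_bins):
    """Counts if pairs scaled with shell volume (shell volume / sphere volume)."""
    n_pairs = n_points * (n_points - 1) / 2
    return [
        n_pairs * (((k + 1) * bin_width) ** 3 - (k * bin_width) ** 3) / radius ** 3
        for k in range(n_bins)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
radius, bin_width, n_bins, n_points = 20.0, 2.0, 10, 400  # hypothetical values

points = [random_point_in_sphere(radius, rng) for _ in range(n_points)]
observed = pair_counts(points, bin_width, n_bins)
reference = unbounded_counts(n_points, radius, bin_width, n_bins)
ratios = [o / e for o, e in zip(observed, reference)]

for k, ratio in enumerate(ratios):
    print(f"r = {(k + 0.5) * bin_width:4.1f}  finite/unbounded = {ratio:.2f}")
```

Because points near the surface have no neighbors outside the sphere, the number of pairs at larger separations falls increasingly below the shell-volume expectation. A reference state built from finite spheres therefore has a different distance dependence from one that assumes unbounded uniform density, even though the points do not interact.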