Formally, the MDGP can be defined as the problem of finding Cartesian coordinates $x_1, \ldots, x_n \in \mathbb{R}^3$ of the atoms of a molecule such that for all $(i, j) \in S$,
$$\|x_i - x_j\| = d_{ij},$$
where $S$ is the set of pairs of atoms $(i, j)$ whose Euclidean distances $d_{ij}$ are known. If all distances are given, the problem can be solved in linear time (Dong and Wu, 2002). If there is an order on the atoms such that the given distances form cliques on each set of five contiguous atoms, the problem is polynomially solvable (Eren et al., 2004). In general, however, the problem is NP-hard (Saxe, 1979).
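As a concrete reading of the definition, the following minimal Python sketch checks whether given coordinates satisfy every known distance in $S$ up to a tolerance. The dictionary encoding of $S$ (keyed by 1-based atom pairs), the function names and the tolerance are illustrative assumptions, not part of the formulation.

```python
import math

def mdgp_violation(coords, distances):
    """Largest violation |‖x_i - x_j‖ - d_ij| over all known pairs.

    coords: list of (x, y, z) tuples; atom k is stored at coords[k - 1].
    distances: dict mapping (i, j) -> d_ij for the pairs in S (hypothetical encoding).
    """
    worst = 0.0
    for (i, j), d_ij in distances.items():
        gap = abs(math.dist(coords[i - 1], coords[j - 1]) - d_ij)
        worst = max(worst, gap)
    return worst

def is_mdgp_solution(coords, distances, tol=1e-6):
    """True if every known distance is matched within tol."""
    return mdgp_violation(coords, distances) <= tol

# Example: three atoms forming a right angle at atom 2.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
S = {(1, 2): 1.0, (2, 3): 1.0, (1, 3): math.sqrt(2.0)}
print(is_mdgp_solution(coords, S))
```

Only the pairs in $S$ are checked; distances outside $S$ are unconstrained, which is exactly what makes the general problem hard.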
Note that the formal statement above makes no reference to molecules at all. In fact, the MDGP appears in such diverse application fields as 3D graph drawing (Cruz and Twarog, 1996) and network design (Eren et al., 2004). Our assumption that all distances between atoms separated by one, two, and three covalent bonds are known can be expressed as an additional condition on the set $S$ of distances, namely that $S$ can be partitioned into two disjoint sets $E$, $F$ of distances where
$$E = \{(i, i+k) : k \in \{1, 2, 3\},\ 1 \le i \le n-k\}, \qquad F = S \setminus E.$$
We also assume that for all pairs of atoms in $F$, the distances are shorter than a given cut-off value $\Delta$ (usually $\Delta$ is taken to be 5 Å, the range of distances that can be estimated by, for example, NMR analysis; Creighton, 1993; Schlick, 2002), that is, $d_{ij} < \Delta$ for all $(i, j) \in F$.
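Under the linear-chain numbering introduced in Section 2.1 (atom $i$ bonded to atom $i+1$), "separated by one, two or three covalent bonds" amounts to $1 \le |i-j| \le 3$. The Python sketch below splits a distance dictionary into $E$ and $F$ on that basis and checks the cut-off on $F$; the function name, the dictionary encoding and the sample distances are hypothetical.

```python
def partition_distances(S, cutoff=5.0):
    """Split known distances S into E (atoms at most three bonds apart,
    i.e. 1 <= |i - j| <= 3 along a linear chain) and F (all other pairs),
    and report whether every distance in F is shorter than the cut-off."""
    E, F = {}, {}
    for (i, j), d_ij in S.items():
        target = E if 1 <= abs(i - j) <= 3 else F
        target[(i, j)] = d_ij
    within_cutoff = all(d_ij < cutoff for d_ij in F.values())
    return E, F, within_cutoff

# Hypothetical distances (in angstroms) for a 6-atom chain.
S = {(1, 2): 1.53, (1, 3): 2.51, (1, 4): 3.02, (1, 5): 4.10, (2, 6): 5.60}
E, F, ok = partition_distances(S)
print(sorted(E), sorted(F), ok)
# -> [(1, 2), (1, 3), (1, 4)] [(1, 5), (2, 6)] False
```

The last distance violates the 5 Å cut-off, so this instance would not satisfy the assumption on $F$.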
2.1. Discrete formulation
Consider a molecule as being a sequence of $n$ atoms with Cartesian coordinates given by $x_1, \ldots, x_n \in \mathbb{R}^3$ and such that there is a covalent bond between every pair of atoms $(i, i+1)$, for $i = 1, \ldots, n-1$. The bond length $r_i$ is the Euclidean distance between atoms $i-1$ and $i$ (i.e. $r_i = d_{i-1,i}$ for all $i = 2, \ldots, n$). The bond angle $\theta_i \in [0, \pi]$ is the angle between the segments joining atoms $i-2$, $i-1$ and $i-1$, $i$ (for all $i = 3, \ldots, n$). The torsion angle $\omega_i \in [0, 2\pi]$ is the angle between the normals through the planes defined by the atoms $i-3$, $i-2$, $i-1$ and $i-2$, $i-1$, $i$ (for all $i = 4, \ldots, n$) (see Fig. 1).
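These three definitions translate directly into code. The plain-Python sketch below computes $r_i$, $\theta_i$ and $\omega_i$ from Cartesian coordinates, with 1-based atom indices mapped onto a 0-based list. The torsion angle is obtained from the two plane normals with `atan2` so that it lands in $[0, 2\pi)$; the orientation convention (which of $\omega$ and $2\pi - \omega$ is reported) is our assumption, and all helper names are illustrative.

```python
import math

def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bond_length(x, i):
    """r_i = d_{i-1,i} for i >= 2 (atom k is stored at x[k - 1])."""
    return _norm(_sub(x[i - 1], x[i - 2]))

def bond_angle(x, i):
    """theta_i in [0, pi]: angle at atom i-1 between atoms i-2 and i (i >= 3)."""
    u = _sub(x[i - 3], x[i - 2])
    v = _sub(x[i - 1], x[i - 2])
    c = _dot(u, v) / (_norm(u) * _norm(v))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp rounding noise

def torsion_angle(x, i):
    """omega_i in [0, 2*pi): angle between the normals of the planes
    (i-3, i-2, i-1) and (i-2, i-1, i), for i >= 4."""
    b1 = _sub(x[i - 3], x[i - 4])
    b2 = _sub(x[i - 2], x[i - 3])
    b3 = _sub(x[i - 1], x[i - 2])
    n1, n2 = _cross(b1, b2), _cross(b2, b3)    # normals to the two planes
    y = _dot(_cross(n1, n2), b2) / _norm(b2)   # signed sine component
    return math.atan2(y, _dot(n1, n2)) % (2 * math.pi)

# Planar "cis" arrangement: atoms 1 and 4 on the same side of the 2-3 bond.
cis = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(bond_length(cis, 2), bond_angle(cis, 3), torsion_angle(cis, 4))
```

With this convention a cis arrangement has $\omega = 0$ and a trans arrangement has $\omega = \pi$.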
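In most molecular conformation calculations, all covalent bond lengths and bond angles are assumed to be known a priori (Phillips et al., 1996). Thus, the first three atoms in the sequence can be fixed and the fourth atom is determined by the torsion angle $\omega_4$. The fifth atom can be determined by the torsion angles $\omega_4$ and $\omega_5$, and so on. So, given all bond lengths $r_2, r_3, \ldots, r_n$, bond angles $\theta_3, \theta_4, \ldots, \theta_n$, and torsion angles $\omega_4, \omega_5, \ldots, \omega_n$ of a molecule with $n$ atoms, the Cartesian coordinates $x_i = (x_{i1}, x_{i2}, x_{i3})$ of each atom $i$ in the molecule can be obtained using the following formulae (Phillips et al., 1996):
$$\begin{pmatrix} x_{i1} \\ x_{i2} \\ x_{i3} \\ 1 \end{pmatrix} = B_1 B_2 \cdots B_i \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}, \qquad i = 1, \ldots, n,$$
where
$$B_1 = I_4, \quad B_2 = \begin{pmatrix} -1 & 0 & 0 & -r_2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad B_3 = \begin{pmatrix} -\cos\theta_3 & -\sin\theta_3 & 0 & -r_3\cos\theta_3 \\ \sin\theta_3 & -\cos\theta_3 & 0 & r_3\sin\theta_3 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$
and
$$B_i = \begin{pmatrix} -\cos\theta_i & -\sin\theta_i & 0 & -r_i\cos\theta_i \\ \sin\theta_i\cos\omega_i & -\cos\theta_i\cos\omega_i & -\sin\omega_i & r_i\sin\theta_i\cos\omega_i \\ \sin\theta_i\sin\omega_i & -\cos\theta_i\sin\omega_i & \cos\omega_i & r_i\sin\theta_i\sin\omega_i \\ 0 & 0 & 0 & 1 \end{pmatrix}$$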
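for $i = 4, \ldots, n$. We call $B_i$ the torsion matrices and denote by $C_i = B_1 B_2 \cdots B_i$ the cumulative torsion matrices. For every four consecutive atoms $x_i, x_{i+1}, x_{i+2}, x_{i+3}$ we can express the cosine of the torsion angle $\omega_{i+3}$ in terms of the bond lengths $r_{i+1}, r_{i+2}, r_{i+3}$, the bond angles $\theta_{i+2}, \theta_{i+3}$ and the distance $d_{i,i+3}$ by using the cosine law for torsion angles (Pogorelov, 1987, p. 278), as follows:
$$\cos\omega_{i+3} = \frac{r_{i+1}^2 + r_{i+2}^2 + r_{i+3}^2 - 2 r_{i+1} r_{i+2}\cos\theta_{i+2} - 2 r_{i+2} r_{i+3}\cos\theta_{i+3} + 2 r_{i+1} r_{i+3}\cos\theta_{i+2}\cos\theta_{i+3} - d_{i,i+3}^2}{2 r_{i+1} r_{i+3}\sin\theta_{i+2}\sin\theta_{i+3}}. \tag{4}$$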
Hence, if we know all the bond lengths ($r_i$), bond angles ($\theta_i$), and distances between atoms separated by three covalent bonds ($d_{i,i+3}$), we can calculate the cosine of the torsion angles defined by the atoms $i, i+1, i+2, i+3$ for $i = 1, \ldots, n-3$. We note in passing that (4) holds only if its denominator is nonzero.
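The following Python sketch evaluates the cosine law for torsion angles in the explicit form obtained by placing four consecutive atoms a, b, c, d in a local frame (we assume this matches (4) up to rearrangement), and checks it against such a placement. The helper `place_chain` and the numerical values are hypothetical and serve only as a test harness.

```python
import math

def cos_torsion(r1, r2, r3, t2, t3, d14):
    """Cosine of the torsion angle of four consecutive atoms a-b-c-d.

    r1, r2, r3: bond lengths |ab|, |bc|, |cd|; t2, t3: bond angles at b and c;
    d14: distance |ad|. Raises ValueError if the denominator vanishes.
    """
    num = (r1**2 + r2**2 + r3**2
           - 2 * r1 * r2 * math.cos(t2) - 2 * r2 * r3 * math.cos(t3)
           + 2 * r1 * r3 * math.cos(t2) * math.cos(t3) - d14**2)
    den = 2 * r1 * r3 * math.sin(t2) * math.sin(t3)
    if abs(den) < 1e-12:
        raise ValueError("denominator is zero: a bond angle equals 0 or pi")
    return num / den

def place_chain(r1, r2, r3, t2, t3, omega):
    """Explicit local placement used only for checking: b at the origin,
    c on the positive x-axis, a in the xy-plane, d rotated by omega."""
    a = (r1 * math.cos(t2), r1 * math.sin(t2), 0.0)
    d = (r2 - r3 * math.cos(t3),
         r3 * math.sin(t3) * math.cos(omega),
         r3 * math.sin(t3) * math.sin(omega))
    return a, d

# Hypothetical backbone-like values (angstroms, radians).
r1, r2, r3 = 1.53, 1.47, 1.52
t2, t3 = math.radians(111.0), math.radians(109.5)
a, d = place_chain(r1, r2, r3, t2, t3, omega=1.0)
print(cos_torsion(r1, r2, r3, t2, t3, math.dist(a, d)), math.cos(1.0))
```

The two printed values agree up to rounding; note that only $\cos\omega$ is recovered, not the sign of $\sin\omega$.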
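Using the bond lengths $r_2, r_3$ and the bond angle $\theta_3$, we can determine the torsion matrices $B_2$ and $B_3$ and obtain
$$x_1 = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad x_2 = \begin{pmatrix} -r_2 \\ 0 \\ 0 \end{pmatrix}, \quad x_3 = \begin{pmatrix} r_3\cos\theta_3 - r_2 \\ r_3\sin\theta_3 \\ 0 \end{pmatrix},$$
fixing the first three atoms of the molecule. Because we also know the distance $d_{14}$, by (4) we can obtain the value $\cos\omega_4$. Thus, the sine of the torsion angle $\omega_4$ can have only two possible values: $\sin\omega_4 = \pm\sqrt{1 - \cos^2\omega_4}$. Consequently, we obtain only two possible positions $x_4$, $x_4'$ for the fourth atom, along with the respective torsion matrices $B_4$, $B_4'$ (built with $\sin\omega_4 = +\sqrt{1 - \cos^2\omega_4}$ and $\sin\omega_4 = -\sqrt{1 - \cos^2\omega_4}$, respectively), such that
$$\begin{pmatrix} x_4 \\ 1 \end{pmatrix} = C_3 B_4 \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} x_4' \\ 1 \end{pmatrix} = C_3 B_4' \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},$$
where $C_3 = B_1 B_2 B_3$ is a cumulative torsion matrix. This dichotomy, shown pictorially in Fig. 2, is the basic reason why this problem can be formulated combinatorially.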
Figure 2. Discretization of the problem. Atom $i+3$ can only be in one of the two positions shown in order to be consistent with the distance $d_{i,i+3}$.
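The two-way branching for the fourth atom can be sketched in Python with 4×4 homogeneous matrices. The sketch assumes the torsion matrices have the Phillips et al. (1996) form with atom 1 at the origin and atom 2 on the negative $x$-axis; `torsion_matrix`, `matmul` and `fourth_atom_candidates` are hypothetical helper names, and the bond data are illustrative.

```python
import math

def torsion_matrix(r, theta, cos_w=1.0, sin_w=0.0):
    """4x4 torsion matrix for bond length r, bond angle theta and a torsion
    given by (cos_w, sin_w); the defaults (cos_w=1, sin_w=0) give the B_3 form."""
    ct, st = math.cos(theta), math.sin(theta)
    return [[-ct, -st, 0.0, -r * ct],
            [st * cos_w, -ct * cos_w, -sin_w, r * st * cos_w],
            [st * sin_w, -ct * sin_w, cos_w, r * st * sin_w],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def fourth_atom_candidates(r2, r3, r4, t3, t4, d14, tol=1e-9):
    """Return the (at most two) positions of atom 4 compatible with d14."""
    B2 = [[-1.0, 0.0, 0.0, -r2], [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, -1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    C3 = matmul(B2, torsion_matrix(r3, t3))          # C_3 = B_1 B_2 B_3, B_1 = I
    # cosine law for the torsion angle omega_4 of atoms 1, 2, 3, 4
    num = (r2**2 + r3**2 + r4**2 - 2 * r2 * r3 * math.cos(t3)
           - 2 * r3 * r4 * math.cos(t4)
           + 2 * r2 * r4 * math.cos(t3) * math.cos(t4) - d14**2)
    cos_w = num / (2 * r2 * r4 * math.sin(t3) * math.sin(t4))
    if abs(cos_w) > 1.0 + tol:
        return []                                    # d14 cannot be realized
    cos_w = max(-1.0, min(1.0, cos_w))
    sin_w = math.sqrt(1.0 - cos_w**2)
    candidates = []
    for sign in (1.0, -1.0):                         # the two branches
        C4 = matmul(C3, torsion_matrix(r4, t4, cos_w, sign * sin_w))
        candidates.append((C4[0][3], C4[1][3], C4[2][3]))  # C_4 (0,0,0,1)^T
    return candidates

r, t = 1.526, math.radians(109.5)
x4, x4_prime = fourth_atom_candidates(r, r, r, t, t, d14=2.9)
print(x4, x4_prime)  # mirror images through the plane z = 0 of atoms 1, 2, 3
```

Both candidates lie at distance $d_{14}$ from atom 1 and differ only in the sign of their $z$-coordinate, i.e. they are reflections of each other through the plane of the first three atoms.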
For the fifth atom, we obtain four possible positions, one for each combination of $\sin\omega_4 = \pm\sqrt{1 - \cos^2\omega_4}$ and $\sin\omega_5 = \pm\sqrt{1 - \cos^2\omega_5}$. By an easy induction argument, we can see that for the $i$th atom we obtain $2^{i-3}$ possible positions. Hence, for a molecule shaped as a sequence (a linear chain) of $n$ atoms, we get $2^{n-3}$ possible sequences of torsion angles $\omega_4, \omega_5, \ldots, \omega_n$, each defining a different three-dimensional structure. Using the matrices $B_i$ defined above, we can convert a sequence of torsion angles into Cartesian coordinates $x_1, \ldots, x_n$. Thus, this problem has a finite search space. To test a candidate solution we simply use the function $f$ defined in (1); the candidate solution $(x_1, \ldots, x_n)$ is a valid solution if and only if $f(x_1, \ldots, x_n) = 0$.
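To make the finite search space concrete, the Python sketch below enumerates all $2^{n-3}$ sign choices for $\sin\omega_4, \ldots, \sin\omega_n$, converts each into coordinates with cumulative torsion matrices, and keeps those on which a penalty vanishes. Because (1) is not reproduced in this section, the penalty $\sum_{(i,j)\in S}(\|x_i - x_j\|^2 - d_{ij}^2)^2$ is an assumed stand-in for $f$ (any function that is zero exactly at solutions would do). The enumeration is exhaustive and exponential, with no pruning, so it illustrates the search space rather than the authors' algorithm; all function names are hypothetical.

```python
import itertools
import math

def torsion_matrix(r, theta, cos_w=1.0, sin_w=0.0):
    """4x4 torsion matrix; cos_w=1, sin_w=0 gives the B_3 form."""
    ct, st = math.cos(theta), math.sin(theta)
    return [[-ct, -st, 0.0, -r * ct],
            [st * cos_w, -ct * cos_w, -sin_w, r * st * cos_w],
            [st * sin_w, -ct * sin_w, cos_w, r * st * sin_w],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def cos_torsion(r1, r2, r3, t2, t3, d):
    num = (r1**2 + r2**2 + r3**2 - 2 * r1 * r2 * math.cos(t2)
           - 2 * r2 * r3 * math.cos(t3)
           + 2 * r1 * r3 * math.cos(t2) * math.cos(t3) - d**2)
    return num / (2 * r1 * r3 * math.sin(t2) * math.sin(t3))

def chain(n, r, theta, cos_w, sin_w):
    """Coordinates x_1..x_n as last columns of the cumulative matrices C_i."""
    C = [[-1.0, 0.0, 0.0, -r[2]], [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, -1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]   # C_2 = B_1 B_2
    xs = [(0.0, 0.0, 0.0), (-r[2], 0.0, 0.0)]
    for i in range(3, n + 1):
        B = torsion_matrix(r[i], theta[i], cos_w.get(i, 1.0), sin_w.get(i, 0.0))
        C = matmul(C, B)
        xs.append((C[0][3], C[1][3], C[2][3]))
    return xs

def penalty(xs, S):
    """Assumed form of f: sum over known pairs of (|x_i - x_j|^2 - d_ij^2)^2."""
    return sum((math.dist(xs[i - 1], xs[j - 1])**2 - d**2)**2
               for (i, j), d in S.items())

def enumerate_all(n, r, theta, S, tol=1e-6):
    """Try all 2^(n-3) sign choices for sin(omega_4), ..., sin(omega_n)."""
    raw = {i: cos_torsion(r[i - 2], r[i - 1], r[i], theta[i - 1], theta[i],
                          S[(i - 3, i)])
           for i in range(4, n + 1)}
    if any(abs(c) > 1.0 + 1e-9 for c in raw.values()):
        return []                                # some d_{i,i+3} is unrealizable
    cos_w = {i: max(-1.0, min(1.0, c)) for i, c in raw.items()}
    found = []
    for signs in itertools.product((1.0, -1.0), repeat=n - 3):
        sin_w = {i: s * math.sqrt(1.0 - cos_w[i]**2)
                 for i, s in zip(range(4, n + 1), signs)}
        xs = chain(n, r, theta, cos_w, sin_w)
        if penalty(xs, S) < tol:                 # f(x) = 0 up to rounding
            found.append(xs)
    return found

# Hypothetical 6-atom chain: build a reference with chosen torsions, keep every
# pairwise distance as "known", then recover the structure by enumeration.
n = 6
r = {i: 1.526 for i in range(2, n + 1)}
theta = {i: math.radians(109.5) for i in range(3, n + 1)}
omega = {4: 1.0, 5: 4.0, 6: 2.5}
ref = chain(n, r, theta, {i: math.cos(w) for i, w in omega.items()},
            {i: math.sin(w) for i, w in omega.items()})
S = {(i, j): math.dist(ref[i - 1], ref[j - 1])
     for i in range(1, n + 1) for j in range(i + 1, n + 1)}
solutions = enumerate_all(n, r, theta, S)
print(len(solutions))  # the reference and its mirror image through z = 0
```

When all distances are known, only the reference and its reflection through the plane of the first three atoms survive; with fewer distances in $S$, more sign sequences may pass the test.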
The discussion above can be summarized in the following theorem.
Theorem 2.1. Consider a sequence $M$ of $n$ atoms such that:
Then there is a finite number of distinct immersions $p: M \to \mathbb{R}^3$ such that:
$p(1) = (0, 0, 0)$, $p(2)_1 = 0$, $p(2)_2 = 0$, $p(3)_1 = 0$ (where $p(i)_k$ is the $k$th coordinate of $p(i)$ for $k \le 3$, $i \le n$);
for all atoms $i, j$ with known atomic distance $d_{ij}$ we have $\|p(i) - p(j)\| = d_{ij}$.
We remark that the idea that three distances suffice to determine at most two possible positions of the fourth atom, which is the basis on which our algorithm rests, was also very recently used in Wu et al. (in press) in the framework of the geometric build-up algorithm (Dong and Wu, 2002).