Formally, the MDGP can be defined as the problem of finding Cartesian coordinates *x*_{1}, …, *x*_{n} ∈ ℝ^{3} of the atoms of a molecule such that for all (*i*, *j*)∈*S*,

$$
\|x_i - x_j\| = d_{ij},
$$

where *S* is the set of pairs of atoms (*i*, *j*) whose Euclidean distances *d*_{ij} are known. If all distances are given, the problem can be solved in linear time (Dong and Wu, 2002). If there is an order on the atoms such that the given distances form cliques on each set of five contiguous atoms, the problem is polynomially solvable (Eren et al., 2004). In general, however, the problem is NP-hard (Saxe, 1979).
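As a small illustration of the definition above, the following sketch checks whether a given set of coordinates satisfies every known distance. The function name `satisfies_distances`, the dictionary-based data layout, and the tolerance `tol` are assumptions made for this example only.

```python
import math

def satisfies_distances(coords, distances, tol=1e-9):
    """Return True if ||x_i - x_j|| matches d_ij (within tol) for all (i, j) in S.

    coords:    dict mapping atom index -> (x, y, z)
    distances: dict mapping (i, j) -> known distance d_ij (the set S)
    """
    return all(
        abs(math.dist(coords[i], coords[j]) - d) <= tol
        for (i, j), d in distances.items()
    )

# Hypothetical three-atom instance: a right-angled triangle with unit legs.
coords = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (1.0, 1.0, 0.0)}
S = {(1, 2): 1.0, (2, 3): 1.0, (1, 3): math.sqrt(2.0)}
print(satisfies_distances(coords, S))
```

A floating-point tolerance is used because exact equality of computed and given distances is rarely achievable in practice.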

Note that the MDGP, as formally stated above, makes no reference to molecules. In fact, the MDGP appears in application fields as diverse as 3D graph drawing (Cruz and Twarog, 1996) and network design (Eren et al., 2004). Our assumption that all distances between atoms separated by one, two, and three covalent bonds are known can be expressed as an additional condition on the set *S* of distances, namely that *S* can be partitioned into two disjoint sets *E*, *F* of distances where

$$
E = \{(i, j) \in S : \text{atoms } i \text{ and } j \text{ are separated by one, two, or three covalent bonds}\}
$$

and

$$
F = S \setminus E.
$$

We also assume that for all pairs of atoms in *F*, the distances do not exceed a given cut-off value Δ (usually this is taken to be 5 Å using, for example, NMR analysis; Creighton, 1993; Schlick, 2002), that is, *d*_{ij} ≤ Δ ∀(*i*, *j*)∈*F*.

#### 2.1. Discrete formulation

Consider a molecule as being a sequence of *n* atoms with Cartesian coordinates given by *x*_{1}, …, *x*_{n} ∈ ℝ^{3} and such that there is a covalent bond between every pair of atoms (*i*, *i*+1), for *i*=1, …, *n*−1. The *bond length r*_{i} is the Euclidean distance between atoms *i*−1 and *i* (i.e. *r*_{i}=*d*_{i−1, i} for all *i*=2, …, *n*). The *bond angle θ*_{i}∈[0, *π*] is the angle between the segments joining atoms *i*−2, *i*−1 and *i*−1, *i* (for all *i*=3, …, *n*). The *torsion angle ω*_{i}∈[0, 2*π*] is the angle between the normals to the planes defined by the atoms *i*−3, *i*−2, *i*−1 and *i*−2, *i*−1, *i* (for all *i*=4, …, *n*) (see Fig. 1).
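The three internal coordinates just defined can be computed from Cartesian coordinates as in the following sketch. The helper names are hypothetical, and the orientation (sign) convention used for the torsion angle is an assumption of this example; it is not necessarily the one adopted in Fig. 1.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def bond_length(p_prev, p):
    """r_i: Euclidean distance between atoms i-1 and i."""
    return math.dist(p_prev, p)

def bond_angle(p_a, p_b, p_c):
    """theta_i in [0, pi]: angle at the middle atom p_b between p_b-p_a and p_b-p_c."""
    u, v = sub(p_a, p_b), sub(p_c, p_b)
    cos_t = dot(u, v) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding

def torsion_angle(p_a, p_b, p_c, p_d):
    """omega_i in [0, 2*pi): angle between the normals of planes (a, b, c) and (b, c, d).

    The sign convention (which half-turn counts as positive) is an assumption.
    """
    b1, b2, b3 = sub(p_b, p_a), sub(p_c, p_b), sub(p_d, p_c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = dot(cross(n1, n2), b2) / math.hypot(*b2)
    x = dot(n1, n2)
    return math.atan2(y, x) % (2.0 * math.pi)
```

Using `atan2` rather than `acos` for the torsion angle distinguishes the two orientations that share the same cosine, which is exactly the ambiguity exploited later in this section.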

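In most molecular conformation calculations, all covalent bond lengths and bond angles are assumed to be known *a priori* (Phillips et al., 1996). Thus, the first three atoms in the sequence can be fixed and the fourth atom is determined by the torsion angle *ω*_{4}. The fifth atom can be determined by the torsion angles *ω*_{4} and *ω*_{5}, and so on. So, given all bond lengths *r*_{2}, *r*_{3}, …, *r*_{n}, bond angles *θ*_{3}, *θ*_{4}, …, *θ*_{n}, and torsion angles *ω*_{4}, *ω*_{5}, …, *ω*_{n} of a molecule with *n* atoms, the Cartesian coordinates *x*_{i}=(*x*_{i1}, *x*_{i2}, *x*_{i3}) for each atom *i* in the molecule can be obtained using the following formulae (Phillips et al., 1996):

$$
\begin{pmatrix} x_{i1} \\ x_{i2} \\ x_{i3} \\ 1 \end{pmatrix}
= B_1 B_2 \cdots B_i \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},
\qquad i = 1, \dots, n, \tag{2}
$$

where

$$
B_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
\quad
B_2 = \begin{pmatrix} -1 & 0 & 0 & -r_2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
\quad
B_3 = \begin{pmatrix} -\cos\theta_3 & -\sin\theta_3 & 0 & -r_3\cos\theta_3 \\ \sin\theta_3 & -\cos\theta_3 & 0 & r_3\sin\theta_3 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
$$

and

$$
B_i = \begin{pmatrix}
-\cos\theta_i & -\sin\theta_i & 0 & -r_i\cos\theta_i \\
\sin\theta_i\cos\omega_i & -\cos\theta_i\cos\omega_i & -\sin\omega_i & r_i\sin\theta_i\cos\omega_i \\
\sin\theta_i\sin\omega_i & -\cos\theta_i\sin\omega_i & \cos\omega_i & r_i\sin\theta_i\sin\omega_i \\
0 & 0 & 0 & 1
\end{pmatrix} \tag{3}
$$

for *i*=4, …, *n*. We call *B*_{i} the *torsion matrices* and denote by *C*_{i}=*B*_{1}*B*_{2}⋯*B*_{i} the *cumulative torsion matrices*. For every four consecutive atoms *x*_{i}, *x*_{i+1}, *x*_{i+2}, *x*_{i+3} we can express the cosine of the torsion angle *ω*_{i+3} in terms of the distances *r*_{i+1}, *d*_{i+1,i+3}, *d*_{i,i+3} and the bond angles *θ*_{i+2}, *θ*_{i+3} by using the cosine law for torsion angles (Pogorelov, 1987, p. 278), as follows: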

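$$
\cos\omega_{i+3} = \frac{r_{i+1}^2 + d_{i+1,i+3}^2 - 2\, r_{i+1}\, d_{i+1,i+3} \cos\theta_{i+2} \cos\theta_{i+3} - d_{i,i+3}^2}{2\, r_{i+1}\, d_{i+1,i+3} \sin\theta_{i+2} \sin\theta_{i+3}} \tag{4}
$$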

Hence, if we know all the bond lengths (*r*_{i}), bond angles (*θ*_{i}), and distances between atoms separated by three covalent bonds (*d*_{i, i+3}), we can calculate the cosine of the torsion angles defined by the atoms *i*, *i*+1, *i*+2, *i*+3 for *i*=1, …, *n*−3. We note in passing that in order for (4) to hold, we obviously need the denominator to be nonzero.
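The conversion from internal coordinates to Cartesian coordinates via cumulative torsion matrices can be sketched as follows. The explicit matrix entries used here are the form commonly given for this construction and are an assumption of the example; they should be checked against Phillips et al. (1996). The function names and the 1-based dictionaries `r`, `theta`, `omega` are likewise illustrative.

```python
import math

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def torsion_matrix(r, theta, omega):
    """Torsion matrix B_i for bond length r, bond angle theta, torsion angle omega
    (assumed form, see the lead-in)."""
    ct, st = math.cos(theta), math.sin(theta)
    cw, sw = math.cos(omega), math.sin(omega)
    return [
        [-ct, -st, 0.0, -r * ct],
        [st * cw, -ct * cw, -sw, r * st * cw],
        [st * sw, -ct * sw, cw, r * st * sw],
        [0.0, 0.0, 0.0, 1.0],
    ]

def chain_coordinates(r, theta, omega):
    """Return [x_1, ..., x_n] for r = {2: r_2, ...}, theta = {3: ...}, omega = {4: ...}."""
    n = max(r)
    # C_1 = B_1 is the identity, so atom 1 sits at the origin.
    cumulative = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    coords = [(0.0, 0.0, 0.0)]
    for i in range(2, n + 1):
        if i == 2:
            b = [[-1.0, 0.0, 0.0, -r[2]], [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, -1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
        elif i == 3:
            b = torsion_matrix(r[3], theta[3], 0.0)  # no torsion angle yet
        else:
            b = torsion_matrix(r[i], theta[i], omega[i])
        cumulative = matmul(cumulative, b)           # C_i = C_{i-1} B_i
        # C_i (0, 0, 0, 1)^T is simply the last column of C_i.
        coords.append(tuple(cumulative[k][3] for k in range(3)))
    return coords
```

Keeping the running product `cumulative` means each new atom costs one 4x4 multiplication, rather than recomputing the whole product from scratch.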

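Using the bond lengths *r*_{2}, *r*_{3} and the bond angle *θ*_{3}, we can determine the torsion matrices *B*_{2} and *B*_{3} and obtain

$$
x_1 = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \qquad
x_2 = \begin{pmatrix} -r_2 \\ 0 \\ 0 \end{pmatrix}, \qquad
x_3 = \begin{pmatrix} r_3\cos\theta_3 - r_2 \\ r_3\sin\theta_3 \\ 0 \end{pmatrix},
$$

fixing the first three atoms of the molecule. Because we also know the distance *d*_{14}, by (4) we can obtain the value cos*ω*_{4}. Thus, the sine of the torsion angle *ω*_{4} can have only two possible values: sin*ω*_{4} = ±√(1 − cos^{2}*ω*_{4}). Consequently, we obtain only two possible positions for the fourth atom:

$$
x_4 = \begin{pmatrix}
-r_2 + r_3\cos\theta_3 - r_4\cos\theta_3\cos\theta_4 + r_4\sin\theta_3\sin\theta_4\cos\omega_4 \\
r_3\sin\theta_3 - r_4\sin\theta_3\cos\theta_4 - r_4\cos\theta_3\sin\theta_4\cos\omega_4 \\
-r_4\sin\theta_4\sqrt{1 - \cos^2\omega_4}
\end{pmatrix},
\qquad
x_4' = \begin{pmatrix}
-r_2 + r_3\cos\theta_3 - r_4\cos\theta_3\cos\theta_4 + r_4\sin\theta_3\sin\theta_4\cos\omega_4 \\
r_3\sin\theta_3 - r_4\sin\theta_3\cos\theta_4 - r_4\cos\theta_3\sin\theta_4\cos\omega_4 \\
r_4\sin\theta_4\sqrt{1 - \cos^2\omega_4}
\end{pmatrix},
$$

along with the respective torsion matrices *B*_{4}, *B*′_{4} such that

$$
\begin{pmatrix} x_4 \\ 1 \end{pmatrix} = C_3 B_4 \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},
\qquad
\begin{pmatrix} x_4' \\ 1 \end{pmatrix} = C_3 B_4' \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},
$$

where *C*_{3}=*B*_{1}*B*_{2}*B*_{3} is a cumulative torsion matrix. This dichotomy, shown pictorially in Fig. 2, is the basic reason why this problem can be formulated combinatorially.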
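The two candidate positions of the fourth atom can be computed as in the sketch below. Rather than transcribing (4), this example recovers cos*ω*_{4} from *d*_{14} by exploiting the fact that, for fixed bond lengths and bond angles, the squared distance between atoms 1 and 4 is an affine function of cos*ω*_{4}; this is a swapped-in technique, not the paper's formula. The torsion matrix entries and all function names are assumptions of the example.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def torsion_matrix(r, theta, cos_w, sin_w):
    """Torsion matrix B_i (assumed form), parameterised by cos and sin of omega."""
    ct, st = math.cos(theta), math.sin(theta)
    return [
        [-ct, -st, 0.0, -r * ct],
        [st * cos_w, -ct * cos_w, -sin_w, r * st * cos_w],
        [st * sin_w, -ct * sin_w, cos_w, r * st * sin_w],
        [0.0, 0.0, 0.0, 1.0],
    ]

def fourth_atom_candidates(r2, r3, r4, theta3, theta4, d14):
    """Return the two positions of atom 4 compatible with the distance d14."""
    b2 = [[-1.0, 0.0, 0.0, -r2], [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, -1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    c3 = matmul(b2, torsion_matrix(r3, theta3, 1.0, 0.0))  # C_3, with B_1 = I

    def place(cos_w, sin_w):
        c4 = matmul(c3, torsion_matrix(r4, theta4, cos_w, sin_w))
        return (c4[0][3], c4[1][3], c4[2][3])

    origin = (0.0, 0.0, 0.0)  # atom 1
    # Squared distance to atom 1 is A + B*cos(omega); sample it at cos = +1 and -1.
    q_plus = math.dist(place(1.0, 0.0), origin) ** 2
    q_minus = math.dist(place(-1.0, 0.0), origin) ** 2
    if math.isclose(q_plus, q_minus):
        raise ValueError("degenerate geometry: d14 does not depend on the torsion angle")
    cos_w = (2.0 * d14 ** 2 - q_plus - q_minus) / (q_plus - q_minus)
    if abs(cos_w) > 1.0 + 1e-9:
        raise ValueError("d14 is not compatible with the given bond lengths and angles")
    cos_w = max(-1.0, min(1.0, cos_w))
    sin_w = math.sqrt(1.0 - cos_w * cos_w)
    # The two signs of sin(omega_4) give the two mirror-image positions.
    return place(cos_w, sin_w), place(cos_w, -sin_w)
```

The degenerate case rejected by the first `ValueError` corresponds to the vanishing denominator mentioned in connection with (4).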

For the fifth atom, we will obtain four possible positions, one for each combination of ±√(1 − cos^{2}*ω*_{4}) and ±√(1 − cos^{2}*ω*_{5}). By an easy induction argument, we can see that for the *i*th atom we obtain 2^{i−3} possible positions. Hence, for a molecule shaped as a sequence (a linear chain) of *n* atoms, we get 2^{n−3} possible sequences of torsion angles *ω*_{4}, *ω*_{5}, … ,*ω*_{n}, each defining a different three-dimensional structure. Using the matrices *B*_{i} defined above, we can convert a sequence of torsion angles into Cartesian coordinates *x*_{1}, …, *x*_{n} ∈ ℝ^{3}. Thus, this problem has a finite search space. To test a candidate solution we simply use the function *f* defined in (1); the candidate solution (*x*_{1}, …, *x*_{n}) will be a valid solution if and only if *f*(*x*_{1}, … ,*x*_{n})=0.
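The finite search space described above can be explored by plain enumeration, as in the following sketch. This is a brute-force illustration only, not a branch-and-prune procedure. The function `penalty` is a stand-in for *f*: it is assumed here to be a sum of squared errors on squared distances, whereas the exact form of (1) is given elsewhere in the paper. The torsion matrix entries, the function names, and the tolerance `tol` are also assumptions.

```python
import itertools
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def torsion_matrix(r, theta, cos_w, sin_w):
    """Torsion matrix B_i (assumed form), parameterised by cos and sin of omega."""
    ct, st = math.cos(theta), math.sin(theta)
    return [
        [-ct, -st, 0.0, -r * ct],
        [st * cos_w, -ct * cos_w, -sin_w, r * st * cos_w],
        [st * sin_w, -ct * sin_w, cos_w, r * st * sin_w],
        [0.0, 0.0, 0.0, 1.0],
    ]

def build_chain(r, theta, cos_omega, signs):
    """Coordinates {i: x_i} for one choice of signs of sin(omega_4), ..., sin(omega_n)."""
    n = max(r)
    c = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    coords = {1: (0.0, 0.0, 0.0)}
    for i in range(2, n + 1):
        if i == 2:
            b = [[-1.0, 0.0, 0.0, -r[2]], [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, -1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
        elif i == 3:
            b = torsion_matrix(r[3], theta[3], 1.0, 0.0)
        else:
            cw = cos_omega[i]
            sw = signs[i - 4] * math.sqrt(max(0.0, 1.0 - cw * cw))
            b = torsion_matrix(r[i], theta[i], cw, sw)
        c = matmul(c, b)
        coords[i] = (c[0][3], c[1][3], c[2][3])
    return coords

def penalty(coords, distances):
    """Stand-in for f: zero exactly when every listed distance is matched."""
    return sum((math.dist(coords[i], coords[j]) ** 2 - d * d) ** 2
               for (i, j), d in distances.items())

def enumerate_structures(r, theta, cos_omega, distances, tol=1e-9):
    """Try all 2^(n-3) sign sequences; return (all candidates, those accepted)."""
    n = max(r)
    candidates = list(itertools.product((1, -1), repeat=max(0, n - 3)))
    valid = [s for s in candidates
             if penalty(build_chain(r, theta, cos_omega, s), distances) <= tol]
    return candidates, valid
```

Because reversing every sign reflects the whole chain through the plane of the first three atoms, accepted sign sequences come in mirror-image pairs. The exponential number of candidates is what makes pruning worthwhile for larger *n*.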

The discussion above can be summarized in the following theorem.

**Theorem 2.1.** *Consider a sequence M of n atoms such that:*

*Then there is a finite number of distinct immersions p: M →* ℝ^{3} *such that:*

- (a) *p(1)=(0, 0, 0), p(2)*_{1}*=0, p(2)*_{2}*=0, p(3)*_{1}*=0 (where p(i)*_{k} *is the kth coordinate of p(i) for k* ≤ *3, i* ≤ *n);*
- (b) *for all atoms i, j with known atomic distance d*_{ij} *we have* ‖*p*(*i*) − *p*(*j*)‖ = *d*_{ij}.

We remark that the idea that three distances suffice to determine at most two possible positions of the fourth atom, which is the basis on which our algorithm rests, was also very recently used in Wu et al. (in press) in the framework of the geometric build-up algorithm (Dong and Wu, 2002).