### Abstract

- Top of page
- Abstract
- Introduction
- Mathematical Model
- Function and Gradient Evaluation
- Optimization and Parallelization
- PACKMOL Usage
- Examples
- Conclusions
- Acknowledgements
- References

Adequate initial configurations for molecular dynamics simulations consist of arrangements of molecules distributed in space in such a way as to approximately represent the system's overall structure. So that the simulations are not disrupted by large van der Waals repulsive interactions, atoms from different molecules must keep safe pairwise distances. Obtaining such a molecular arrangement can be considered a packing problem: each type of molecule must satisfy spatial constraints related to the geometry of the system, and the distance between atoms of different molecules must be greater than some specified tolerance. We have developed a code able to pack millions of atoms, grouped in arbitrarily complex molecules, inside a variety of three-dimensional regions. The regions may be intersections of spheres, ellipsoids, cylinders, planes, or boxes. The user must provide only the structure of one molecule of each type and the geometrical constraints that each type of molecule must satisfy. Building complex mixtures and interfaces, or solvating biomolecules in water, other solvents, or mixtures of solvents, is straightforward. In addition, different atoms belonging to the same molecule may be restricted to different spatial regions, so that more ordered molecular arrangements, such as micelles and lipid double layers, can be built. The packing time for state-of-the-art molecular dynamics systems varies from a few seconds to a few minutes on a personal computer. The input files are simple and currently compatible with PDB, Tinker, Molden, or Moldy coordinate files. The package is distributed as free software and can be downloaded from http://www.ime.unicamp.br/~martinez/packmol/. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009

### Introduction

The first step in a molecular dynamics simulation consists of obtaining initial coordinates for all the atoms of the system. For example, to run a simple simulation consisting of 300 water molecules with experimental density, we need the positions of the 300 molecules inside an adequately sized box. Furthermore, as molecular dynamics force-fields contain repulsive terms that increase abruptly for short atom-to-atom distances, the distances between atoms from different molecules must be large enough so that repulsive potentials do not disrupt the simulations. Frequently, the instability and nondifferentiability of the potential energy resulting from overlapping atoms are hard to overcome.1

For a simple system such as a water box, we can obtain an adequate configuration simply by ordering the molecules in a regular lattice. However, for slightly more complex systems such as a solvated peptide, regular configurations would almost certainly contain overlapping atoms. Many times, this inconvenience is overcome in the following way: a box of water molecules regularly distributed is constructed (or a previously equilibrated solvent box is used, when available). Then, the “big” molecule is added to the system and the solvent molecules containing overlapping atoms are removed. Finally, the energy of the system is minimized.

However, when the complexity of the system increases, building a starting configuration may become very tedious. For example, if the protein is added to a box containing water, ions, and urea (a common denaturant), and the overlapping solvent molecules are removed, the charge of the system and the molar fraction of urea must be further adjusted. Building ordered molecular systems, such as micelles, double layers, or interfaces, requires many trials, manipulation of files, small ad hoc codes, and so on, so that the very first steps of the simulation turn out to be quite cumbersome. For this reason, ready-to-use configurations and specific programs have been developed for the construction of some commonly studied systems, particularly membranes.2, 3 As the variety of interesting systems being studied increases, more general approaches will be of great utility.

Recently, we proposed that the initial configuration problem can be treated as a packing problem.1 The molecules are packed within spatial regions with the desired characteristics, in such a way that atoms from different molecules keep a safe pairwise distance. Small systems composed of interfaces, mixtures of various components, and solvated proteins were successfully built and used in actual molecular dynamics simulations.4–6 In that work, we showed that random sets of appropriately packed molecules, with no intermolecular clashes, can be rapidly equilibrated to the thermodynamic energy using standard MD integration algorithms and energy minimization,1 thus validating the approach.

In this article, we show how this idea gives rise to an efficient code for public use. The construction of large systems with increasing structural complexity is now envisaged. With this in mind, we define a general set of spatial constraints, so that molecules can be packed in complex regions formed by the intersections of planes, spheres, ellipsoids, cylinders, and boxes. We have also implemented tools that allow the user to allocate different parts of the molecules to different regions, in such a way that ordered structures, such as micelles, may be built. To meet the needs of the increasingly large systems currently being simulated, we have modified the function and gradient evaluation methods so that the practical algorithm scales linearly with the number of atoms of the system, and we have developed a parallel version. Like its predecessor, the software is called PACKMOL, and it is free software. Successive versions of PACKMOL have been employed by many members of the molecular dynamics community since 2003. In brief, the present article reports the improvements made to PACKMOL since its introduction in 2003 as a proof-of-concept package.

This article is organized as follows. The next section presents the mathematical model. The “Function and Gradient Evaluation” section deals with the efficient evaluation of the objective function and its derivatives. The optimization method used to solve the packing problem and the parallelization of the objective function and its derivatives are described in the “Optimization and Parallelization” section. The “PACKMOL Usage” section describes the usage of the software, and the “Examples” section presents some examples. Conclusions and perspectives are stated in the “Conclusions” section.

### Mathematical Model

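Let *n*_{mol} be the total number of molecules that we want to place in a region of three-dimensional space. For each *i* = 1, …, *n*_{mol}, let *n*_{atom}(*i*) be the number of atoms of the *i*-th molecule. Molecules can be grouped into different types (water, protein, urea, and so on), but this classification is irrelevant for the model description. Each molecule is represented by the Cartesian coordinates of its atoms. The point whose coordinates are the arithmetic averages of the coordinates of the atoms will be called the barycenter. To facilitate visualization, assume that the origin is the barycenter of every molecule. For all *i* = 1, …, *n*_{mol}, *j* = 1, …, *n*_{atom}(*i*), let

$$a^{ij} = (a_1^{ij}, a_2^{ij}, a_3^{ij})$$

be the coordinates of the *j*-th atom of the *i*-th molecule. Rotating the *i*-th molecule by the angles θ^{i} = (θ_{1}^{i}, θ_{2}^{i}, θ_{3}^{i}) and then displacing it so that its barycenter moves from the origin to *c*^{i} = (*c*_{1}^{i}, *c*_{2}^{i}, *c*_{3}^{i}) transforms *a*^{ij} into

$$p^{ij} = c^{i} + R(\theta^{i})\, a^{ij}, \tag{1}$$

where *R*(θ^{i}) is the rotation matrix determined by the three angles.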

Our objective is to find angles θ^{i} and displacements *c*^{i}, *i* = 1, …, *n*_{mol}, in such a way that, for all *i* = 1, …, *n*_{mol}, *j* = 1, …, *n*_{atom}(*i*), the point with coordinates *p*^{ij} = (*p*_{1}^{ij}, *p*_{2}^{ij}, *p*_{3}^{ij}), obtained by rotating and displacing the *j*-th atom of the *i*-th molecule, satisfies the constraints imposed on atom *j* of molecule *i*. In addition, we require that, for all *i* ≠ *i*′, *j* = 1, …, *n*_{atom}(*i*), *j*′ = 1, …, *n*_{atom}(*i*′),

$$\| p^{ij} - p^{i'j'} \| \ge d_{tol}, \tag{3}$$

where *d*_{tol} > 0 is a user-specified tolerance. The symbol ‖·‖ stands for the usual Euclidean distance. In other words, the rotated and displaced molecules must remain in the desired region and the distance between any pair of atoms of different molecules must not be less than *d*_{tol}.
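The placement of a molecule and the separation requirement can be illustrated with a minimal Python sketch. It assumes that the three angles describe sequential rotations about the x, y, and z axes (the exact convention is an assumption made here, not taken from the text), and the helper names `rotation_matrix`, `place`, and `min_intermolecular_distance`, as well as the two-atom reference molecule, are hypothetical.

```python
import math


def matmul(m, n):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(m[r][k] * n[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def rotation_matrix(theta):
    """Sequential rotations about x, then y, then z (assumed convention)."""
    a, b, g = theta
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cg, sg = math.cos(g), math.sin(g)
    rx = [[1, 0, 0], [0, ca, -sa], [0, sa, ca]]
    ry = [[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]]
    rz = [[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]]
    return matmul(rz, matmul(ry, rx))


def place(atoms, theta, c):
    """Rotate reference coordinates (barycenter at the origin), then displace by c."""
    rot = rotation_matrix(theta)
    return [tuple(c[r] + sum(rot[r][k] * a[k] for k in range(3)) for r in range(3))
            for a in atoms]


def min_intermolecular_distance(molecules):
    """Smallest distance between atoms that belong to different molecules."""
    best = math.inf
    for i in range(len(molecules)):
        for k in range(i + 1, len(molecules)):
            for p in molecules[i]:
                for q in molecules[k]:
                    best = min(best, math.dist(p, q))
    return best


# Hypothetical two-atom reference molecule with its barycenter at the origin.
ref = [(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)]
mol1 = place(ref, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
mol2 = place(ref, (0.0, 0.0, math.pi / 2), (0.0, 3.0, 0.0))
d_tol = 2.0
print(min_intermolecular_distance([mol1, mol2]) >= d_tol)
```

The check loops over all intermolecular atom pairs, which is only practical for tiny systems; it is meant to make the requirement concrete, not to be efficient.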

A large variety of positioning constraints may be imposed on individual atoms. Let *r*^{ij} be the number of constraints that apply to the *j*-th atom of the *i*-th molecule. In practice, the constraints may be applied to a subset of atoms of all the molecules of the same type, or to all the atoms of a molecule, or to any desired subset of atoms or molecules, but, again, this is irrelevant for the model description.

These constraints can be represented as

$$g_{\ell}^{ij}(p^{ij}) \le 0, \qquad \ell = 1, \ldots, r^{ij}. \tag{4}$$

Examples of the positioning constraints that may be required for the atoms will be shown in “PACKMOL Usage” section.

In our approach, the constraints are incorporated into the objective function. This means that, for each atom, the objective function contains a part associated with the distance to other atoms and a part associated with the fulfillment of the geometrical constraints that are imposed on it. Given the position of the atom in space, each constraint function *g*_{ℓ}^{ij}(*p*^{ij}), ℓ = 1, …, *r*^{ij}, is positive if the corresponding constraint is not satisfied and non-positive otherwise. This justifies the addition of terms of the form [max{0, *g*_{ℓ}^{ij}(*p*^{ij})}]^{2} to the objective function.

The objectives (3-4) lead us to define the following merit function *f*:

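$$f(c, \theta) = \sum_{i=1}^{n_{mol}} \sum_{j=1}^{n_{atom}(i)} \left[ \sum_{i'=i+1}^{n_{mol}} \sum_{j'=1}^{n_{atom}(i')} \left[ \max\{0,\; d_{tol}^{2} - \| p^{ij} - p^{i'j'} \|^{2} \} \right]^{2} + \sum_{\ell=1}^{r^{ij}} \left[ \max\{0,\; g_{\ell}^{ij}(p^{ij}) \} \right]^{2} \right], \tag{5}$$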

where *c* = (*c*^{1}, …, *c*^{n_{mol}}) ∈ ℝ^{3 × n_{mol}} and θ = (θ^{1}, …, θ^{n_{mol}}) ∈ ℝ^{3 × n_{mol}}. [Remember the dependence of *p*^{ij} on the variables (*c*^{i}, θ^{i}) and the constants *a*^{ij} given by (1-2).] Note that *f*(*c*, θ) is non-negative for all angles and displacements. Moreover, *f* vanishes if, and only if, the objectives (3-4) are fulfilled. This means that, if we find displacements and angles where *f* = 0, the atoms of the resulting molecules fit the desired region and are sufficiently separated. This leads us to define the following unconstrained minimization problem:

$$\text{Minimize } f(c, \theta). \tag{6}$$

The number of variables is 6 × *n*_{mol} (three rotation angles and a three-dimensional displacement per molecule). The analytical expression of the derivatives of *f* is cumbersome but straightforward.
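A direct evaluation of such a merit function can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pair term is taken to be [max{0, *d*_{tol}² − ‖*p* − *q*‖²}]², which vanishes exactly when the separation requirement holds; every constraint is applied to every atom for simplicity; and the names `sphere_constraint` and `merit` are hypothetical. The function works on atom positions that have already been rotated and displaced.

```python
def sphere_constraint(center, radius):
    """Return g with g(p) <= 0 inside the sphere and g(p) > 0 outside it."""
    def g(p):
        return sum((p[k] - center[k]) ** 2 for k in range(3)) - radius ** 2
    return g


def merit(molecules, constraints, d_tol):
    """Direct O(N^2) evaluation of the penalty function.

    molecules:   list of molecules, each a list of (x, y, z) atom positions.
    constraints: list of functions g, applied here to every atom (a simplification).
    """
    total = 0.0
    for i, mol in enumerate(molecules):
        for p in mol:
            # Distance term: only atoms of molecules with a larger index,
            # so that each intermolecular pair is counted once.
            for other in molecules[i + 1:]:
                for q in other:
                    d2 = sum((p[k] - q[k]) ** 2 for k in range(3))
                    total += max(0.0, d_tol ** 2 - d2) ** 2
            # Constraint term: violated constraints contribute [g(p)]^2.
            for g in constraints:
                total += max(0.0, g(p)) ** 2
    return total


inside = sphere_constraint((0.0, 0.0, 0.0), 5.0)
packed = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]   # separated, inside the sphere
clashing = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]]  # closer than d_tol
print(merit(packed, [inside], 2.0), merit(clashing, [inside], 2.0))
```

A value of zero certifies that no pair is closer than the tolerance and no constraint is violated, which is what makes a global solution easy to recognize.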

The definition (5) possesses two nice features. On one hand, the functional value vanishes at the solutions of the packing problem, where no overlaps are present and all constraints are satisfied. Therefore, the global minimizers of (5) are recognized trivially. On the other hand, the function *f* is continuous and first-order differentiable. These are important advantages over the explicit minimization of the potential energy of the system. In fact, most molecular mechanics force-fields contain terms that are non-differentiable at *d* = 0 (both the van der Waals and the electrostatic terms). These terms do not pose problems for a stable simulation, but they introduce instabilities that hamper the process of obtaining a suitable initial configuration by direct energy minimization. For example, a system may have an overall acceptable energy, but a single unsatisfactory interatomic distance can lead to simulation instabilities. By contrast, by minimizing the packing function and recognizing a global solution with *f* = 0, one is sure that there are no clashes, and energy minimization and/or simulations may run smoothly (even if the total energy of the system is not satisfactory and needs to be equilibrated to the thermodynamic energy at the desired temperature).1

The smoothness of the objective function facilitates the minimization procedure to the point that global solutions are obtained frequently. At the same time, the objective function can be evaluated using very fast procedures due to its local character (there are no long-range interactions), as will be described below.
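As a short worked step behind the differentiability claim, consider a single penalty term of the squared-max form used for the constraints, with *u* standing for any smooth quantity:

$$\phi(u) = [\max\{0, u\}]^{2}, \qquad \phi'(u) = 2 \max\{0, u\}.$$

Both one-sided derivatives vanish at *u* = 0, so φ′ is continuous there, and the chain rule gives, for a smooth constraint function *g*,

$$\nabla_{p}\, \phi(g(p)) = 2 \max\{0, g(p)\}\, \nabla g(p).$$

For instance, for a hypothetical spherical constraint $g(p) = \|p - s\|^{2} - R^{2}$ (atom inside a sphere of center $s$ and radius $R$, symbols introduced only for this illustration), the gradient is $4 \max\{0, g(p)\}\,(p - s)$, which is zero everywhere inside the sphere and grows continuously outside it. The second derivative of φ jumps at *u* = 0, which is consistent with claiming first-order differentiability only.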

### Function and Gradient Evaluation

The expression (5) has two terms: the first reflects the minimum-distance requirement between atoms of different molecules, and the second corresponds to the fulfillment of the constraints. The second term can be computed in linear time (with respect to the number of atoms), whereas the first one involves all the atom-to-atom distances. Therefore, in principle, the computational cost of the first term could increase as the square of the number of atoms, which would be impractical for large problems.

We implemented the fast evaluation of the first term using a Linked Cell technique.7 This technique is related to well-known multipole ideas8 but it is even simpler because our approach does not involve long-range interactions. Here, the system is partitioned into small boxes (bins) of side *d*_{tol}. Each atom is assigned to a bin using simple arithmetic operations. Then, as the objective function vanishes for distances greater than *d*_{tol}, for each atom only the distances to atoms belonging to the same bin or to adjacent bins are necessary. If the atoms are reasonably well distributed in space, the time required to evaluate these distances depends only on the number of atoms per bin, and therefore scales linearly with the total number of atoms.7 Since *d*_{tol} is of the order of a few (usually 2.0) angstroms, we observe that only about 10 distances must be computed for each atom, which makes the algorithm very efficient.
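The idea can be sketched in Python as follows. This is not the PACKMOL implementation: it stores bins in a dictionary keyed by integer cell indices rather than in arrays, it assumes the pair term [max{0, *d*_{tol}² − ‖*p* − *q*‖²}]², and the function names and the random test system are hypothetical. A plain all-pairs loop is included only to check that both evaluations agree.

```python
import math
import random
from collections import defaultdict
from itertools import product


def penalty(p, q, d_tol):
    """Assumed pair term: positive only when the atoms are closer than d_tol."""
    d2 = sum((p[k] - q[k]) ** 2 for k in range(3))
    return max(0.0, d_tol ** 2 - d2) ** 2


def distance_term_naive(atoms, d_tol):
    """Reference O(N^2) loop over all pairs of atoms from different molecules."""
    total = 0.0
    for a in range(len(atoms)):
        for b in range(a + 1, len(atoms)):
            if atoms[a][0] != atoms[b][0]:
                total += penalty(atoms[a][1], atoms[b][1], d_tol)
    return total


def distance_term_cells(atoms, d_tol):
    """Same sum, visiting only atoms in the same bin or in adjacent bins."""
    cells = defaultdict(list)
    for idx, (_, p) in enumerate(atoms):
        # Bins have side d_tol, so the bin index is a simple integer division.
        key = tuple(math.floor(p[k] / d_tol) for k in range(3))
        cells[key].append(idx)
    total = 0.0
    for key, members in cells.items():
        for offset in product((-1, 0, 1), repeat=3):
            neighbour = cells.get(tuple(key[k] + offset[k] for k in range(3)), ())
            for a in members:
                for b in neighbour:
                    # a < b counts each pair once; skip atoms of the same molecule.
                    if a < b and atoms[a][0] != atoms[b][0]:
                        total += penalty(atoms[a][1], atoms[b][1], d_tol)
    return total


random.seed(0)
# 200 hypothetical atoms, grouped in molecules of 4, scattered in a 20 Å cube.
atoms = [(i // 4, tuple(random.uniform(0.0, 20.0) for _ in range(3)))
         for i in range(200)]
print(math.isclose(distance_term_naive(atoms, 2.0), distance_term_cells(atoms, 2.0)))
```

Because the bin side equals *d*_{tol}, two atoms closer than the tolerance always lie in the same bin or in neighbouring bins, so restricting the search to the 27 surrounding bins loses no contribution.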

A good external bounding box not exceeding the real size of the system is necessary for an efficient partition of the space into bins. The number of bins must be small in order to avoid looping over empty boxes. We note that a “bounding box” in which all molecules are included cannot be deduced from the desired constraints. For example, for building spherical micelles, one usually imposes that the polar head of the lipids must be outside a sphere of a given radius, whereas the tail must be inside some other sphere. The presence of the outer sphere constraint makes the definition of an external bounding box quite cumbersome in general, and the structure of the molecules that are going to be packed should be taken into account.

To define a suitably practical bounding box, we decided to solve first an auxiliary packing problem, as follows. The positions of the molecules are randomly generated within a very large region, and the problem of packing all the molecules into the regions defined by the geometrical constraints, ignoring the distances between their atoms, is solved. From the solution of this easy problem, we obtain the maximum and minimum coordinates over all atoms of the system, and these coordinates define the external bounding box. As the initial positions of the molecules are generated in a very large box, the solution will generally contain the molecules close to the boundaries of the constraints, and a good estimate of the total size of the system is obtained.
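Once atom coordinates that satisfy the constraints are available, turning them into a bounding box and a grid of bins takes only a few arithmetic operations. The Python sketch below assumes the coordinates are already given (it does not solve the auxiliary packing problem), and the function names `bounding_box`, `grid_shape`, and `bin_index` are hypothetical.

```python
import math


def bounding_box(coords):
    """Minimum and maximum coordinates over all atoms."""
    lo = tuple(min(p[k] for p in coords) for k in range(3))
    hi = tuple(max(p[k] for p in coords) for k in range(3))
    return lo, hi


def grid_shape(lo, hi, d_tol):
    """Number of bins of side d_tol needed along each axis (at least one)."""
    return tuple(max(1, math.ceil((hi[k] - lo[k]) / d_tol)) for k in range(3))


def bin_index(p, lo, shape, d_tol):
    """Integer bin coordinates of point p; points on the upper face go to the last bin."""
    return tuple(min(shape[k] - 1, int((p[k] - lo[k]) / d_tol)) for k in range(3))


# Hypothetical coordinates standing in for the solution of the auxiliary problem.
coords = [(0.0, 0.0, 0.0), (10.0, 4.0, 2.0), (3.1, 1.0, 0.5)]
lo, hi = bounding_box(coords)
shape = grid_shape(lo, hi, 2.0)
print(shape, [bin_index(p, lo, shape, 2.0) for p in coords])
```

A tight box keeps the number of bins proportional to the occupied volume, which is the point of estimating the system size before the distance term is ever evaluated.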

This procedure is also useful in a different sense: Sometimes the user imposes a spatial constraint for some of the molecules of the system which cannot be satisfied (the molecule does not fit into the region). If the algorithm fails to put the molecules into the desired regions defined by the constraints, it is almost certain that the constraints are not well defined, and the desired packing will not be successful. Therefore, by solving this initial problem the software recognizes inconsistencies in the input information and stops with an appropriate error message.

### Examples

Figure 2 presents some examples of systems built with PACKMOL, which illustrate some of the capabilities of the package. The corresponding input files can be obtained from the PACKMOL site or upon request from the authors.

Table 1 summarizes the components of the systems and the computational time required to solve them. All these examples were run on a Sony Vaio VGN-NR11Z/S laptop with an Intel Core 2 Duo T7250 processor and 2 GB of RAM, running Ubuntu 8.04 Linux. The computational times correspond to the serial version of the package compiled with the GNU Fortran Compiler (g77) version 3.4.6 with the “-ffast-math -O3” flags.

Table 1. Properties of the examples of Figure 2. *N*_{atoms} is the number of atoms of the system, *N*_{var} is the number of variables of the optimization problem, and *t* is the running time.

| System | Composition | *N*_{atoms} | *N*_{var} | *t* |
|---|---|---|---|---|
| (a) | 400 urea molecules | 6200 | 8400 | 5 s |
| | 1000 water molecules | | | |
| (b) | 45 water molecules | 1125 | 1170 | 16 s |
| | 150 CCl_{4} molecules | | | |
| | One carbon nanotube | | | |
| (c) | 1019 water molecules | 4087 | 7308 | 3 s |
| | 199 CCl_{4} molecules | | | |
| | One T3 (thyroid hormone) | | | |
| (d) | 16,500 water molecules | 53,796 | 99,300 | 37 s |
| | 20 Cl^{−} ions | | | |
| | 30 Na^{+} ions | | | |
| | Thyroid hormone receptor LBD | | | |
| (e) | 216,000 water molecules | 1,339,200 | 1,814,400 | 37 min |
| | 86,400 urea molecules | | | |
| | Memory requirement: ∼400 MB | | | |

To illustrate a complete PACKMOL input file, the one corresponding to the example (b) is given in Figure 3.

Lines 1–3 define general aspects of the system: the type of structure files provided will be pdb, the minimum distance between atoms of different molecules at the solution will be 2.0 Å, and the output file will be called solvtube.pdb. The carbon nanotube will be placed according to its coordinates in the structure file provided (no rotations, no translations). Then, 45 water molecules will be put inside the nanotube, by restricting them to be inside a cylinder that starts at the minimum coordinates of the tube (0. 0. −11.7), is oriented along the z-axis (0. 0. 1.), and has a radius of 4.0 Å and a length of 23.4 Å. The carbon tetrachloride molecules are then put outside this same cylinder and inside a box with minimum coordinates (−10. −10. −11.7) and maximum coordinates (10. 10. 11.7), therefore surrounding the whole system.
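For readers without access to Figure 3, the description above can be turned into an input file along the following lines. The Python helper below merely assembles the text; it is a reconstruction from the prose, not a copy of the figure, so the order of the lines may differ from the original. The structure file names `tube.pdb`, `water.pdb`, and `ccl4.pdb` and the function name are hypothetical, while the numerical values and the output name are those quoted in the text.

```python
def packmol_input_example_b(tube="tube.pdb", water="water.pdb", ccl4="ccl4.pdb"):
    """Assemble a PACKMOL input for the water-filled nanotube in CCl4."""
    # Cylinder: starting point, axis direction, radius, length (values from the text).
    cylinder = "cylinder 0. 0. -11.7 0. 0. 1. 4.0 23.4"
    lines = [
        "filetype pdb",
        "tolerance 2.0",
        "output solvtube.pdb",
        "",
        # The nanotube keeps the coordinates of its structure file:
        # zero translation and zero rotation angles.
        f"structure {tube}",
        "  number 1",
        "  fixed 0. 0. 0. 0. 0. 0.",
        "end structure",
        "",
        # 45 water molecules restricted to the interior of the tube.
        f"structure {water}",
        "  number 45",
        f"  inside {cylinder}",
        "end structure",
        "",
        # 150 CCl4 molecules outside the tube but inside the surrounding box.
        f"structure {ccl4}",
        "  number 150",
        f"  outside {cylinder}",
        "  inside box -10. -10. -11.7 10. 10. 11.7",
        "end structure",
    ]
    return "\n".join(lines) + "\n"


print(packmol_input_example_b())
```

Writing the returned string to a file and passing it to the program on standard input (for example `packmol < solvtube.inp`, with the file name chosen freely) is the usual way of running such an input.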

### Conclusions

We have developed software for building initial configurations for molecular dynamics simulations based on the concept of packing optimization. The software is called PACKMOL and allows the user to define the molecular system to be simulated by packing different types of molecules into regions defined by geometric constraints. Function and gradient evaluations are optimized to the point that millions of atoms can be packed in reasonable time, and initial configurations for state-of-the-art simulations can be built in a few minutes or seconds. The user must provide only the structure of one molecule of each type and the geometrical constraints that must be fulfilled. It is possible to build mixtures of several components, solvate proteins, and build ordered arrangements such as double layers or micelles. The package is currently compatible with Tinker, Moldy, Molden's XYZ and PDB file formats. It is free software and is available online at http://www.ime.unicamp.br/~martinez/packmol. PACKMOL has already been used for building initial configurations for different applications in different research groups, such as protein solvation with different solvents,6, 15 multiple component mixtures or uncommon liquids,16, 17 ionic liquids,18 polymer solutions,19 interfaces,4, 5 and others.20–27

Further improvements of the package may include macro-input keywords to build the most common systems and more complex unit cells, the parallelization of tasks other than function and gradient evaluation, the improvement of global convergence heuristics, and the development of a graphical user interface, or its incorporation as a plugin into some graphical molecular viewer.