The GromPy python interface
The GROMACS package is written in the C programming language. We base our development tree on GROMACS version 4.0.5 and plan to port it to the latest development branch in the near future.
To implement the interface, we choose the python programming language. Python is a high-level, interpreted, object-oriented, and multiplatform programming language. It provides a large standard library and is easy to code. We use the free and open source CPython implementation of python. Apart from the standard library, python has excellent extensions for numerical data analysis and data display.[11–16] CPython is written in C and compiles python programs into intermediate code that can be executed by a virtual machine. The CPython implementation also allows the implementation of modules in C and the interfacing of (precompiled) libraries.
In our setup, we use the ctypes module as interface between python and the GROMACS C-library. The ctypes module contains python equivalents for all basic C data types and allows the mapping of compound structures in C to python classes. As soon as the GROMACS data structures are accessible via ctypes, we can pass them to external GROMACS functions and access the result from the python interpreter during the execution with the GromPy module.
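As a minimal sketch of the mapping described above, the following block mirrors a small C structure as a ctypes class. The structure `toy_state_t`, its fields, and the helper `make_state` are hypothetical and serve only as illustration; they do not reproduce any actual GROMACS data structure.

```python
import ctypes

# Hypothetical C layout, for illustration only (not a GROMACS struct):
#   typedef struct { int natoms; float box[3][3]; float *x; } toy_state_t;
class ToyState(ctypes.Structure):
    _fields_ = [
        ("natoms", ctypes.c_int),
        ("box", (ctypes.c_float * 3) * 3),
        ("x", ctypes.POINTER(ctypes.c_float)),
    ]


def make_state(coords, box_length):
    """Build a ToyState from a flat coordinate list and a cubic box length."""
    buf = (ctypes.c_float * len(coords))(*coords)
    state = ToyState()
    state.natoms = len(coords) // 3
    for d in range(3):
        state.box[d][d] = box_length        # off-diagonal entries stay zero
    state.x = ctypes.cast(buf, ctypes.POINTER(ctypes.c_float))
    state.keepalive = buf                   # keep the C buffer referenced
    return state


state = make_state([0.0, 0.5, 1.0, 1.5, 2.0, 2.5], box_length=3.0)
print(state.natoms, state.box[0][0], state.x[4])
```

A structure defined this way can be handed to a C function loaded through ctypes, typically by reference with `ctypes.byref(state)`.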
The initial GromPy implementation can be used for the analysis of trajectories, for example, using GROMACS' periodic boundary condition removal and structure fitting routines. GromPy can also read in index groups and topologies and was applied in the prototyping of GROMACS tools, which were later implemented in C. Recently, GromPy was applied to design a combined MD/MC approach to simulate FRET experiments and aid in the distance reconstruction. This work involves extending GromPy by a GCMC simulation mode. The GromPy source code is publicly available at https://github.com/GromPy.
Hybrid GCMC/MD Simulations
In GCMC, the simulation box is in chemical equilibrium with an external bath. Hence, the chemical potential μ of both systems is equal. One therefore imposes the chemical potential of a particular molecular species upon which molecules are exchanged between the external reservoir and the simulation box. In practice, this means that molecules are inserted into or removed from the simulation box during simulation. The MC acceptance rule for insertion of a molecule reads
$$P_{\mathrm{acc}}(N \to N+1) = \min\left[1, \frac{V}{\Lambda^{3}(N+1)} \exp\{\beta(\mu - \Delta U)\}\right] \tag{1}$$
where N is the number of molecules, V is the box volume, Λ = h/√(2πmkBT) is the thermal de Broglie wavelength (h denotes Planck's constant, m is the molecular mass, kB is Boltzmann's constant, and T is the temperature), β = 1/(kBT) is the inverse temperature, and ΔU = U(N + 1) − U(N) is the energy difference of adding one molecule at a random position in the simulation box. For removal of a molecule, we use the following acceptance rule
$$P_{\mathrm{acc}}(N \to N-1) = \min\left[1, \frac{\Lambda^{3} N}{V} \exp\{-\beta(\mu + \Delta U)\}\right] \tag{2}$$
where ΔU = U(N − 1) − U(N) is the potential energy difference associated with the removal of a randomly selected molecule.
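The two acceptance rules can be sketched numerically as follows. This block assumes the standard grand-canonical form of the rules (written out in the docstrings) and SI units; the function names are illustrative and are not taken from GromPy.

```python
import math

H = 6.62607015e-34         # Planck constant, J s
KB = 1.380649e-23          # Boltzmann constant, J/K
AMU = 1.66053906660e-27    # atomic mass unit, kg


def thermal_wavelength(mass_amu, temperature):
    """Thermal de Broglie wavelength h / sqrt(2 pi m kB T), in metres."""
    return H / math.sqrt(2.0 * math.pi * mass_amu * AMU * KB * temperature)


def p_insert(n, volume, wavelength, beta, mu, delta_u):
    """min(1, V / (Lambda^3 (N+1)) * exp(beta (mu - dU))), dU = U(N+1) - U(N)."""
    prefactor = volume / (wavelength ** 3 * (n + 1))
    return min(1.0, prefactor * math.exp(beta * (mu - delta_u)))


def p_remove(n, volume, wavelength, beta, mu, delta_u):
    """min(1, Lambda^3 N / V * exp(-beta (mu + dU))), dU = U(N-1) - U(N)."""
    prefactor = wavelength ** 3 * n / volume
    return min(1.0, prefactor * math.exp(-beta * (mu + delta_u)))


# Wavelength of a water molecule (18.015 amu) at 773 K, in metres.
print(thermal_wavelength(18.015, 773.0))
```

For a forward move N → N + 1 and its reverse, the ratio `p_insert / p_remove` equals the unclipped insertion factor, which is the detailed-balance condition these rules are built to satisfy.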
To simulate thermal motion, we apply several MD steps at constant NVT using the velocity rescale thermostat, which generates a canonical ensemble, in between the GCMC moves. The nature of the MC move (i.e., a trial insertion/removal or an MD move) during a MC cycle is chosen at random based on a user-defined list of probabilities for each type of MC move.
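The random choice between move types can be illustrated with a toy loop. The sketch below is not the GromPy implementation: it assumes an ideal gas (ΔU = 0), for which the acceptance factors reduce to zV/(N + 1) and N/(zV) with z = exp(βμ)/Λ³, and it treats the MD move as a placeholder that leaves N unchanged. All names are hypothetical.

```python
import random


def run_ideal_gas_gcmc(z_times_v, n_start, n_min, n_max, n_cycles,
                       p_md=0.5, seed=1):
    """Toy hybrid loop for an ideal gas; returns the mean N and move counts."""
    rng = random.Random(seed)
    move_types = ["md", "gcmc"]
    weights = [p_md, 1.0 - p_md]            # user-defined move probabilities
    n = n_start
    n_sum = 0
    counts = {"md": 0, "gcmc": 0}
    for _ in range(n_cycles):
        move = rng.choices(move_types, weights=weights)[0]
        counts[move] += 1
        if move == "gcmc":
            if rng.random() < 0.5:          # trial insertion
                accept = min(1.0, z_times_v / (n + 1))
                if n + 1 <= n_max and rng.random() < accept:
                    n += 1
            else:                           # trial removal
                accept = min(1.0, n / z_times_v)
                if n - 1 >= n_min and rng.random() < accept:
                    n -= 1
        # A real "md" move would propagate coordinates and velocities at
        # fixed N; in this toy it is a no-op that is always accepted.
        n_sum += n                          # update the running average
    return n_sum / n_cycles, counts


mean_n, counts = run_ideal_gas_gcmc(20.0, 20, 0, 200, 40000)
print(mean_n, counts)
```

For an ideal gas the sampled mean number of molecules should fluctuate around zV, which gives a quick sanity check of the move selection and acceptance logic.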
Extending GromPy and modifying the GROMACS source code
This work involves an extension of GromPy, enabling GCMC using the GROMACS C-library. The general setup is shown in Figure 1. When used in GCMC mode, GromPy needs a starting configuration with a number of molecules Ni,start of type i in the form of a GROMACS tpr file stored on disk. Such a tpr file serves as input for a GROMACS simulation and contains all simulation parameters and a configuration of a system. One tpr file for each Ni ∈ [Ni,min,Ni,max] is generated in the preprocessing stage, where Ni,min ≤ Ni,start and Ni,max ≥ Ni,start are the extrema of the Ni sampling range. By imposing a chemical potential μi of this molecule type, GromPy samples the Ni range via the hybrid GCMC/MD algorithm.
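The preprocessing stage could be scripted along the following lines. This is a sketch under assumptions: the file naming scheme (`conf_<N>.gro`, `conf_<N>.top`, `conf_<N>.tpr`) and the function name are hypothetical, and only the standard grompp options `-f`, `-c`, `-p`, and `-o` are used. The block builds the argument lists without executing them.

```python
def grompp_commands(n_min, n_max, mdp="grompp.mdp", prefix="conf"):
    """Return one grompp argument list per molecule count in [n_min, n_max]."""
    commands = []
    for n in range(n_min, n_max + 1):
        commands.append([
            "grompp",
            "-f", mdp,                      # simulation parameters
            "-c", f"{prefix}_{n}.gro",      # configuration with n molecules
            "-p", f"{prefix}_{n}.top",      # topology listing n molecules
            "-o", f"{prefix}_{n}.tpr",      # run input file, one per n
        ])
    return commands


for cmd in grompp_commands(3, 5):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the corresponding configuration and topology files exist.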
Figure 1. The GCMC simulation setup used in this work. The preprocessing stage (1) involves generating tpr files for each configuration Ni ∈ [Ni,min,Ni,max] using grompp. The input (2) for GromPy comprises the molecule type i for which the chemical potential μi is imposed, a range of numbers of molecules [Ni,min,Ni,max] of type i that can be sampled, a starting configuration Ni,start, the tpr path on disk, and the output path on disk. The preprocessing step requires prior knowledge of the input parameters (a). GromPy (3) reads the input parameters (b). Molecular insertions/removals require tpr reads from disk (c). Once read, the associated data structures are kept in memory. The necessary energy evaluations are performed by the GROMACS library (4) with which GromPy communicates (d). This shared object library is compiled (e) from a slightly modified version (5) of GROMACS 4.0.5. The generated output can be further analyzed by the native GROMACS analysis suite, GromPy, or other software.
All MC moves in our hybrid MD/GCMC module require having the current state sc:[Ni,c,r,v] and associated total potential energy Uc in memory (see Figure 2). This state is a member of the grand-canonical ensemble and thus comprises the current number of molecules Ni,c of type i, the coordinates r, and the velocities v (we use the rN and vN shorthand notation for the coordinate and velocity arrays consisting of N elements). The GCMC module uses two MC move types: one that performs several MD steps on sc to simulate thermal motion of the molecular system and one that performs a GCMC move that tries to modify sc by inserting or removing a molecule. For computational efficiency, the MD move is always accepted as the resulting configuration is already part of the correct statistical mechanical ensemble. After the MD move, we update the coordinates, velocities, and total potential energy. Inside the GCMC move, we select either the removal or the insertion of a molecule with a probability of Pinsert = Premove = 1/2. For insertion, we generate a trial state st that has Ni,t = Ni,c + 1 molecules. The first Ni,c elements of the coordinate and velocity arrays are copied from sc. The last element is filled by a random molecular position r′ inside the box and by a molecular velocity v′ chosen at random from the Maxwell–Boltzmann velocity distribution associated with the imposed temperature T, respectively. This step requires having st in memory. If this is not the case, we first read a tpr file with Ni = Ni,t from disk. A molecular removal involves generating a trial state st that has Ni,t = Ni,c − 1 molecules. We randomly select a molecule (k) from the list and copy the elements of the coordinate and velocity arrays from sc to st, excluding the kth element. Again, we require having st in memory and read from disk otherwise. Trial insertions or removals with associated Ut are accepted according to eq. (1) or eq. (2) (where ΔU = Ut − Uc), respectively.
If accepted, we update st to sc and the associated potential energy Ut becomes Uc. Otherwise, we keep sc and Uc. After each MC move, we update the averages and increment the MC loop iterator.
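The construction of trial states described above can be sketched for single-site particles as follows. The function names and the list-of-tuples layout are hypothetical and do not mirror the GROMACS arrays; multi-atomic molecules such as SPC water would additionally need an orientation and per-atom velocities, which this sketch omits. Units follow the GROMACS convention (nm, ps, amu, kJ/mol).

```python
import math
import random

KB = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def trial_insert(coords, vels, box, mass, temperature, rng):
    """Append one particle at a random position with a Maxwell-Boltzmann velocity."""
    sigma = math.sqrt(KB * temperature / mass)      # nm/ps for mass in amu
    r_new = tuple(rng.uniform(0.0, length) for length in box)
    v_new = tuple(rng.gauss(0.0, sigma) for _ in range(3))
    # New lists are built so the current state stays untouched on rejection.
    return coords + [r_new], vels + [v_new]


def trial_remove(coords, vels, rng):
    """Drop a randomly selected particle k; return the new arrays and k."""
    k = rng.randrange(len(coords))
    return coords[:k] + coords[k + 1:], vels[:k] + vels[k + 1:], k


rng = random.Random(0)
coords = [(0.1, 0.2, 0.3), (1.0, 1.0, 1.0)]
vels = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(trial_insert(coords, vels, (3.0, 3.0, 3.0), 72.0, 773.0, rng))
print(trial_remove(coords, vels, rng))
```

Because both functions return new lists, rejecting a trial move amounts to discarding the returned state and keeping the current one.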
Figure 2. Flowchart of GromPy in GCMC mode. Each MC move is based on the current state sc:[Ni,c,r,v] and associated total potential energy Uc, kept in memory. State sc is defined by the number of molecules Ni,c of type i, their coordinates r and velocities v. GromPy uses two MC move types: an MD move of several MD steps and a GCMC move. After the MD move, the coordinates, velocities, and total potential energy are updated. The GCMC move involves removal or insertion of a molecule selected with a probability Pinsert = Premove = 1/2. Insertion or removal requires having st in memory. If this is not the case, we first read a tpr file with Ni = Ni,t from disk. Insertions or removals are accepted with the probabilities in eq. (1) and eq. (2), respectively. If accepted, st becomes sc and Ut becomes Uc. Otherwise, we keep sc and Uc. After each MC move we update the averages and increment the MC loop iterator j.
As described above, the GCMC module uses the current and trial states (sc and st) to sample the grand-canonical ensemble. For this, energy evaluations are needed to obtain Uc and Ut that serve as input for the acceptance rules for insertion [eq. (1)] and removal [eq. (2)]. At run time, the states are stored in memory by interfacing with specific GROMACS library functions. The associated energies Uc and Ut are computed by calls to the GROMACS library. Both operations are performed using the python ctypes module. To achieve the interfacing, we modified the GROMACS 4.0.5 source code as shown in Figure 3. Although the modifications were performed for the serial implementation of GROMACS, we intend to make the modifications compatible with the parallel parts of the code. We expect that this will require relatively little effort. The GROMACS function mdrunner() loads a tpr file and can perform an MD simulation on a given system. This function is called by the GROMACS mdrun executable. As ctypes can load only shared object libraries, we compile the mdrun executable as a shared object library: libmdrun.so. During a GCMC run, we generate trial states st by copying the current state sc to st and adding a trial position (and velocity) for insertion or excluding a randomly selected molecule for removal. To achieve this flexibility, we have split up the mdrunner() function into three parts: mdr_init(cs), mdr_int(cs), and mdr_fin(cs). We added a new data structure cs for the current state that enables communication between the subfunctions. For our purposes, the most important member of cs is the state s. By subsequently calling the three separate functions (and without modifying cs in between), the behavior of the original mdrunner() function is reproduced exactly. Function mdr_init(cs) reads a tpr file from disk and stores the state s in cs. Function mdr_int(cs) performs an MD calculation of NMD steps. NMD is also a member of cs and can be set from within GromPy. 
For an MD move, the number of MD steps is set to NMD > 0 and for energy evaluations in a GCMC move it is set to NMD = 0 (which results in a single point energy calculation). Computational performance of the simulation is calculated by function mdr_fin(cs). The gain in total computational time is realized by keeping cs in python memory once initialized by a disk read. In this way, cs can be (re)used efficiently for MD or GCMC moves.
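The reuse of cs objects can be sketched as a small cache keyed by the number of molecules. This is an illustration under assumptions, not the GromPy code: the class names, the `nmd` attribute, and the `make_cs` factory are hypothetical, and `FakeLib` stands in for the loaded libmdrun.so so that the block runs without GROMACS. How cs is actually passed to `mdr_init` and `mdr_int` (for instance by reference through ctypes) is not specified here.

```python
from types import SimpleNamespace


class StateCache:
    """Keep one initialized run state (cs) per molecule count N in memory."""

    def __init__(self, lib, make_cs):
        self.lib = lib              # object exposing mdr_init and mdr_int
        self.make_cs = make_cs      # hypothetical factory: N -> empty cs
        self.states = {}

    def get(self, n):
        """Return the cs for n molecules; the tpr read happens on first use only."""
        if n not in self.states:
            cs = self.make_cs(n)
            self.lib.mdr_init(cs)   # disk read, once per N
            self.states[n] = cs
        return self.states[n]

    def run(self, n, n_md):
        """n_md > 0 gives an MD move; n_md == 0 a single-point energy."""
        cs = self.get(n)
        cs.nmd = n_md               # hypothetical name for the N_MD member
        self.lib.mdr_int(cs)
        return cs


class FakeLib:
    """Stand-in for the shared library; it only records the calls made."""

    def __init__(self):
        self.init_calls = 0
        self.int_calls = []

    def mdr_init(self, cs):
        self.init_calls += 1

    def mdr_int(self, cs):
        self.int_calls.append(cs.nmd)


lib = FakeLib()
cache = StateCache(lib, lambda n: SimpleNamespace(n=n, nmd=0))
cache.run(10, 0)    # energy evaluation, triggers the first "disk read"
cache.run(10, 5)    # MD move on the same cached state
cache.run(11, 0)    # a new N requires a second read
print(lib.init_calls, lib.int_calls)
```

The point of the design is that initialization cost is paid once per N, after which MD moves and energy evaluations only differ in the value of NMD.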
Figure 3. Modification of the source code of GROMACS version 4.0.5. Left: default compilation yields the mdrun executable (among others). This program calls function mdrunner() that is the calculation engine for MD simulations. Right: compilation of the mdrun executable as the shared object library libmdrun.so and splitting up function mdrunner() into an initialization stage, an integration stage, and a finalization stage. Communication between the stages is achieved through the new cs data structure. Library libmdrun.so is loaded into the GromPy module where mdr_init(cs), mdr_int(cs), and mdr_fin(cs) are called for performing MD moves and GCMC trial moves by manipulating cs before each MC move.
Note that the Ni,start configuration should be an equilibrated one. However, this is not a precondition for all other Ni ≠ Ni,start tpr files that the user wishes to use for sampling, as this tpr file is merely used to fill the coordinate and velocity arrays in a trial move. During simulation, sc will always be part of the correct ensemble.
To summarize, once in memory, cs can be manipulated for any purpose and can serve as input for mdr_int(cs). Our purpose is GCMC and we therefore need to manipulate the cs members s and NMD. The same behavior can be achieved by executing a shell script that calls the necessary GROMACS executables, that is, grompp and mdrun. The downside of such an approach is that the GCMC shell script spends most of its time on file I/O and/or system calls, mainly caused by the necessary consecutive execution of the GROMACS grompp and mdrun applications. Having the relevant GROMACS data structures in memory, combined with the modified GROMACS source code, drastically reduces the time spent on file I/O and renders GromPy an efficient GCMC application, with less than 6% of run time spent in overhead. This overhead involves logging to disk, reading of tpr files from disk, iterating over the MC loop, replacing the rN and vN arrays for trial insertions/removals, and associated evaluations of eqs. (1) and (2).
Validation of the GCMC module
We aim to validate the GROMACS-GCMC scheme by comparing equations of state (EOS) determined by GCMC and NVT MD. For this, it is necessary to simulate a single phase. We therefore choose to simulate supercritical fluids. The validation is performed for two model systems. The first system consists of single Lennard-Jones (LJ) particles of the same type. For this, we use water particles of the MARTINI coarse-grained force field that are modeled as single LJ particles. For this system type, we approximate the critical properties by Gibbs ensemble simulation results. For the second system, with polar SPC water, we also need to account for charges and insertions/removals of multi-atomic molecules, rendering it a more complicated and challenging test case. The critical properties for the SPC model are taken from the literature: Tc,SPC = 587 K and ρc,SPC = 15 mol/l.
In the LJ simulations, we use a shift potential for the nonbonded interactions with a switch radius of rs = 0.9 nm. The nonbonded interactions were truncated at rc = 1.2 nm. In the SPC simulations, all nonbonded interactions were calculated up to a cut-off distance of rc = 0.9 nm (corrections to the total energy and pressure due to truncation are taken into account) and the Coulombic interactions are calculated by the particle mesh Ewald method with a spacing of the Fourier grid of 0.12 nm.
The NVT EOS for both systems are determined at T = 773 K and T = 900 K. The simulation parameters are summarized in Table 1. We perform a separate simulation for each density ρ; the density ranges are the x-values in Figures 4a and 4b for the LJ and SPC models, respectively. These density ranges are obtained by changing the box volume while keeping the number of molecules constant. A pilot experiment showed that NVT results are consistent when varying the box volume V at constant N or varying N at constant V. We average the total pressure p and hence obtain a pressure profile as a function of density ρ.
Figure 4. EOS for the LJ model (a) and the SPC water model (b) at T = 773 K (top) and T = 900 K (bottom). The data points on the p(ρ) line are determined by MD at NVT (the error bars indicate the standard deviation of the pressure fluctuations). The points on the μex(ρ) line are determined by grand-canonical MC using GromPy in GCMC mode. The standard deviations of μex and of ρ for the μVT data are shown as error bars (at 1σ) and were calculated by conventional error propagation rules. The least-squares fit of a sixth degree polynomial to these μex points was transformed into a p(ρ) curve using eq. (4). Both results are shown as dotted lines in each plot.
Table 1. MD parameters used in this work for the LJ and SPC models
|Parameter||LJ||SPC|
|Δt (ps)||2 × 10^−2||2 × 10^−3|
|tNVT (ps)||2 × 10^3||2 × 10^3|
|tMD, μVT (ps)||2||0.2|
|tGCMC, μVT (ps)||0||0|
The μVT EOSs at T = 773 K and T = 900 K are obtained by imposing a range of chemical potentials μ on fixed-volume systems of either LJ particles or SPC water molecules. The simulation parameters of the μVT simulations can be found in Table 2. The MD parameters used in MD moves are listed in Table 1. For the density ranges studied, see the x-values in Figures 4a and 4b for the LJ and SPC models, respectively. For this type of simulation, we obtain a density profile as a function of μ.
Table 2. GCMC parameters used in this work for the LJ and SPC models
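The Gibbs–Duhem equation is used to validate the μVT results
$$\mathrm{d}p = \rho\,\mathrm{d}\mu \quad (\text{constant } T) \tag{3}$$
from which the pressure profile
$$p(\rho) = \rho k_{B} T + \int_{0}^{\rho} \rho' \left(\frac{\partial \mu_{\mathrm{ex}}}{\partial \rho'}\right)_{T} \mathrm{d}\rho' \tag{4}$$
is derived. The excess part of the chemical potential μex is calculated as
$$\mu_{\mathrm{ex}} = \mu - \mu_{\mathrm{id}} \tag{5}$$
where μid is the ideal part of the chemical potential. The pressure as a function of density p(ρ) in eq. (4) is determined from a numerical least-squares fit of a sixth degree polynomial to the μVT data of Ndat = 1000 data points.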
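The conversion of a fitted μex(ρ) polynomial into p(ρ) can be worked out term by term. The sketch below assumes the Gibbs–Duhem relation dp = ρ dμ at constant T and a polynomial μex(ρ) = Σ a_k ρ^k whose coefficients are already known (the least-squares fit itself is not shown). The ideal part then contributes ρkBT, and each excess term contributes a_k · k/(k + 1) · ρ^(k+1). The function name is hypothetical.

```python
def pressure_from_mu_ex(coeffs, rho, kT):
    """p(rho) for mu_ex(rho) = sum_k coeffs[k] * rho**k, using dp = rho dmu.

    coeffs[k] is the coefficient of rho**k; kT is kB*T in the same energy
    units as mu_ex.
    """
    p = rho * kT                            # ideal-gas contribution
    for k, a_k in enumerate(coeffs):
        # integral of rho' * d(a_k rho'^k)/drho' from 0 to rho
        p += a_k * k / (k + 1) * rho ** (k + 1)
    return p


# Check against a second-virial model, where mu_ex = 2 B2 rho kT
# corresponds to p = rho kT + B2 rho^2 kT.
B2, kT, rho = 0.3, 2.0, 0.5
print(pressure_from_mu_ex([0.0, 2.0 * B2 * kT], rho, kT))
print(rho * kT + B2 * kT * rho ** 2)
```

With μ in kJ/mol and ρ in mol/l, the resulting pressure is in kJ/l, which equals MPa.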