Recent years have witnessed a growing number of studies aimed at extending the range of space and time scales attainable in computer simulations. A crucial point in pursuing this goal is that the acceleration should come not only from improvements in computer architecture, but above all from smart computational schemes that bridge the gap between space and time scales. More importantly, this target is meant to be achieved without losing accuracy in describing (at least part of) the physicochemical properties of the systems under study.
In this perspective, we discuss a technique that allows one to parametrize coarse-grained force fields, using as reference some properties of the system simulated at a finer grain of description. Initially known as the force matching (FM) method, slightly different flavors of it have more recently been called potential matching, combined function derivative approximation, adaptive FM, matching algorithm, force balance, or multiscale coarse graining. In all of these approaches, the sought parametrization is obtained by minimizing an objective function that, in the seminal paper of Ercolessi and Adams on FM, was a sum of least-square residuals of atomic forces (vide infra).
In principle, FM could be applied to a hierarchy of scales, starting from high-level ab initio molecular dynamics (MD), passing through atomistic force fields (polarizable if needed), up to higher levels of coarse graining. Given its intrinsic degree of automation, FM could represent an efficient and systematic approach to the space and time scales problem, provided that the computational and methodological issues still open are solved. Current research is focused on addressing these weaknesses and on improving the performance of the method. Once this goal is achieved, FM would become a strong candidate for the study of a wide range of systems, in particular in the realms of materials science and biophysics. As far as multiscale methods are concerned, FM is a direct competitor of methods based on distribution matching, such as iterative Boltzmann inversion or inverse Monte Carlo, where the coarse-grained force field is fitted to reproduce the distribution functions of a fine-grained simulation.
Since the first study, dating back 20 years, FM has grown to include molecular torques,[4, 5] stress tensors,[10, 11] and total energy[6, 11, 12]; for more complex systems, such as proteins, it has been coupled to charge fitting methods based on the electrostatic potential to improve its efficiency. Combining FM with well-established methods to compute the value of some parameters, such as charges, polarizabilities and, in some cases, dispersion terms, might reduce the risk of cancellation errors, hence improving the robustness of the procedure. It is also worth mentioning that free implementations of FM are available, either as standalone packages or as part of MD codes.
Here, we discuss the merits and shortcomings of FM, identifying three main routes along which the method could become a sound, automated, and reliable multiscale approach. In more detail, the following sections discuss the minimization procedure, the empirical potential, and the reference data.
Let us define the potential energy surface (PES) of the reference system as a many-body interaction potential (the reference potential), which depends on the coordinates of N particles. Since the calculation of this potential is time demanding, the aim of the FM procedure is to search, in the parameter space of a simpler empirical potential, for the parameterization that best resembles the reference potential. For the moment, we do not make any assumption on the functional form of the empirical potential, besides the fact that it is a real, continuous, and differentiable function. The best guess would be given by a set of parameters that cancels the difference between the reference and the fitted PES. Even if, due to the coarse-graining procedure itself, it is impossible to reach such an equivalence, the aim of the FM approach is to minimize the difference between the reference and the empirical potential, and thus to get the best possible representation of the PES in the coarse-grained description. We notice that spanning the PES directly is not efficient, as for any configuration of N particles we only gain information on one point of the multidimensional surface. Conversely, if we consider the forces, each configuration contributes 3N data points to be fitted. In addition, molecular torques, dipoles, and the stress tensor are other properties related to the underlying PES that provide a large number of data points for the fitting procedure.
In general, the fitting procedure is based on the minimization of an objective function built from the square deviation of a given property A: a sum, over the reference configurations, of the square difference between the values of A at the low level and at the high level of description, each term being multiplied by the weight of the configuration and by the weight of the fitted property for that configuration.
Here, Nconf is the number of configurations, and LL and HL stand, respectively, for low-level and high-level of description. For example, HL could be any density functional theory (DFT) MD simulation, and LL a classical atomistic force field. In the microcanonical ensemble, the configuration weight is unity, while in the canonical ensemble, it is given by the Boltzmann factor exp(-Ei/kBT), Ei being the configurational energy of the i-th configuration. For the statistical weights of the property A, we notice that different choices lead to different sets of parameters for the empirical potential. Even if this fact is often overlooked in the literature, in our past contributions we have shown that the final result is deeply influenced by the choice of the weights, and we have been working for a few years to find the best way of choosing them.[5, 16]
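As a minimal sketch of the objective function described above, the weighted sum of square deviations can be written in a few lines of Python. The function names, argument names, and array layout below are assumptions made for illustration and do not come from any of the cited implementations; normalizing the Boltzmann factors is likewise a convenience choice of this sketch.

```python
import numpy as np

def fm_objective(a_ll, a_hl, config_weights=None, property_weights=None):
    """Weighted sum of squared LL-HL deviations of a property A.

    a_ll, a_hl       : arrays of shape (n_conf, n_components),
                       e.g. n_components = 3N when A is the set of forces
    config_weights   : shape (n_conf,), one weight per configuration
    property_weights : same shape as a_ll, one weight per fitted data point
    (hypothetical layout, chosen only for this sketch)
    """
    a_ll = np.asarray(a_ll, dtype=float)
    a_hl = np.asarray(a_hl, dtype=float)
    if config_weights is None:
        config_weights = np.ones(a_ll.shape[0])      # microcanonical-like choice
    if property_weights is None:
        property_weights = np.ones_like(a_ll)        # all data points count equally
    config_weights = np.asarray(config_weights, dtype=float)
    property_weights = np.asarray(property_weights, dtype=float)
    per_conf = np.sum(property_weights * (a_ll - a_hl) ** 2, axis=1)
    return float(np.sum(config_weights * per_conf))

def boltzmann_weights(energies, kT):
    """Boltzmann factors exp(-E_i / kT), normalized to sum to one (assumption)."""
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()) / kT)   # shift by the minimum for numerical stability
    return w / w.sum()

# Usage: two configurations of a single particle (3 force components each)
f_hl = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 0.0]])
f_ll = np.array([[0.0, 0.1, 0.9], [0.4, 0.0, 0.0]])
print(fm_objective(f_ll, f_hl, config_weights=boltzmann_weights([0.0, 1.0], kT=1.0)))
```

In an actual fit, the low-level values would depend on the force field parameters, and this function would be handed to a minimizer.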
Here, we present a toy model that clearly conveys the importance of sampling the reference properties with a proper weight. Consider a slightly anharmonic potential well, as illustrated in Figure 1. The deviation from harmonicity is noticeable only for large displacements, which have a low probability, as shown by the Boltzmann distribution (black dashed line). Therefore, in an MD simulation in the canonical ensemble, only a few data points would give information on the region of the potential where the anharmonicity gains importance (see the difference from the pure harmonic potential depicted with the blue dotted line), while most of the points would be sampled close to the equilibrium distance, where the harmonic behavior prevails. If we consider an equal weight for all configurations, w = 1, most of the information in the objective function would be related to the harmonic part of the potential, and the minimization procedure would lead to a potential with a high uncertainty in the anharmonic region. One could try to prevent this shortcoming by choosing the weights differently, for example taking w proportional to the square of the forces. In that case, however, the minimization would be biased toward the anharmonic region, and the error would be made in the fitting of the harmonic part. In our opinion, the most natural choice is to weight the contributions of the property A by the inverse of its distribution p(A). For example, if a force of a given magnitude appears n times more often than another, then it should be weighted by 1/n; in this way, both forces are accounted for equally, and the square differences appearing in the objective function have the same importance. Using such a criterion, we rule out any bias toward one region of the underlying potential, thus getting a homogeneous uncertainty in the fitted potential.
In the case of the anharmonic potential, the few contributions far from equilibrium will count as much as the sum of the many contributions close to it, and a better fit of the underlying PES is obtained. We are currently working on this aspect to provide the minimization procedure with a better approach. The main advantage is that, within the FM approach, p(A) can be easily computed from the reference trajectory, and no further calculations are required.
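The effect of the weights can be illustrated with a short Python sketch of the toy model. Everything specific here is an assumption made for illustration: the quartic form of the anharmonicity, the parameter values, the histogram estimate of p(A), and the use of Gaussian (harmonic Boltzmann) sampling as a stand-in for a canonical trajectory. A purely harmonic model is fitted on purpose, so that the dependence of the result on the weights becomes visible.

```python
import numpy as np

def inverse_distribution_weights(values, bins=20):
    """Weight each sample by 1 / p(A), with p(A) estimated from a histogram."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=bins)
    # Bin index of every sample; using only the inner edges gives 0..bins-1
    idx = np.digitize(values, edges[1:-1])
    return 1.0 / counts[idx]

def fit_harmonic_constant(x, forces, weights):
    """Weighted least-squares fit of the model F = -k x; returns k."""
    return -np.sum(weights * x * forces) / np.sum(weights * x * x)

# Toy reference data (hypothetical parameters): V(x) = k x^2 / 2 + c x^4
rng = np.random.default_rng(0)
k_ref, c_ref, kT = 1.0, 0.05, 1.0
x = rng.normal(0.0, np.sqrt(kT / k_ref), size=5000)   # approximate canonical sampling
f_ref = -k_ref * x - 4.0 * c_ref * x**3                # reference forces

k_uniform = fit_harmonic_constant(x, f_ref, np.ones_like(x))
k_inverse = fit_harmonic_constant(x, f_ref, inverse_distribution_weights(f_ref))
print("uniform weights:", k_uniform)
print("1/p(A) weights :", k_inverse)
```

With the inverse-distribution weights, every occupied histogram bin of the force contributes the same total weight, so the rarely sampled large forces are no longer swamped by the many near-equilibrium points.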
So far, we have said nothing about the functional form of the fitted potential. It is clear that, by construction, it cannot be as refined as the reference potential. Nonetheless, it is highly important that its main physical properties are preserved. Consider, for example, the application of FM to parameterize a classical force field for water, based on reference ab initio simulations. Most force fields for water are given by a sum of two-body terms, such as a Lennard–Jones potential plus Coulomb interactions. This form is widespread in molecular simulations because it is easy to handle from the computational point of view, and because it provides a clear understanding of the parameters in terms of their physical meaning. Nonetheless, as has been shown recently by Wang and coworkers [17, 18] and by our group, this approximation might be flawed in that important pieces of the interaction potential are missing. In fact, the short-range behavior of the dispersion term is not well described by a simple Lennard–Jones potential, and other terms are needed. The missing parts that correct for this shortcoming were introduced ad hoc in the functional form by us and by Wang and coworkers. Both corrections proved successful, and Wang's potential is indeed capable of reproducing the thermodynamic anomalies of water at low temperature.
We note, however, that for more complex systems, changing “manually” the functional form of the empirical potential would be more complicated and time consuming. Voth and coworkers make use of cubic splines to fit the reference forces.[21, 22] With such an approach, however, the fitted potential does not contain any intelligible information on the physical interaction at the atomistic level. Given the high computational capabilities of modern architectures, we suggest here a step forward in the method development. Instead of restricting ourselves to a search in the parameter space, the space of functional forms for the empirical potential could also be spanned to minimize the objective function. This would be a smart way of fitting the ruggedness of the reference PES, without introducing any bias related to the selection of a predetermined functional form. Of course, the search should be restricted to a subset of functions that are easy to handle in molecular simulations. This restriction would be guided by mathematical requirements (such as differentiability, domain, etc.), by algorithmic limitations (e.g., for long-ranged interactions), and by the available computing capability. It has been shown that this approach works well for the development of an empirical valence bond potential, through the use of genetic algorithms. Extending the search space as we propose would allow the use of FM to parameterize reactive force fields, or to describe systems that are not in the electronic ground state.
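To make the idea of searching the space of functional forms concrete, the following Python sketch enumerates combinations of candidate force terms and keeps the combination that minimizes the least-squares objective. It is only an illustration under stated assumptions: the candidate terms, the synthetic reference force, and the exhaustive enumeration (used here in place of the genetic algorithms mentioned above) are all hypothetical choices.

```python
import itertools
import numpy as np

# Hypothetical library of radial force terms that are cheap to evaluate
CANDIDATE_TERMS = {
    "r^-13": lambda r: r**-13.0,        # force arising from an r^-12 repulsion
    "r^-7": lambda r: r**-7.0,          # force arising from an r^-6 dispersion term
    "exp(-3r)": lambda r: np.exp(-3.0 * r),
    "r^-2": lambda r: r**-2.0,          # Coulomb-like force
}

def fit_terms(r, f_ref, names):
    """Least-squares coefficients and residual for one functional form."""
    design = np.column_stack([CANDIDATE_TERMS[name](r) for name in names])
    coeffs, *_ = np.linalg.lstsq(design, f_ref, rcond=None)
    residual = float(np.sum((design @ coeffs - f_ref) ** 2))
    return coeffs, residual

def search_functional_forms(r, f_ref, n_terms=2):
    """Try every combination of n_terms candidates; return the best one."""
    best = None
    for names in itertools.combinations(CANDIDATE_TERMS, n_terms):
        coeffs, residual = fit_terms(r, f_ref, names)
        if best is None or residual < best[2]:
            best = (names, coeffs, residual)
    return best

# Synthetic "reference" force: exponential repulsion plus dispersion attraction
r = np.linspace(0.8, 3.0, 200)
f_ref = 5.0 * np.exp(-3.0 * r) - 2.0 * r**-7.0

best_names, best_coeffs, best_residual = search_functional_forms(r, f_ref)
print(best_names, best_coeffs, best_residual)
```

Fixing the number of terms keeps the comparison fair, since adding terms can never increase the residual; a realistic search would also need a criterion that penalizes needlessly complex forms.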
The problem of the quality of the reference data is probably the least discussed in most contributions on FM. On the one hand, it is believed that the coarse-graining procedure would in any case introduce errors in the description of the PES; on the other hand, it is considered that, since we are still at the dawn of the method development, it is more important to get the method working than to accurately calibrate the quality of the reference data. This problem becomes particularly evident when the high level of description is given by ab initio (usually DFT) calculations. In fact, for these methods, it is not easy to gauge the error, and usually only short simulations can be performed. Thus, besides the quality of the underlying PES, the sampling uncertainty is also important. For heterogeneous systems, which are the most appealing for multiscale simulations, it may happen that the sampling is not ergodic enough to provide a reliable representation of the PES. Think, for example, of a solvated ion: at present, the most sophisticated simulations allow the dynamics to be propagated for about 50 ps. It is known that, for many alkali ions, this timescale corresponds to the characteristic time for water exchange.
Wang and coworkers proposed an adaptive QM/MM scheme to tackle the above problems. In their approach, the low-level force field is iteratively improved by applying FM to a subregion of the whole system. As a first step, a classical MD simulation of water with an initial guess for the parametrization is performed; once the system is well equilibrated, a high-level QM/MM calculation is made on part of the system; then, the new guess for the force field (obtained with FM) is used to propagate the classical system again, and the procedure is iterated until convergence of the parameters. With such an implementation, there is room to improve the quality of the QM description using post-Hartree–Fock approaches. Moreover, the sampling is ergodic, as the classical system is allowed to propagate for times longer than its characteristic time scales.
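The logic of such an iterative scheme can be sketched on a one-dimensional toy problem. The Python code below is not the implementation of Wang and coworkers: the analytic "high-level" force stands in for a QM/MM calculation, the low-level model is a single harmonic constant, Gaussian draws replace the classical MD sampling, and all names and parameter values are hypothetical.

```python
import numpy as np

def high_level_force(x, k=1.0, c=0.05):
    """Stand-in for the expensive high-level (e.g. QM/MM) force evaluation."""
    return -k * x - 4.0 * c * x**3

def iterate_force_matching(k_guess, kT=1.0, n_samples=20000,
                           tol=1e-8, max_iter=50, seed=0):
    """Alternate low-level sampling and force matching until k converges."""
    rng = np.random.default_rng(seed)
    # Fixed standard-normal draws, rescaled at each step, keep the loop deterministic
    z = rng.standard_normal(n_samples)
    k_ll = k_guess
    for iteration in range(1, max_iter + 1):
        # 1) sample configurations with the current low-level (harmonic) model
        x = z * np.sqrt(kT / k_ll)
        # 2) evaluate high-level forces on the sampled configurations
        f_hl = high_level_force(x)
        # 3) force matching: least-squares fit of F = -k x to the high-level forces
        k_new = -np.sum(x * f_hl) / np.sum(x * x)
        if abs(k_new - k_ll) < tol:
            return k_new, iteration, True
        k_ll = k_new
    return k_ll, max_iter, False

k_fit, n_iter, converged = iterate_force_matching(k_guess=0.5)
print(k_fit, n_iter, converged)
```

Because the configurations are always generated by the low-level model, the converged parameter reflects the regions that this model visits, which is the kind of sampling bias any such scheme needs to be checked for.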
It has not been shown, however, that this approach is free from bias toward sampling a given region of the phase space. In fact, as the configurations are always sampled with the low-level method, and the high-level calculations are used only to evaluate the forces, it is not guaranteed that all regions of the high-level PES are sampled correctly. Recently, one of us has proposed a QM/MM technique that allows ergodic sampling of the QM PES, where the level of accuracy of the QM description is as high as the one attained with the method developed by Wang and coworkers. In this method, called BEST after “Boundary Exchange Symmetry Theory,” a bias potential is added to prevent exchanges between the QM and the MM regions. BEST has been shown to efficiently preserve the statistical distribution and to be highly stable with respect to energy conservation, while being computationally less expensive than most methods based on adaptive QM/MM partitioning. Moreover, the sampling bottleneck is easily overcome by combining the method with multiple time step algorithms. With such an implementation, even path integral QM/MM simulations are accessible, making it a good candidate to produce high-quality reference data accounting for nuclear quantum effects.
In this perspective, we have briefly presented the main aspects of the FM method. In our view, the method is potentially the best candidate for automated multiscale simulations, provided it is further developed along the lines we sketched in our discussion. We have shown how the minimization procedure could be improved by taking into account the correct weights for each configuration and for each property in the objective function. The configuration weights can be calculated in any statistical mechanical ensemble from the associated probability function. The weights of each property A, to ensure a good representation of the entire PES, should be inversely proportional to its distribution p(A). We have also discussed the prospect of extending the search to the space of functional forms. In this way, more flexible force fields would allow for a better description of the underlying physics, as we have demonstrated for the simple case of water. This issue becomes even more compelling when complex, heterogeneous systems are studied. Finally, we have presented two methods to overcome the sampling problem of ab initio simulations, together with the issue of their quality.
Ideally, once a fully reliable implementation of the above three points is achieved, it will be possible to perform automated and reliable multiscale simulations of many different systems, and to include interactions beyond nuclear and electronic ground states, even at the coarse-grained level.
The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the Barcelona Supercomputing Center—Centro Nacional de Supercomputación.
Marco Masia obtained his M.Sc. in Chemistry in 2000 and his Ph.D. in Applied Physics in 2005. Since 2006, he has been Assistant Professor at the University of Sassari (Italy). From 2012 to 2015, he is a Marie Curie Fellow at the Goethe University Frankfurt (Germany). His research interests include Force Matching and force field development as well as the study of chemical reactivity in the liquid phase.
Elvira Guàrdia was born in Bellcaire d'Urgell, Catalonia (Spain) in 1960. She received the M.Sc. (1983) and the Ph.D. (1986) in Physics from the University of Barcelona. She is currently Professor of Physics at the Technical University of Catalonia (UPC). Since 1998, she has been the coordinator of the Computer Simulation in Condensed Matter Research Group of the UPC. Her present research interests include aqueous systems at liquid conditions and solid–liquid interfaces.
Paolo Nicolini obtained his Ph.D. in Chemistry (2012) working on the development and application of nonequilibrium simulation methods for free energy calculations. He then worked as a postdoctoral researcher at the Technical University of Catalonia, developing new classical force fields for water using the force matching approach. He joined the Advanced Materials group at the Czech Technical University in Prague in June 2013. His work is focused on the atomistic simulation of systems manifesting lubricating properties.