Molecular simulation approaches to study crystal nucleation from solutions: Theoretical considerations and computational challenges

Nucleation is the initial step in the formation of crystalline materials from solutions. Various factors, such as environmental conditions, composition, and external fields, can influence its outcomes and rates. Indeed, controlling this rate‐determining step toward phase separation is critical, as it can significantly impact the resulting material's structure and properties. Atomistic simulations can be exploited to gain insight into nucleation mechanisms—an aspect difficult to ascertain in experiments—and estimate nucleation rates. However, the microscopic nature of simulations can influence the phase behavior of nucleating solutions when compared to macroscale counterparts. An additional challenge arises from the inadequate timescales accessible to standard molecular simulations to simulate nucleation directly; this is due to the inherent rareness of nucleation events, which may be apparent in silico at even high supersaturations. In recent decades, molecular simulation methods have emerged to circumvent length‐ and timescale limitations. However, it is not always clear which simulation method is most suitable to study crystal nucleation from solution. This review surveys recent advances in this field, shedding light on typical nucleation mechanisms and the appropriateness of various simulation techniques for their study. Our goal is to provide a deeper understanding of the complexities associated with modeling crystal nucleation from solution and identify areas for further research. This review targets researchers across various scientific domains, including materials science, chemistry, physics and engineering, and aims to foster collaborative efforts to develop new strategies to understand and control nucleation.

up to 10 9 atoms over simulation times ranging from 10 À9 to 10 À6 s.Nevertheless, understanding and predicting how crystals with well-defined composition, structure and properties assemble from a supersaturated solution remains a formidable task using molecular simulations.The small time scales probed by MD simulations are generally inadequate to study the complex microscopic steps involved in crystal nucleation and growth.As we highlight below, however, advanced sampling schemes and theoretical treatments are often invoked to circumvent these limitations.In doing so, MD simulations can help to determine crystal nucleation and growth rates, polymorphism, crystal habit and defect density as a function of solution composition, temperature and the presence of additive/impurities and external forces.Indeed, a wealth of information can be gained by combining complementary computational tools to probe different crystallization steps from solution, but it is not always apparent which tools are most appropriate for the specific task at hand.
In this review, we provide an extended summary of the key theoretical and methodological features associated with atomic-scale modeling and simulation of crystallization triggered by homogeneous nucleation from solution.
In particular, we focus on the key differences and complementarities between simulation methods implying that nucleation follows a classical mechanism and methods that enable us to ask questions about the mechanism itself.In this process, we contrast simulation methods based on theoretical prior knowledge of nucleation mechanisms (e.g., seeding) with rare events simulation methods based on the introduction of a bias potential (e.g., metadynamics) and methods based on sampling nucleation trajectories in path space (e.g., forward flux sampling).][30][31] Our review complements an overview of the literature provided by Agarwal et al., 32 which delves into the theoretical background of rate theories applied to nucleation problems, as well as the review by Sosso et al. 33 that provides a wider appraisal of simulation methods to investigate crystal nucleation from liquids, without a specific focus on crystal precipitation from solution.We believe solutions deserve particular attention because, even if crystallization from multicomponent liquids is extremely common, it presents peculiar challenges associated with the fact that the product phase is usually characterized by a different composition than the parent phase and that the driving force for the process depends on solute concentration.After defining the remits of applicability of atomistic simulations to nucleation problems and highlighting the theoretical basis associated with the main features of nucleation processes investigated by molecular simulations, we briefly review the literature describing the methods to model nucleation from solution with atomistic detail.Finally, we provide a critical comparison of insights obtained from works that investigate the nucleation of NaCl(s) from aqueous solutions: a problem that has been tackled using approaches covering the entire spectrum of methods reviewed in this work.

| CLASSIFYING NUCLEATION FROM SOLUTION
Before delving into an overview of simulation methods to study crystal nucleation from multicomponent liquids, it is important to discuss the additional adjectives that qualify the nucleation process in the scientific literature and their meaning.This is essential to define the applicability domain of molecular simulation methods and to formulate appropriate research questions that could emerge when different nucleation processes and mechanistic hypotheses are being investigated.Figure 1 summarizes the different classes of nucleation discussed below and provides a sequence of considerations in order to determine an appropriate classification.

| Primary versus secondary nucleation
Primary nucleation is the spontaneous formation of new crystalline particles from a metastable solution phase without any interplay with pre-existing crystalline particles.This happens due to rare fluctuations in local solute order.Primary nucleation rates are usually low and can be influenced by factors such as solute concentration and temperature.In contrast, secondary nucleation occurs when pre-existing crystals or crystal surfaces (of the same nucleating substance) promote the formation of new crystals by attracting and attaching crystal growth units.Secondary nucleation is often much faster than primary nucleation and can be influenced by external factors such as agitation and shear forces that lead to particle attrition.

| Homogeneous versus heterogeneous nucleation
Homogeneous nucleation (HON) occurs in the bulk of a supersaturated solution and can be controlled by changing the solution environment.As well as temperature and pressure control, solution additives may affect the nucleation behavior.While the rates for HON can be predicted using a suitable theory/model, the location where nucleation occurs cannot be determined a priori.HON requires specific conditions for it to occur and is much less prevalent in nature than heterogeneous nucleation (HEN).In HEN, crystal nucleation is facilitated by interfaces-usually a solid submerged in the solution phase.Surfaces can act as nucleants to direct the site-specific crystallization of particular crystal polymorphs or, more generally, enhance the rates for crystallization.The direct simulation of heterogeneous nucleation is a daunting task, as, in general, information on the local structure of the surfaces promoting nucleation is not available.Hence, while the methods discussed in this review are, in principle, applicable to the study of heterogeneous nucleation, as shown by the recent works on heterogeneous ice nucleation, [35][36][37][38][39] examples of their application to HEN from solution are currently lacking.As such, in Figure 1, we report HEN as being on the edge of the applicability domain for molecular simulations of nucleation from solution.

| Single-step versus multi-step nucleation
Single-step nucleation refers to the direct formation of crystals from their primary building blocks in solution via a single energy barrier in the free energy landscape, giving rise to an induction time associated with a single bottleneck toward crystallization.In contrast, multi-step nucleation encompasses a variety of crystallization pathways, which can involve the formation of intermediate precipitate phases, such as liquids, amorphous solids or other crystalline phases.The pathway to the most stable crystal form from solutions may involve multiple intermediates occurring in sequence reproducibly.If the relative stability of the intermediates increases with each step, this type of mechanism is consistent with the Ostwald-Lussac empirical rule of stages. 40,41I G U R E 1 To gain insight into nucleation mechanisms and evaluate rates, an appropriate simulation method must be chosen to investigate the system at hand.This flow diagram identifies where theory-based simulation methods (based on CNT and its extensions) are most suitable.In particular, we highlight that secondary nucleation typically involves interactions of crystalline particles with the fluid at a scale inaccessible to molecular simulations and, with some notable exceptions, 34 it is typically outside the scope of atomistic simulation studies.Heterogeneous nucleation is only partly included in the applicability domain as, while most methods discussed in this review can in principle be employed to study nucleation at solid/liquid (S/L) interfaces, appropriate examples exist only for crystal nucleation from the melt. 33e specify that intermediates must be demonstrably involved in the formation of crystals to classify crystallization as multi-step.The mere presence of intermediate structures before or even during crystallization does not exclude the possibility that single-step crystal nucleation occurs (e.g., by a dissolution/re-precipitation reaction).
An archetypal example of multi-step nucleation is two-step nucleation, widely described as a process where crystalline order emerges in dense liquid precursors that form during the first phase transformation in solution. 42,43Figure 2b shows how a two-step pathway deviates from a single-step (one-step) pathway on a two-dimensional reaction coordinate defining cluster size and order.It is assumed that the rate determining step along a two-step pathway is the emergence of order within dense liquid domains (see the snapshot in Figure 2b), which form readily when the supersaturation of the parent phase is sufficiently high.8][49] In other words, multi-step nucleation does not necessarily refer to a cascade of single-step nucleation events.
The process of nucleation in complex solutions may be dependent upon chemical reactions and changes in species stoichiometry occurring locally.Several studies [50][51][52] have attributed the multi-step nature of nucleation to these factors.Simulating reactive crystallization events is often beyond the scope of classical molecular simulations using semiempirical, non-reactive force fields.As such, this review focuses on simulating nucleation from building blocks which are already present in the parent solution phase.These examples, however, highlight the importance of simulation practitioner's understanding of the chemical speciation and valency of molecular and ionic species involved in crystal formation.

| Classical versus nonclassical nucleation
In Section 3.1, we briefly describe classical nucleation theory (CNT), the thermodynamic and kinetic framework for classical nucleation.For the purposes of classifying nucleation pathways, we stress that classical nucleation is a The finite size effects of closed simulations manifest in a free energy minimum for the stable thermodynamic phase resulting from nucleation.This is different to the macroscopic case due to a bounded partition function and a depletion of crystal building blocks that transfer from the parent phase to the nucleating phase.(b) Crystal nucleation pathways from solution to crystal represented on a two-dimensional reaction coordinate characterizing cluster density and order.The diagonal marks the case for the concomitant increase in solute cluster order with density, indicative of the capillary approximation adopted in CNT.Pathways that deviate from this limiting case include two-step nucleation, where, for example, crystalline order is established in the core of liquid-like precursors, as shown on the right of B. In the case of NaCl nucleation (discussed below), simulations demonstrate a transition from one-step to two-step crystal nucleation that occurs when the supersaturation ratio, , is increased far into the metastable zone for phase separation.
single-step process that takes place through the attachment of monomers from solution to a cluster with an internal structure matching the bulk crystal.Within the capillary approximation, the surface tension of the clusters of a new phase is independent of cluster size and also matches the bulk; we should expect a sharp solid-solution interface and the same crystal faceting observed at equilibrium and ignore any curvature effects on the surface tension for small clusters. 53In terms of kinetics, we assume an abundance of monomers in the solution phase and that the outof-equilibrium growth of nuclei occurs at a steady state.
It is highly unlikely that any single-step, homogeneous crystal nucleation occurs according to the above mechanism.The capillary approximation is particularly problematic for crystal nucleation because the smallest clusters are unlikely to display an interface structure with surface tension that matches highly faceted bulk crystals with infinitely large planar surfaces.Furthermore, it may not be the case that the density and structure in the emerging crystal are homogeneous throughout.The CNT framework also fails to account for the role that growth units beyond monomers might play in the formation of crystal nuclei.Some of the effects described above can be accounted for in frameworks which we label as extended classical nucleation.5][56] Other theoretical and simulation studies have identified that the structure, particularly the density, of the smallest nuclei differs from the bulk and stated that this should be accounted for in extensions of CNT. 57Formulations based on classical density functional theory provide corrections to the CNT nucleation rate derived from the excess system free energy that accounts for varying cluster density. 58iven the description of two-step nucleation above, it may be perceived that this type of phase separation mechanism is inconsistent with classical nucleation.However, theoretical studies have demonstrated that two-step nucleation can be described using classical concepts adopting a composite cluster model. 48,59Changes to the relative supersaturation of the system with respect to a dense liquid and crystal phase lead to changes in the pathways to crystals from solutions 48 : a common observation in experimental studies.Some nucleation theories predict mechanisms that are clearly different from those described by CNT and its extensions and are thus termed nonclassical.Because this classification encompasses a large family of different nucleation frameworks, it is usually not a useful or informative description.][62][63][64] A paradigmatic example of nonclassical nucleation worth mentioning is the prenucleation cluster pathway.The prenucleation cluster (PNC) pathway was first proposed for CaCO 3 65 but has since been attributed to phase separation in a diverse range of systems. 66The PNC pathway suggests that the parent solution phase is comprised of hydrated solute clusters in pseudo-equilibrium with solvated monomers.The population of PNC sizes is determined by the equilibrium constant (K) for the reaction monomer , which is assumed to be constant and independent of the value of x.PNCs are highly dynamic, evolving their structure and topology over very short timescales (typically picoseconds). 67The nucleation step in the PNC pathway involves changes to the order and dynamics of monomers within PNC assemblies that renders them a new thermodynamic phase and gives rise to an interface with the solution. 68This rate-determining step typically produces a dense liquid phase that contains a large amount of solvent.Subsequent transformations of this liquid are necessary to produce crystals, which may include the further dehydration of dense liquids to form amorphous solid-like phases.

| DYNAMIC SIMULATIONS OF CRYSTALLIZATION FROM SOLUTION: TIME-AND LENGTH-SCALE CHALLENGES
Obtaining information on molecular-scale crystallization events, particularly nucleation, requires a high degree of time and space resolution.This poses significant challenges for in situ experiments, often leading to only speculative interpretations of molecular mechanisms.In contrast, modeling techniques based on MD simulations inherently provide insight into the time evolution of systems with atomistic detail, 69 making them a powerful tool for understanding complex processes such as nucleation.However, the small system sizes and limited simulations times available to the simulator mean that advanced simulation methods must often be used to simulate nucleation.
In MD, Newton's equations of motion are adopted to evolve atom positions in time. 70Here, chemical bonds are typically approximated as classical "springs" whose displacement from an equilibrium bond distance is modeled using Hooke's law.Similar simple functions can be used to approximate bond and dihedral angle rotation to capture the forces associated with intramolecular degrees of freedom.Intermolecular interactions can be approximated using simple, classical interpretations of Van der Waals forces and Coulombic forces for point-charged atoms.Importantly, all molecular species are assumed to be in their electronic ground state.As such, MD adopting classical, semi-empirical force fields are not well-suited to simulate chemical reactions and dynamical changes to chemical speciation, though reactive force fields 71 and the advent of machine learning methods to force field design 72,73 make this a possibility.Particularly in the case of ionic systems, polarizability can be simulated using a relatively cheap treatment of charge displacement from atomic nuclei, for example, using springs.By assigning a velocity (sampled from a Maxwell-Boltzmann distribution at a defined temperature) to atoms at the beginning of a simulation, the Hamiltonian dynamics of the system can be propagated according to a fixed temperature and the force field. 70,74This time integration is performed iteratively using a small time step, typically on the order of 1 fs, to capture the fastest atomic displacements in the system, usually molecular bond vibration.
The choice of force field can have important consequences for simulation observations.In terms of simulating crystallization, the force field should reproduce the structure, density and stability of the crystal phase as a minimum requirement.Furthermore, the solubility of the crystal phase should be reasonably close to that determined experimentally if a comparison to experiments is intended.The properties of the solution should also be modeled accurately.For example, the free energy of solvation of solutes, their activity in solution as a function of concentration and their mobilities in the solvent are all important properties necessary to accurately capture the thermodynamics and kinetics of crystal nucleation.
Assuming a suitable force field is available (see Section 6), simulating crystallization from multicomponent liquids using MD typically requires addressing two main system-specific limitations: timescale and finite size effects.Despite the high space and time resolution capability of MD, which makes it suitable for understanding molecular-scale processes, crystal nucleation in microscopic MD volumes can be very slow to realize.Therefore, many novel simulation methods have been developed to efficiently overcome one or both of these limitations to modeling crystallization.

| Timescale limitations
Simulating crystallization using MD, especially crystal nucleation, presents a significant challenge due to the infrequency of many of the atomic and molecular scale elementary steps required for these processes to occur.For instance, depending upon the thermodynamic state of the parent phase, crystal nucleation can occur on time scales typically orders of magnitude larger than those accessible to brute-force simulations. 75,76The separation of these timescales has made stimulating crystallization an ideal playground for the development of enhanced sampling methods based on MD.Thanks to enhanced sampling, significant early progress was made to analyze the early stages of crystallization in simple systems of uniform particles 47,[77][78][79][80] and toward the in-depth investigation of nucleation in monocomponent molecular systems, such as pure water. 81In recent years, an evolution toward systems with increased complexity and practical relevance has begun, enabling for example, the study of solute precipitate nucleation.5][86][87][88][89] Attempts to simulate organic solids nucleating from solution have been successful in highlighting the structural features of early-stage crystallization precursors 90,91 and to extract qualitative information on nucleation mechanisms. 92,931.1| Origin of the timescale separation in nucleation: Nucleation free energy barriers and CNT CNT provides a quasi-mechanistic description of the nucleation process, connecting crystallization equilibrium thermodynamics and nucleation kinetics. 53CNT foundations were laid down by Gibbs 150 years ago to describe the formation of liquid droplets from saturated vapors, and the theory was developed over the following century, now often being invoked to explain the emergence of ordered materials from liquids.In CNT, there are two contributions to the free energy of a metastable system: a volume and a surface free energy both associated with the new, emerging phase embedded in an out-of-equilibrium parent phase.Forming a stable phase is thermodynamically favorable, but the resulting interface, which delimits the new phase from the parent one, carries an energy penalty.While the free energy gain associated with the formation of a stable phase scales linearly with the volume of a nucleus, the free energy cost of forming an interface scales with its surface area.This simple but general argument provides a rationale for the barrier to nucleation and a definition for the transition state associated with the nucleation process. 27,94he size of the nucleus was originally described using a linear descriptor, that is, its radius r.However, the number of constituent monomers belonging to the nucleus, n, provides a variable which makes fewer assumptions about the nucleus shape.In this context, the volume and surface free energy contributions lead to the following expression for the nucleation free energy: where Δμ ℓ!xtal ¼ μ ℓ À μ xtal is the difference in chemical potential between the metastable liquid and the stable crystal phase and is strictly positive in conditions where nucleation is thermodynamically favorable and the system is supersaturated.The surface term scales as n 2=3 , where σ 0 is the product of a shape factor, area per monomer adsorption site and the surface tension, σ.A typical free energy profile for nucleation in the CNT framework is shown by the red curve in Figure 2a.By solving the derivative of the equation above with respect to n, we find that the free energy barrier associated with the nucleation process ΔF Ã is where the critical size of the crystal nucleus is and CNT provides analytical solutions for the minimum work required to form a crystal nucleus.The addition of monomers to the nucleus beyond n Ã reduces the free energy of the system.CNT assumes that a single energy barrier separates the parent liquid from a bulk crystal in equilibrium with a solution.It also assumes that the surface tension and the shape factor comprising σ 0 are unchanging as a function of n.This is the so-called capillary approximation that likely fails for the smallest crystals whose shape and faceting are expected to deviate from that of the bulk crystal. 41][56] The crystal embryo growth is assumed to occur by the transfer of monomers from the solution to the crystal. 95This is justified by considering the probability of finding clusters of size n: and n Ã is relatively small; it is reasonable to expect the transfer of growth units from the solution to the crystal to be dominated by solute monomers.The rate, J, of formation of clusters of size n per unit volume of the solution and per unit time takes an Arrhenius-type form 96,97 : where ρ is the volume density of solute monomers and f þ is the rate of attachment of monomers to the nucleus.Z is the Zeldovich factor for homogeneous nucleation, which accounts for the low probability of observing a large population of supercritical clusters that progress to bulk crystals (and is determined by the shape of the free energy barrier in n). 53,98Due to the exponential term in Equation ( 4), small changes to ΔF n Ã ð Þ, as determined by Δμ ℓ!xtal and σ 0 , can result in orders of magnitude changes to crystal nucleation rates.Following units for the rate as m À3 s À1 , the characteristic time for a single nucleation event scales according to V À1 , the reciprocal volume of the solution.As described above, computational costs limit the total volume (number of atoms) that can be simulated using standard MD.With this in mind, consider a hypothetical aqueous solution system undergoing relatively fast nucleation with J ¼ 10 20 m À3 s À1 .In an MD simulation of this solution containing around 10 5 water molecules, we can expect the mean simulation time required to observe one nucleation event to be roughly 1 h.This is far beyond the capabilities of MD, which typically achieves simulation times up to 10 À6 s on a powerful CPU/GPU.From this example, it is easy to appreciate why enhanced sampling simulations based on MD are often essential to study phase separation in some solutions.

| Finite-size dependence of the crystallization driving force
Simulations of crystallization from solution are typically carried out in the canonical or isothermal-isobaric ensembles, where the total number of atoms/molecules is constant.When dealing with out-of-equilibrium, multicomponent liquid phases undergoing a phase transition, this constraint introduces a coupling between the number of solute monomers available to a growing crystal nucleus and the time-dependent crystallization driving force, Δμ ℓ!xtal . 32,75,92,99,100Indeed, nucleation can be completely inhibited in a microscopic, closed system, if the transfer of monomers from the liquid to a critical cluster of the new phase renders the solution undersaturated. 75,92,101In a dense system, such as a liquid solution, this coupling cannot be efficiently removed by simulating in the grand canonical ensemble (where the solute is replenished from an artificial, external reservoir to maintain a constant solution chemical potential), due to a low probability for insertion of solute in the liquid phase.Instead, molecular simulations using a constant number of molecules require either the application of theoretical corrections to account for the change in Δμ ℓ!xtal or the development of specialized methods to mimic open boundary conditions, [102][103][104][105][106] unless sufficiently large systems can be simulated to minimize the finite-size effects.

| Nucleation free energy in small systems
A rationale for the effect of system size on the phase behavior of metastable liquids is gained by developing a model for crystal nucleation in confined volumes analogous to the modified liquid droplet (MLD) model developed by Reguera et al. 99,107 to describe depletion effects on the thermodynamics of nucleation of liquid droplets 86,92 Consider a two-component supersaturated solution, ℓ.The chemical potential of solute i in solution can be written as, where μ 0 is a reference chemical potential and a i is the activity of i, which is approximately equal to the mole fraction, x i , assuming close-to ideal solution behavior.When a crystal forms, the chemical potential of this phase equals the chemical potential of the solution at equilibrium, such that, and asterisks indicate the activity and mole fraction of the solute under coexistence conditions.The transfer of monomers from the solution to the crystal during nucleation depletes the surrounding solution.However, this effect is negligible in a macroscopic system, as the abundance of monomers in the bulk liquid quickly replenishes the solution surrounding the nucleus (assuming that crystallization is not diffusion limited).In a simulation, however, where the total number of monomers is fixed (N), the depletion changes the driving force for crystallization and must be accounted for in an additional term to Equation (1): Depending on the volume of the simulated system, and the supersaturation of the mother phase, the deviation between a macroscopic and a finite-size nucleation free energy profile can be responsible for significant changes to the outcome of nucleation simulations.In supersaturated solutions, confinement leads to a minimum in the free energy profile shown in Figure 2a, the depth of which is determined by the initial concentration and number of monomers in solution.Furthermore, the height and position of the barrier is perturbed compared to the macroscopic case.If association of monomers leads to the solution becoming undersaturated below the critical size of the nucleus in the macroscopic limit, then nucleation can be inhibited altogether.
One could take advantage of the limitations imposed by finite size by performing simulations initiated at different N and V and analyzing the steady-state properties of the system obtained from simulations to evaluate ΔF n ð Þ.For example, Li et al. 108 performed a series of unbiased MD simulations where they varied the concentration and total number of coarse-grained, intrinsically disordered peptides in a continuum solvent.Starting from a homogeneous phase, spontaneous separation occurred when simulations were sufficiently large and concentrated, resulting in dense liquid peptide droplets in equilibrium with lean solutions.Fitting the steady-state simulation data to a model analogous to Equation ( 7) allowed the authors to evaluate x i,Ã , σ and n Ã over a range of saturation levels and identify the conditions where nucleation is completely inhibited by the system size. 92,93ractically, computational challenges mean this approach has yet to find applications to study crystal nucleation.This is because the spontaneous decomposition of a highly supersaturated solution is unlikely to lead to crystals due to the slow decay times for monomer relaxation to a lattice structure in a dense amorphous precipitate.Even if sampling this monomer ordering were feasible over MD timescales, many more crystal geometries could result from a rapid crystallization step, making precise estimates of σ difficult.Based on the local density, alternative formulations of CNT have also been constructed, 109 and the finite size effects can be included here to account for the changing thermodynamic driving force. 101

| Simulating condensed matter systems with pseudo-open boundaries
A limited number of strategies have been developed to avoid the effects of solute depletion that shift the thermodynamic driving force for steady-state crystallization in simulations.The constant chemical potential MD method (CμMD) 102,104,110 employs a closed system that is segregated into a transition region (TR), housing the process of interest; a control region (CR), representing a bulk fluid with fixed composition; and an internal reservoir that supplies the CR with solute monomers as they are removed from the liquid phase toward the growing crystal in the TR.In doing so, steady-state crystal growth is maintained for relatively long simulation times, as was shown in the case of, for example, urea, 102 isoniazid 111 and naphthalene, 112 which demonstrate crystal facet and solvent dependent growth rates.In addition, crystal nucleation can be studied using a spherical variant of CμMD that was applied to simulate NaCl crystallization. 104ombining multiple, carefully prepared closed system simulations (representing steps on a nucleation reaction coordinate) to maintain the solution concentration as solute molecules are transferred to a crystal nucleus, outof-equilibrium sampling in the so-called Osmotic ensemble can be performed to avoid solute depletion effects. 105This was successfully applied to understand the nucleation and polymorph selection of sulfamerazine. 106Unlike CμMD, the reaction coordinate for crystallization must be known a priori here to set up the initial configurations, and dynamical information cannot be obtained using this approach.
Another promising simulation strategy to avoid solute depletion is the adaptive resolution scheme. 113Here, a closed system contains a high-resolution region of interest coupled to a low-resolution, coarse-grained representation of solute/solvent.In the case of nucleation, for example, the crystal nucleus and its immediate environment can be modeled with atomic detail.As the nucleus grows, a smooth transformation of the model for solute and solvent molecules occurs by extending the region of the system modeled with high resolution.There are no known examples of applying such approaches to study crystallization.Both the coarse-grained model and the method by which the molecule representations are transformed must be prepared/performed carefully to ensure the properties of the system are maintained.

| MODELING APPROACHES TO SIMULATE NUCLEATION FROM SOLUTION
As highlighted in Sections 2 and 3, modeling nucleation from solution introduces unique theoretical and practical challenges.As such, a range of different approaches have been applied to this problem, populating a variety of methods that span epistemic interpretations of the role of simulations.On the one hand, there is the application of molecular simulation methods to estimate physical parameters appearing within independently established theoretical frameworks.On the other hand, molecular simulations can be utilized as computational experiments, yielding a direct observation of the fundamental steps underpinning crystal nucleation from solution.These types of simulations contribute to the assessment of existing theoretical frameworks and, if necessary, to the development of new ones.The space between these two simulation extremes is populated by a plethora of different approaches that vary not just in their technical implementation but also in the degree to which they rely on reference theoretical frameworks to yield estimates of nucleation kinetics and mechanisms.In Figure 3, we provide a graphical representation of a variety of such methods in relation to their reliance on CNT and its underpinning assumptions.At one extreme, if computational resources were abundant, then one could perform many, large-scale MD simulations for sufficiently long enough times to observe crystal nucleation and growth, and, following validation of an appropriate reaction coordinate, extract kinetics for nucleation using Poisson statistics (see right of Figure 3).This approach requires no a priori assessment of the mechanism and rate equation for nucleation.Unfortunately, the computational cost here means that it is unlikely to succeed for most cases.At the other extreme, if one assumes a mechanism following CNT, equilibrium MD simulations or experimental data can be leveraged to plug into the theory master equation (see left of Figure 3).This approach can be very cheap; however, estimates of crystallization rates here are likely to be erroneous (given the shortcomings of CNT and observations from experiments discussed in Section 2) and no insight into the mechanism is gained, as well as this approach being largely unsatisfying for the simulator.The simulation methods we discuss below fall between these extremes and were either developed to simulate nucleation directly or to simulate rare events more generally.
A somewhat arbitrary classification of nucleation from solution guides the structure of the core sections in our review.In the following, we summarize the basics of different simulation approaches by grouping them into two main categories: theory-based (Section 4.1.1)and exploration (Section 4.2) simulation methods.The latter rely on the vast literature covering rare event sampling methods.We report on both biased and unbiased simulation methods, prioritizing those that have been applied to study crystal nucleation from multicomponent solutions.Finally, we propose a global overview of the type of insight available from nucleation simulations by reporting results that have accumulated in the last decade on the homogeneous nucleation of NaCl in an aqueous solution.For this system, we have examples of many, if not all, possible implementations of nucleation simulation strategies.This unicum in the literature provides an opportunity to compare different approaches and, at the same time, offers an overview of the field.Numerous studies of NaCl crystal nucleation using different simulation methods facilitate an assessment of the suitability of these approaches to investigate crystal nucleation in silico more generally.
F I G U R E 3 A variety of molecular simulation techniques that can be applied to study nucleation from solution ranges from completely theory-based (left side) to exploration-driven computational experiments (right side).The insight available and its associated computational cost vary significantly across this range of methods, highlighting how an a-priori formulation of the research question to be addressed by simulation is essential to guide the selection of an appropriate simulation method.
4.1 | Informing theory with molecular simulations

| The seeding method
In the seeding method, popularized by the Espinosa, Vega, Valeriani, Sanz, 114,115 Quigley, 83 and Peters 89,116 groups, molecular dynamics simulations are used to inform nucleation rate expressions based on CNT.By construction, the seeding method relies on the a priori acceptance of a quasi-classical nucleation pathway and, within this context, enables the calculation of nucleation rates.Importantly, as extensively discussed by Zimmerman et al., 116 the seeding method is applicable when the size of the nucleus is the reaction coordinate for nucleation, and the capillary approximation can be safely applied (see Section 2).Because analysis of seeded simulations is based on CNT, the approach is only applicable to systems that nucleate classically (see Figure 1).
The key expression in the seeding method is the CNT nucleation rate, that is Equation (4).][116] To solve the rate equations, the parameters that should be estimated at a given T are the solute density ρ, the thermodynamic driving force Δμ ℓ!xtal , the critical nucleus size n Ã and the attachment frequency f þ .The seeding method leverages unbiased MD simulations to estimate n Ã and f þ by averaging the dynamics of growth or dissolution of nuclei that are prepared with initial size n 0 and equilibrated in a solution with composition ρ and temperature T. The rate is then evaluated according to an estimate of Δμ ℓ!xtal , which has to be independently obtained at the composition of interest.Along with others discussed below, the method has only been applied to study nucleation from homogeneous solutions thus far.As well as requiring a supply of sufficient monomers to a post-critical seed, the initial seed configuration should be mechanically relaxed which is more challenging in the presence of surfaces.Algorithmic frameworks were designed to combat this latter challenge in simulations of ice nucleation and these tools may also prove useful to study nucleation from solutions. 117he critical nucleus size n Ã , for a specific value of ρ and T is obtained by performing ensembles of simulations varying n 0 and, by computing for each ensemble of trajectories initialized at the same n 0 , the ensemble average of the initial drift velocity _ n 0 ð Þ h i n 0 .The initial size n 0 associated with _ n 0 ð Þ h i n 0 ¼ 0 is by definition the critical nucleus size n Ã .As such, seeding simulations can be performed using any MD engine that can handle the force field.Typically, for a solute concentration, one performs at least five simulations at different values of n 0 (where each seed is carefully relaxed in the solution), each of which involves running a handful of independent simulations to obtain the mean trajectory that indicates the relative stability of the cluster.Seeding methods can, therefore, be relatively cheap and easy to perform in systems where the attachment of monomers from the solution to crystal occurs readily over simulation timescales.
The attachment frequency f þ can be obtained by considering that the dynamics of the nuclei obey the overdamped Langevin dynamics; Zimmerman et al. 86,116 have shown that the attachment frequency is limited by the desolvation process.Under this condition where k s is a second-order rate constant and σ s is the surface concentration of attachment sites.A more simulationdriven approach to estimate the attachment frequency is instead proposed by Auer and Frenkel, as discussed by Espinosa et al. 118 : These two approaches have been shown to lead to the same order-of-magnitude estimates of the attachment frequency f þ for NaCl nucleation from aqueous solution, 115 a case study that will be further explored and discussed in Section 5.
A thorough discussion on the estimate of the attachment frequency is reported in Lifanov et al. 83 It should be noted that, in the literature, we can find examples of applications where seeding-inspired approaches are used to compute only the prefactor of Equation ( 4), while enhanced sampling methods are used to independently compute ΔF Ã . 104otable applications of the seeding method to the investigation of nucleation from solutions include, for example, the study of the nucleation of methane hydrates, 119 and the study of urea nucleation from aqueous solution. 120

| Prenucleation species
Many simulation studies seek to gain an understanding of nucleation without directly simulating the process.These types of simulations are usually the cheapest as the steady-state solution behavior can be achieved readily in standard MD simulations.Often, though, enhanced sampling techniques (see Section 4.2) are also used to determine the thermodynamic stability of associated species.In complex solutions, simulations have provided information on the species potentially involved in nucleation; careful analysis of these species and their assembly can aid the classification of nucleation pathways. 49,67,121emichelis et al. 122 performed metadynamics simulations (described in Section 4.2) to confirm the relative stability of calcium phosphate complexes as the first associates to form in solution and thought to be directly involved in mineral nucleation from experiments. 123Simulations have also shed light on structural building units that act as precursors in the nucleation of metal-organic frameworks [124][125][126] and the complexes that assemble to form inorganic functional materials.
It was hypothesized that the structural motifs of API dimers in solution encode the polymorphic outcomes of crystal nucleation.8][129][130][131] In simple 2D models of flexible chiral molecules, Carpenter and Grünwald 132 recently demonstrated how bulk crystal structures are related to the organization of building blocks prior to nucleation and the importance of kinetics in predicting polymorphism.
Investigating prenucleation species is particularly useful to identify systems that may undergo phase separation following the PNC pathway and other nonclassical crystallization routes.4][135][136][137][138][139] These studies have demonstrated that entropy drives the formation of PNCs 135 and questioned the relationship between PNCs and microscopic precursors to dense liquids. 136,139Generally, it was observed that organic additives stabilize liquid-like assemblies that form prior to crystallization. 133,137,138redicting the phase behavior in systems that follow the PNC pathway to phase separation requires evaluation of the equilibrium constant for monomer association, as discussed in Section 2. 68 Enhanced sampling simulations, particularly umbrella sampling and metadynamics, 134,138,140 have been successfully applied to evaluate these constants, which corroborate experimental measurements. 135Both biased and unbiased MD simulations were informative in predicting the structure and dynamic properties of the PNCs and determining the thermodynamic driving forces for their formation. 67,137,139In this regard, the application of simulations has been critical to understanding and evolving theories for the PNC pathway.

| Molecular simulations as computational experiments
As discussed in Section 4.1.1,the seeding method provides information on the nucleation kinetics, implying a nucleation process that closely follows the nucleation pathway postulated in CNT.By construction, seeding methods cannot answer research questions pertaining to the nucleation mechanism itself.In order to discover and investigate mechanisms deviating from the one postulated in CNT and its extensions, unseeded nucleation simulations are necessary.In unseeded nucleation simulations, the assembly of crystalline nuclei is explicitly sampled, starting from a solution where a critical nucleus is absent.As such, the nucleation mechanism is not a priori defined and emerges from the collective evolution of the system, thus allowing for an open-ended exploration of the nucleation process.This can be useful for studying systems that exhibit complex behavior involving intermediates along the nucleation pathway or that can yield different crystal structures upon nucleation.
In order to sample nucleation events in unseeded simulations, enhanced sampling methods are key.Under conditions of supersaturation of interest for practical applications, spontaneous fluctuations across the nucleationfree energy barriers are too rare to be observed over timescales accessible with standard MD simulations (see Section 3.1).Enhanced sampling methods can thus be used to overcome this limitation.Broadly speaking, enhanced sampling methods used to investigate nucleation from multicomponent liquid phases can be classified depending on whether they introduce a bias potential as a perturbation of the system's Hamiltonian-such as metadynamics and umbrella sampling-or are based on efficiently sampling the space of reactive paths using techniques such as Transition Path, Transition Interface, and Forward Flux Sampling, or the construction of Markov State Models from unbiased simulation trajectories.Biased enhanced sampling methods typically aim to estimate the free energy barrier associated with the nucleation and to discover nucleation pathways.Nucleation rates are usually obtained by complementing evaluation of the free energy barrier with estimates of the rate prefactor, often carried out using approaches similar to those adopted in the seeding method.A direct calculation of nucleation rates here is only practical in simple cases. 75

| Biased enhanced sampling approaches
Biased enhanced sampling methods in the context of crystal nucleation are typically deployed to achieve two aims: the calculation of the free energy barrier associated with the nucleation process and the enhanced exploration of the nucleation mechanism.Typically, the former objective can be achieved by either static or history-independent biasing strategies while the second objective is pursued by the deployment of adaptive, history-dependent biasing methods.In the following, we briefly recap the methodological bases of two biased enhanced sampling methods representative of the static and adaptive categories: Umbrella sampling (US) and metadynamics (MetaD).Here we do not report on sampling methods based on constrained dynamics such as the string method used by Santiso et al. to model crystallization both from the melt 141 and from solution, 105 and we refer the interested reader to the original publications for an overview of this method, related to US and MetaD.Biased enhanced sampling methods depend crucially on the choice of low-dimensional descriptors of the system configuration-that is, the collective variables (CVs)-that in biased sampling are used to define the bias potential. 46,142We briefly discuss this point at the end of this section and for a comprehensive overview, we refer the interested reader to reviews on this topic from Giberti et al. 87 and Neha et al. 143 These types of simulations are typically expensive, especially if multiple CVs are biased.Very many (US) and/or very long (MetaD) simulations may be necessary to obtain convergent thermodynamics and kinetics.However, the computational cost is much lower than if one were to observe crystal nucleation spontaneously in such systems.US can be performed in the most commonly available MD engines that allow for implementation of harmonic restraints; though the CVs typically used to simulate nucleation are unlikely to be available in standard MD codes.A useful and noteworthy plugin is the PLUMED software 144 which interfaces with most MD engines and offers a wide range of CVs, including ones that are useful to study crystal nucleation and discussed in this review.PLUMED also allows the practitioner to perform US and MetaD, as well as other types of biased enhanced sampling, in favorite MD engine that has been patched with PLUMED.

Umbrella sampling
6][147] The method involves performing a number of independent simulations, or windows, where a harmonic bias potential, defined as a function of a CV s as V i ¼ k i s À s i ð Þ 2 is added to the Hamiltonian of the system.In the ith window, the bias potential is designed to sample configurations that, in s, are projected in the vicinity of s i .Performing simulations for s i values describing a pathway between reactants and products allows collecting configurations distributed between a supersaturated solution and a crystalline nucleus.Within each window, the biased probability density can then be reweighted, 145,148 and a global free energy profile is then obtained by employing either the weighted histogram analysis (WHAM) 146,149 the multiple Bennett Acceptance Ratio (mBAR) method, 150 or umbrella integration (UI). 151US is routinely used to compute free energy surfaces associated with activated processes and its most common application in nucleation studies is the calculation of the free energy barrier to nucleation, that is ΔF Ã , without resorting to a theoretical formulation of the barrier dependence on thermodynamic parameters such as the solubility and surface tension. 152For the calculation of nucleation rates, US calculations are complemented by estimates of the nucleation rate prefactor.This can be obtained by using CNT-inspired expressions 118 following an approach similar to seeding simulations or, more generally, by drawing from the Bennett-Chandler formulation of the kinetic prefactor associated with a rare-event transition. 86,153tadynamics Metadynamics (MetaD) is a molecular simulation technique to study the thermodynamics and mechanisms of rare events. 154,155It involves introducing a time-dependent bias potential to the system being simulated, which acts to push the system out of local energy minima and explore a wider range of possible configurations.Poorly explored regions of configuration space, that might not be easily accessed using traditional molecular dynamics simulations, can be visited using this approach.MetaD is one among many methods in which sampling is enhanced by adaptively perturbing the original Hamiltonian of the system by introducing a bias potential, a seminal idea historically introduced with umbrella sampling. 145,154,156,157In MetaD such a potential is adaptively constructed as a sum of Gaussian kernels defined in a low-dimensional space of collective variables (CVs), usually indicated as s.CVs are formulated as continuous and differentiable functions of the microscopic coordinates of a system. 87,155,158,159In recent years, several adaptations of MetaD have been proposed, the most relevant being Well-Tempered metadynamics (WTmetaD) introduced by Barducci et al. 155 In WTmetaD, the external repulsive potential is iteratively updated as: where n À 1 and n refer to consecutive iterations of bias deposition, s n defines the position of the system in CV space s at iteration n, V nÀ1 is the total bias potential at iteration n À 1, k B is the Boltzmann constant and ΔT is a parameter akin to a temperature.A key feature of the WTmetaD algorithm is the fact that the scaling factor exp 1=n leading to a convergent behavior.WTmetaD convergence was demonstrated initially for infinitesimally narrow kernel functions and more recently for any G s, s n ð Þ.By changing the parameter ΔT, a controlled enhancement of the fluctuations in s can be achieved, leading to the asymptotic convergence of the bias in the long-time limit: is the free energy in the set collective variables s, and c t ð Þ is a time-dependent constant. 160 the context of the study of nucleation processes from solution, metadynamics was applied to explore the configurational landscapes associated with the nucleation of small organic molecules from liquid solutions, [91][92][93]161,162 the nucleation of salts from aqueous solution, 49,104,163 the nucleation of methane clathrates, 164 and the assembly of perovskites. 165 Figre 4 highlights some of these examples.In all of these cases, metadynamics enables the calculation of continuous reactive trajectories for crystal nucleation and often enables estimates of the free energy landscape associated with such pathways.

| Unbiased enhanced sampling approaches
In this review, we indicate with the adjective unbiased those rare event sampling methods that are not based on the perturbation of a system's Hamiltonian by introducing any biasing potential or any artificial force, yet achieve an enhancement of the sampling of activated events by efficiently sampling trajectories obtained using standard MD.Such techniques are based on efficiently sampling the Transition Path Ensemble (TPE) and have been spearheaded by Transition Path Sampling (TPS).After briefly introducing TPS, hereafter, we focus on reporting only techniques that have recently been used to model nucleation from multicomponent liquids, such as Transition Interface Sampling 166,167 and Forward Flux Sampling 153,168 that was recently applied to simulate NaCl nucleation from aqueous solution at moderate supersaturations 169 (see Section 5).

Transition path and transition interface sampling
Transition path sampling (TPS) leverages a Monte Carlo algorithm to sample the TPE starting from a single reactive trajectory that connects reactants and products.In the context of homogeneous nucleation, this trajectory connects a supersaturated solution to a solution containing a crystal particle.This initial reactive trajectory is often generated using biased sampling techniques.New trajectories are then sampled by implementing a shooting algorithm.While multiple shooting algorithms exist, a typical approach consists of perturbing a configuration sampled by the initial reactive trajectory that is, by slightly modifying the momenta, thus generating a new trajectory via backward and forward propagation in time.If the new trajectory, proceeding through the set of configurations x in n steps, connects reactants A ð Þ and products B ð Þ, it is accepted with a path weight that depends on the equilibrium phase space probability density of the initial point in the trajectory ρ x 0 ð Þ as: where p x i !x iþ1 ð Þis the probability to transition from configuration x i to configuration x iþ1 .The calculation of the rate constant associated with the AB transition is based on the estimate of the correlation function , where h i is a characteristic function that is equal to one in state i, and null everywhere else.As discussed in detail in references 153, 170, 171 the calculation of the kinetic constant requires the generation of trajectories targeting intermediate states between A and B, typically generated using an order parameter, or CV.It should be noted that, while the sampling of trajectories is independent from the choice of CVs, the rate calculation does depend on the CV choice.
Transition interface sampling (TIS) was introduced to improve the efficiency of the rate calculation, associated with TPS.In TIS, configuration space is sectioned into non-intersecting interfaces based on a CV (usually indicated with λ in the TPS/TIS literature).The order parameter lambda should enable a mutually exclusive partitioning of the configuration space between a reactant basin, where λ < λ A ¼ λ 0 and a product basin where λ > λ B ¼ λ n .The reaction rate constant k AB is then obtained as 167 : where Φ A,0 is the steady state flux of trajectories leaving the reactant state, which can be easily evaluated by a bruteforce MD simulation of the system in the reactant state; h A is a function that is equal to unity if a trajectory was more recently in the reactant state than in the product state; while Þ is the probability that a trajectory crosses the product interface λ n , when starting from reactant interface λ 0 .In order to improve the efficiency of the computational evaluation of P λ n jλ 0 ð Þ, further developments of the TIS algorithm, such as replica exchange TIS (RETIS 171,172 ), were proposed, introducing exchange moves between interface ensembles in order to enhance the ergodicity of the sampling.As for TPS, in TIS the CV λ is used to conveniently partition configuration space, and does not affect the sampling of reactive trajectories.

Forward flux sampling
Forward flux sampling (FFS) is a computational method used to study rare events in complex systems.Unlike TPS and TIS, FFS does not require the system to be at equilibrium, and can thus be applied to out-of-equilibrium processes.Similarly to TIS, FFS is based on sampling transitions between non-intersecting interfaces defined in a low-dimensional order parameter space describing the transition from a reactant state, A, to a product state, B. Analogously to TIS, the initial interface λ A marks the boundary in CV space between the reactants and all other configurations, and λ B indicates the boundary between the products and all other configurations.The pathway from A to B is described by crossing a series of intermediate interfaces λ : λ i , …, λ nÀ1 , with λ iþ1 > λ iþ1 for every value of i.
FFS uses the same expression proposed by TIS for the calculation of the transition rate between A and B, namely Equation ( 12).However, it differs in the computational approach adopted for the calculation of the term P λ iþ1 jλ i ð Þ, that provides the attribute forward to the method's name.In FFS the probability of crossing interface λ iþ1 starting from interface λ i is obtained from the forward-only integration of the system's dynamics.This term is obtained as the fraction of trial runs initiated in λ i that reach λ iþ1 . 79Implementations of the FFS algorithm differ in the specifics of the algorithm adopted to generate configurations at interfaces and thus in the details associated with the calculation of P λ iþ1 jλ i ð Þ.Additional information on FFS can be found in original publications and a number of reviews describing specific of the method.Particularly important in the context of nucleation, where the size of the largest nucleus is typically a good choice of order parameter (see Section 4.2.3), is the fact that jumpy order parameters require a specialized treatment in order to consistently yield estimates P λ iþ1 jλ i ð Þ as discussed in detail by Haji-Akbari. 173Furthermore, Hall et al. 168 provide a practical guide and comparison between the RETIS and FFS methods.
The need to spawn many MD trajectories (typically, more than 100 crossings are required to achieve good statistical accuracy in the probabilities) along many points (the density of interfaces must be high if the free energy barrier to nucleation is large) in a reaction coordinate make FFS particularly expensive.As FFS requires unbiased MD, it can be performed with all MD packages and using scripts to automate the spawning and analysis of trajectories.Of course, the CV must be implemented in order to use the method.In this regard, SSAGES 174 is software that interfaces with popular MD engines and facilitates FFS simulations with several FFS protocols implemented.Though several CVs are available, they are not typically used to study nucleation; however, a guide is provided in the software documentation on how to add CVs to the code.

| Collective variables
Crystal nucleation from solution is a collective process of assembly that involves, by definition, an ensemble of growth units (molecules, ions, atoms, particles, etc.) that are inherently equivalent and that come together to form a nucleus of a new phase characterized by a well-defined structural arrangement.In order to describe and ultimately understand the salient features of the assembly process, it is necessary to develop low-dimensional descriptors of the system characterizing the transformation.In the theoretical and computational literature on phase transitions, such descriptors take the name of order parameters (OPs).In the context of enhanced sampling, OPs fall into the broader category of Collective Variables (CVs, often indicated as s r

!
).These are typically functions of the atomic coordinates that are used to define the bias potential added to the Hamiltonian in biased enhanced sampling or that mark the progress between reactants and products in unbiased enhanced sampling and MD.Very simple CVs to monitor the emergence of crystalline order include average monomer coordination number, system potential energy and local density, though the correct choice of CVs will depend upon the simulated system.In addition, these simple examples may not be able to distinguish between different crystal polymorphs, in which case some characterization of the local symmetry between coordinated monomers in crystal structures should be captured by the (combination of) CVs.
It is important that CVs approximate the reaction coordinate associated with the nucleation process and therefore distinguish important states along the reaction path, such as reactant, product and, ideally, configurations belonging to the transition state ensemble. 175Moreover, to be used in biased enhanced sampling simulations, CVs should be continuous and differentiable functions of the atomic coordinates.It was demonstrated by Peters et al. that to study singlestep nucleation mechanisms and to determine rates, CVs should characterize the size of the crystalline cluster in solution; indeed, the application of CVs to describe phase separation can be tested to ensure they are good ones to determine mechanisms and rates and to compare to reaction coordinates adopted in established nucleation theories. 86,175In two-step processes, a two-dimensional CV space representing the extent of the largest cluster and of the largest ordered domain in the nucleating phase have also emerged as good descriptors of the reaction coordinate, 46,49,92,93 which also lend themselves to a theoretical description of two-step nucleation. 48More recently, the application of Machine Learning methods and the data-driven identification of low-dimensional reaction coordinates for nucleation has emerged as a viable strategy to identify combinations of CVs that enable an effective, low-dimensional description of nucleation processes, 46,143,[176][177][178] and that allow for the application of biased enhanced sampling to drive polymorph-specific crystal nucleation. 162he definition of effective CVs to describe and enhance the sampling of complex nucleation processes in solution also hinges on our ability to define order parameters that can resolve well the atomic environments that are characteristic of specific crystalline structures.While this is routinely done for crystals constructed from atomic or (spherically uniform) single particle monomers-simple colloids being an example-using bond-orientational OPs such as the Steinhardt order parameters 87,143,179,180 it remains a challenge for molecular crystals.Approaches based on the calculation of generalized pair distribution functions 181 or on the calculation of properties of the distributions of order parameters [182][183][184] appear promising options but still require significant improvements in terms of computational efficiency, generalizability to molecular systems with a significant degree of conformational complexity 185 and with hundreds of putative polymorphs. 186,187All these aspects limit their current applicability to study nucleation from solution and represent one of the most limiting bottlenecks in the current applicability of systematic nucleation studies in molecular systems.

| COMPARING APPROACHES: NACL NUCLEATION FROM AQUEOUS SOLUTION
As anticipated in Section 4, here we report on the results obtained for the nucleation of NaCl in aqueous solution by different researchers over the previous decade.NaCl(aq) is arguably one of the simplest mineralizing solutions and represents one of the earliest case studies for crystallization, 88,188 instrumental for our understanding of crystal nucleation from solution and for assessing simulation methods applied to this problem.The examples we report are representative of the methods that we briefly introduced in the previous sections and show how different approaches can yield complementary information on NaCl nucleation kinetics and mechanisms from aqueous solutions spanning a wide range of concentrations.Many of the studies conducted on this system employ the same solute and solvent forcefield combination and can thus be quantitatively compared, such as in Figure 5.The force field of reference in the studies described in this section, unless otherwise stated, is the NaCl Joung-Cheatham (JC) 189 force field that makes use of the SPC/E water model. 190The force field is very simple, using Na þ and Cl À ions in water modeled using a three-point water model with a constrained geometry.This force field performs reasonably well to capture the solution properties, including ion solvation energies. 189This is perhaps surprising for such a simple model that offers a cheap solution to simulate NaCl(aq).It was demonstrated that scaling the charges on ions can improve the model of the solution and several schemes are available in order to do this; see references 191, 192.However, these improvements to modeling the solution phase do not always lead to improved thermodynamic and mechanical properties for the solid NaCl phase when compared to experimental measurements.Recent parameterization strategies reduce some of these errors and provide force fields that predict solubilities close to those determined experimentally. 193espite more accurate force fields being available, the JC/SPC/E force field is, arguably, the most characterized model to simulate NaCl(aq) and has been adopted in most studies of nucleation where both mechanisms and nucleation rates have been extracted from simulations.5][196][197][198][199][200] An accurate determination of the solubility enables the correct assessment of Δμ that is used to evaluate rates in the seeding method (see Section 4.1.1), 89,115,116and to consistently attribute a supersaturation level to simulations performed with theory-agnostic methods such as FFS, and Markov State Models (MSMs). 49,152,169,201The accepted solubility for JC NaCl in SPC/E water at 298.15 K and 1 bar is 3.7 m, which we label b sat . 202As discussed in detail by Zimmermann et al., 116

| Nucleation kinetics and mechanisms at moderate supersaturation with forward flux sampling
Employing Forward flux Sampling (see Section 4.2), Jiang et al. have investigated the nucleation of NaCl from brine at supersaturation ratios (S ¼ b=b sat ) ranging between 2.1 and 4.5. 169FFS allows evaluation of both the nucleation mechanism and nucleation kinetics at these conditions.Concerning the former, the authors observe that nuclei tend to assemble directly into an FCC-like rocksalt structure, independently of their size, and that the level of crystallinity of the nuclei has a strong influence on their lifetime and probability of growing.This suggests that the nucleation process at moderate supersaturations follows a mechanism that can be described by CNT.The nucleation kinetics obtained by FFS tend to underestimate experimental measurements, possibly due to an overestimation of the crystal/solution interfacial tension. 169Nevertheless, this dataset provides an important benchmark for studying nucleation from solution as rates here are computed independently from any theoretical interpretation of the self-assembly process.

| Nucleation kinetics within the remits of CNT
The quality of the dataset provided by Jiang et al. 169 comes with a significant computational cost.Hence, there is a strong incentive to test this result against less computationally intensive methods such as seeding (see Section 4.1.1).Two papers 115,116 have provided independent attempts at comparing nucleation kinetics of seeding with the FFS dataset.Zimmerman et al. 89,116 exploit the fact that the ion attachment frequency f þ is dominated by ion desolvation and adopt the theoretical expression of the nucleation rate prefactor reported in Equation ( 8).Moreover, in contrast with other seeding studies, 118 the authors adopt the Girshick-Chiu correction 203 ð Þ corresponds to the free energy associated to the formation of a monomer in a crystalline configuration) in the prefactor to the rate expression.Adopting the correct value of Δμ ℓ!xtal 116 allowed Zimmerman et al. to estimate nucleation rates in a range of supersaturations overlapping with the Jiang dataset.A similar simulation strategy was recently employed by Lamas et al. 115 who adopted the Auer and Frenkel expression of the attachment frequency f þ (see Equation 9) and did not explicitly introduce the Girshick-Chiu correction in their working expression for the calculation of the nucleation rates, yielding a set of results slightly less reliant on theoretical considerations than those of Zimmermann et al.
As shown in Figure 5, these two approaches yield substantial discrepancies in the estimate of the nucleation rates, which can only partly be attributed to subtle differences in the expressions adopted for the nucleation rate prefactor.The discrepancy appears to originate, instead, from the classification criteria used to estimate the number of crystalline particles in the evolving seeds.The classification problem is discussed at length by both Zimmerman et al. 116 and Lamas et al. 115 In particular, Zimmerman et al. show that nucleation rates are extremely sensitive to the number of crystal-like neighbors necessary to consider an ion part of the crystal nucleus.For example, Figure 5 demonstrates how the predicted rates change by around 10 À 15 orders of magnitude when crystalline ions are identified as those with a coordination number ≥ 4 or ≥ 6. Lamas et al., instead, resort to developing a systematic approach for the classification of the ions as part of a crystalline particle based on the analysis of the overlap of distributions of the local q 4 order parameter for a bulk crystalline phase and a bulk solution.The identification of the optimal threshold yields nucleus size estimates that give rise to nucleation rates in good agreement with FFS results.

| Overcoming solution depletion
Karmakar et al., 104 compute the nucleation barrier at two distinct supersaturation conditions from unseeded simulations (see Section 4.2 and Figure 4a) by combining WTmetaD 155 with CμMD. 102This approach allows a decoupling of the size of the nucleus from the chemical potential of the parent phase by mimicking an open boundary system at constant composition, therefore overcoming the depletion issues that typically affect both the qualitative and quantitative behavior of nucleating systems in small volumes (see Section 3.2). 33,92Depletion artifacts affect the shape of the nucleation free energy profile, the estimate of the critical nucleus size, and in severe cases, can even completely inhibit nucleation.Depletion effects are also present in regular seeding simulations.However, depletion only affects the estimate of nucleation rates when the critical nucleus size n Ã < < V cÀ c sat ð Þ, 89 where c sat is the concentration of a saturated solution.For systems where c sat is large, satisfying this equation may require very large simulation volumes.
Karmakar et al. use the nucleation barriers estimated via WTmetaD and CμMD to compute the exponential term in the nucleation rate expression (Equation 4).The prefactor is then estimated with the approach adopted by Espinosa et al. 118 The values of the nucleation rates obtained by Karmakar et al. following this route are reported in Figure 5.It should be noted that while the forcefield used in this study is the JC/SPC/E model, the results are not directly comparable as calculations were performed at 350 K, rather than at room temperature.

| NaCl nucleation mechanism is supersaturation-dependent
As mentioned in the introductory section, an important feature of nucleation from solutions is the fact that both the rates and the mechanism (or pathway) for nucleation depends on the composition of the mother phase 93 and, in particular, its degree of supersaturation.Studies based on seeding cannot lead to the discovery of pathways departing from CNT-compliant ones by construction.In contrast, unseeded simulations can reveal pathways that depart from CNT predictions (see Section 2).
Using unseeded simulations, Panagiotopoulos and co-workers 152 discovered the mechanism of NaCl nucleation from aqueous solution depends on supersaturation.By employing large-scale MD simulations and free energy calculations, they identified the limit of stability for aqueous NaCl solutions with respect to a liquid/amorphous phase separation (reported in Figure 5 as a vertical dashed line).The dense amorphous salt clusters observed beyond the limit of solution stability act as intermediates in the crystallization of NaCl, where crystalline order emerges within these disordered clusters following a two-step nucleation pathway 152 (see Section 2).Nucleation rates in this region of the phase diagram were obtained by Jiang et al. by estimating Mean First Passage times (MFPTs) from brute force sampling of nucleating trajectories and by performing umbrella sampling simulations to estimate nucleation barriers.Lamas et al. 115 corroborate these values of nucleation rates in the proximity of the limit of solution stability by constructing survival probability distributions from brute force simulations that yield nucleation rates within the same order of magnitude (see Figure 5).An overview of both the MFPT and survival probability methods for the calculation of nucleation rates from brute force simulation is provided by Chkonia et al. 204 The fact that at the limit of solution stability, the nucleation of NaCl follows a nonclassical, multi-step pathway has been further corroborated by the work of Bulutoglu et al., 205 where the nucleation mechanism of NaCl is analyzed by performing free energy calculations and by developing a theoretical approach based on the composite-cluster model 59 able to interpret the atomistic simulation results.Most notably, fitting the composite-cluster model to simulation data revealed that beyond the limit of solution stability, the amorphous salt clusters are thermodynamically favored compared to the aqueous solution, thus further validating the existence of a two-step nucleation pathway for NaCl at high supersaturation.

| Discovering nucleation mechanisms by combining biased and unbiased simulations
Motivated by the observation of dense liquid-like ionic clusters emerging in the double layer in simulations of a solidliquid interface 206 at supersaturation levels significantly lower than the limit of solution stability discovered by Jiang et al., 152 and by recent experimental and simulation observations suggesting significant ion-ion correlations occurring in supersaturated NaCl 121 ; we have recently investigated the emergent nucleation mechanism for NaCl(s) in supersaturated solutions below the limit of solution stability.To this aim, we performed multiple metadynamics simulations where we enhanced fluctuations in the local ion density to explore the configuration space of the nuclei forming in an aqueous solution.By mapping the nuclei configurations as a function of two order parameters indicative of the size of a dense cluster and of the size of a dense cluster with a crystalline structure, we collected configurations able to describe the emergence of crystalline NaCl nuclei without prescribing a specific pathway. 49y initializing hundreds of brute force MD simulations from uniformly distributed configurations in the n, n q 6 ð Þ space (q 6 being a sixth-order Steinhardt bond-order parameter), we then constructed a MSM able to yield model-free estimates of the nucleation rate of the committor surface in order parameter space.The MSM reveals that when S ¼ 3:7, both one and two-step nucleation mechanisms are indeed accessible, with the two-step nucleation pathways being slightly more favorable.Interestingly, the analysis of the committor probability surface in the n, n q 6 ð Þ space suggests that at these conditions, one may need to extend the attribute critical to an ensemble of clusters that, despite displaying a broad range of structures, include sizeable disordered domains and have an equal probability of evolving toward a macroscopic crystal or dissolving.
As well as characterizing the nucleation mechanisms far into the metastable solution zone, we also ruled out the PNC pathway for NaCl.Using umbrella sampling, we computed the equilibrium constant K for ion pair association in the dilute limit.Despite the significant ion association that occurs in solutions across all of the metastable solutions investigated S ¼ 1 À 4, the result that K < 1 means that ion dissociation is thermodynamically favorable in dilute solutions and ion assembly into liquid-like clusters is due to non-idealities in the solution phase.These liquid-like entities can reach significant sizes, containing up to hundreds of ions at the high end of concentration, and evolve their topology rapidly over simulation timescales.

| Uncovering multi-step processes involving crystal polymorphs
Using simulations as computational experiments, involves methods that enable the discovery of nucleation mechanisms as emergent, collective evolution of systems.The nucleation of NaCl from aqueous solution offers an interesting case study in this regard, showcasing the potential of simulation approaches while contextually providing a cautionary tale about the quality of the models used to explore nucleation. 163Performing WTMetaD simulations of NaCl nucleation from an aqueous solution, where the NaCl ions were modeled with the GROMOS forcefield, 207 Giberti et al. discovered that small NaCl clusters might preferentially adopt structures that differ from that of bulk rock salt. 163By employing a CV designed to enhance local density fluctuations without favoring a specific crystal structure, 46,163 the authors discovered that for the model adopted there is a competition between hydrated NaCl, rocksalt nuclei, and nuclei of a new wurtzite-like phase.An analysis of the CNT nucleation free energies of the rocksalt and wurtzite phases revealed that indeed, according to the molecular model, wurtzite-like arrangements are more favorable than rocksalt at small sizes.While this result is not representative of the real NaCl nucleation mechanism, due to the fact that the GROMOS model for NaCl strongly underestimates its aqueous solubility 169 and overestimates the stability of the wurtzite structure at the supersaturation conditions sampled by Giberti, it nevertheless provides a very important observation pertaining to simulation methods.This study, in fact, demonstrates that unexpected pathways can be discovered from a direct sampling obtained with enhanced MD techniques.

| PERSPECTIVES AND CONCLUSIONS
Molecule and particle based simulation methods provide useful tools to gain mechanistic insight into the nucleation of crystals from solutions.With proper sampling, quantitative thermodynamics and kinetics can be obtained to compare with experiments and test theories for crystal nucleation.In this review, we have highlighted state-of-the-art techniques to gain such information.These can be broadly categorized as methods which rely on theory with a well-defined reaction coordinate-mainly CNT-and those which are truly explorative.
Simulating nucleation in multicomponent solutions has been facilitated by the advent of enhanced sampling techniques and novel approaches to simulate nucleation using standard MD, for example, seeding.Exemplary studies of NaCl crystal nucleation, which we have discussed, highlight the range of methods available to overcome time-and length-scale challenges associated with the direct simulation of crystal nucleation in microscopic, closed systems.Alongside the referenced simulations to study a wide range of nucleating systems, these works demonstrate the capability of molecular simulations to predict crystallization outcomes and determine rates.
Figure 5 presents NaCl crystal nucleation rates evaluated from simulations and measured in experiments.Accepting that the y-axis spans 45 orders of magnitude, and given the known intricacies associated with nucleation rate calculations, 208 there appears to be reasonable consensus in the rates from unbiased and biased simulation strategies.System size effects (including the large critical nucleus sizes that are observed at low supersaturation, as well as solute depletion) mean that simulation studies typically adopt high values of S. In experiments, however, NaCl nucleation becomes so fast at the higher end of S that special experimental techniques must be used to determine rates even up to S ≈ 2. In the small range of S where simulations overlap with experiments, a comparison of the rates leaves much to be desired.While it is clear from Zimmerman et al. 116 that the vast difference in the rates from seeding simulations depends on the definition of the order parameter (OP), this issue is addressed by the mislabeling analysis of Lamas et al. 115 Taking the latter seeding results along with the forward flux sampling (FFS) rates from Jiang et al., 169 the force field predicts crystal nucleation rates that are approximately 20 orders of magnitude lower than experiments.This discrepancy may indicate inaccuracies in the molecular model; however, we note that there is even a discrepancy in the experimental data, spanning around five orders of magnitude.This example, simple system typifies the challenge of comparing simulation and experimental studies of crystal nucleation.The moonshot challenge, therefore, is to develop simulation and experimental strategies that consistently reproduce rates that can be confidently compared to one another.Mechanistically, the outlook is less bleak, as both simulations and experiments indicate one-and two-step crystal nucleation according to the solution supersaturation. 49,121,152,205mpressive advances in simulation capabilities were made over the previous decade; however, it is undoubtedly evident to the reader that the studies we highlight where quantitative thermodynamics and kinetics were evaluated apply to very simple systems.Even so, the determination of nucleation rates requires exhaustive computational resources.Sampling crystal nucleation pathways in more complex systems, such as in solutions of molecules with conformational freedom that are so important to the pharmaceutical industry, is still a major challenge that is yet to be realized beyond small molecules.Nevertheless, increasing computational power and the progress made in machine learning techniques may make the routine simulation of crystal nucleation realizable in the near future.
Assuming that computational resources are abundant, we doubt that a one-size-fits-all simulation approach will be achieved any time soon.This is principally due to the many different crystallization pathways that are evident in different systems.Indeed, simulations are utilized to support perspectives regarding the invalidity of established theories to describe nucleation generally.Adopting the most suitable simulation strategy can therefore be of critical importance.When departing from CNT-compliant methods, it is far less obvious what reaction coordinate should be sampled.As we have discussed, even in the CNT framework, how the cluster size is determined can drastically affect the predicted nucleation rates.While methods based on transition path sampling are less susceptible to errors in rates due to inappropriately defined reaction coordinates, a rigorous mechanistic understanding requires appropriate variables to describe crystal nucleation and countless examples are found in the literature.Unfortunately, there is no silver bullet when it comes to quantifying the emergence of order, though perhaps, this is good for explorative purposes.
Simulating crystal nucleation in extremely large atomic systems is unlikely to be achieved in the near future.For instance, some systems, such as proteins, are just too big; others have such low solubility that simulating crystallization directly from monomers in solution at experimental concentrations requires system's sizes that are simply beyond our reach (hence, the necessity to develop increasingly accurate coarse-grained representations of these systems).However, with continued advances in simulation techniques and an increased understanding of the factors that govern nucleation from solution, we believe that the next step will be to systematically extend nucleation studies to systems where the growth units are conformationally flexible organic molecules and/or are polymorphic.In turn, this will enable accounting for crucial out-of-equilibrium effects in computationally-assisted material design.

F
I G U R E 4 MetaD simulations of crystal nucleation from solutions.(a) Combining CμMD 104,110 and WTMetaD Karmakar et al. compute the free energy surface for NaCl nucleation from solution, limiting the finite-size effects due to confinement.Here the CV enhanced the sampling of crystal structures, and the method was focused on the calculation of the free energy barrier to nucleation.Adapted with permission from Ref., 104 copyright 2019 Americal Chemical Society.(b) Nucleation of urea from solution.Different solvents induce different nucleation mechanisms: while MeOH and EtOH promote a single-step classical-like process, water and ACN lead to a two-step process.Reproduced from Ref. 93 with permission from the Royal Society of Chemistry.(c) Metadynamics enables the discovery of a two-step nucleation mechanism for the synthesis of Methylammonium Lead Iodide Perovskites from solution.Reproduced with permission from Ref., 165 copyright 2020 American Chemical Society.
this estimate is based on the values independently estimated by Moucka et al. (3.64 AE 0.2 m), Mester et al. (3.71 AE 0.04 m), Benavides et al. (3.71 AE 0.20 m), Kolafa et al. (3.6 AE 0.4 m) and Espinosa et al. (3.7 AE 0.4 m), and is the product of a process that has seen the refinement of simulation approaches and correction of earlier inaccurate estimates.

F I G U R E 5
Rates for NaCl crystal nucleation from aqueous solution taken from the literature.Here, BF refers to brute force MD simulations in which crystal nuclei emerge spontaneously.The limit of solution stability is indicated by the vertical dashed line at S ¼ 4:05.The rate for Bulutoglu was evaluated using their literature rate value of 4 Â 10 3 s À1 and with an ionic density for NaCl(aq) of 6.5 nm À3 at 15 mol/kg.Crosses indicate experimental literature values taken from reference 115.All simulations are performed at 298 K, except in the case of Karmakar et al.,104 where simulations were performed at 350 K.The data reported in this plot are available at https://github.com/mme-ucl/NaCl_water_Nucleation_Rates.git.