Machine‐learning‐based interatomic potentials for advanced manufacturing

This paper summarizes the progress of machine-learning-based interatomic potentials and their applications in advanced manufacturing. The interatomic potential is essential for classical molecular dynamics. Advances in machine learning (ML) have enabled the development of fast interatomic potentials with ab initio accuracy. The resulting acceleration of atomic simulation can greatly transform the design principles of manufacturing technology. The most widely used supervised and unsupervised ML methods are summarized and compared. Then, the emerging ML-based interatomic potential models are discussed: Gaussian approximation potential, spectral neighbor analysis potential, deep potential molecular dynamics, SchNet, hierarchically interacting particle neural network, and fast learning of atomistic rare events.


| INTRODUCTION
Advanced manufacturing using new materials and emerging technologies encompasses all aspects of industrial production. In many fields, such as microelectronics and nanoelectronics, the cost of developing and producing the tools for manufacture has become a major limitation for the whole system. Therefore, it has become increasingly important to include simulation early in the design process. Atomic simulation techniques such as molecular dynamics (MD) and density functional theory (DFT) can provide such atomic-level insight, and machine learning (ML) is rapidly extending their reach. Machine-learned spectral neighbor analysis potential (SNAP) models can rival the high-performing embedded atom method (EAM) and modified EAM potentials for face-centered cubic (fcc) Cu and Ni. 1 ReaxFF parameterization based on artificial neural networks (ANNs) is an efficient way to simulate the reactive dynamics of a zinc-oxide model. 2 An Intelligent ReaxFF (I-ReaxFF) model, formulated in terms of matrix (or tensor) operations on the TensorFlow platform, can optimize ReaxFF parameters automatically. 3 More importantly, ML can establish a direct relationship between the atomic structure and the system energy: it learns from electronic structure data and imposes no physical approximation on the functional form.
The machine-learning-based interatomic potential has already been developed and is widely used in various fields of materials research. This review is divided into the following sections. The most widely used ML methods are summarized in Section 2. A few emerging methods to generate interatomic potentials are introduced in Section 3. A few applications in machine-learning-based interatomic potentials for advanced manufacturing are compared in Section 4.

| ML METHODS
Recently, artificial intelligence (AI) has been attracting worldwide attention, benefiting from the development of computer science and data processing technology. ML, the most significant branch of AI, has developed rapidly over the last 40 years. 4,5 ML shows outstanding performance in many fields such as pattern recognition, 6 image segmentation, 7 medical diagnosis, 8 catalytic reactions, and autonomous driving 9 by reorganizing existing knowledge structures and mining implicit relationships. For example, synthetic routes for chemicals proposed by deep neural networks have a feasibility similar to those formulated by human experts. 10 ML can even extract valuable data from failed experiments. 11 A typical ML workflow begins with an unsolved problem, followed by acquisition of sufficient data samples, extraction of appropriate features, and finally establishment of the ML model. 12 The ML model can then be used for prediction and even for mechanism analysis. Several ML algorithms have been proposed and developed; considering the research target and the sample size, some algorithms are more suitable than others for a given research topic.
Several popular ML algorithms are briefly introduced herein, and their suitability is discussed in the following paragraphs.
Linear regression is a simple yet powerful algorithm commonly used to illustrate relations among variables and to predict target values. 13 Linear regression algorithms include least absolute shrinkage and selection operator (LASSO) regression, 14 elastic net regression (ENR), 15 and so forth. The basis of the linear regression algorithm is the ordinary least squares (OLS) method, as shown in Figure 1A. The aim of the OLS method is to minimize the sum of squared differences $L$ between the ML-predicted and experimental values by adjusting the coefficient vector $\boldsymbol{\beta}$:

$$L(\boldsymbol{\beta}) = \sum_{i} \left( y_i - \mathbf{x}_i \boldsymbol{\beta} \right)^2. \qquad (1)$$

To avoid the overfitting to which OLS is prone when features are multicollinear or only weakly related, regularized linear algorithms have been developed. LASSO and ridge regression are the most widespread regularized linear models, based on the L1 and L2 regularization methods, respectively. The ENR algorithm combines these two regularization schemes and inherits the advantages of both. A linear model is often the first model established because it yields useful insight into the feature relations. In general, linear models can reveal linear feature interactions but may perform poorly when the features are multicollinear. 13

Kernel regression algorithms, as shown in Figure 1B, can capture nonlinear correlations in a data set by combining the regression scheme with a nonlinear kernel. 17 Support vector regression (SVR), 18 Gaussian process regression (GPR), 19 and kernel ridge regression (KRR) 20 are the most widespread kernel algorithms. Kernel regression first maps the data points into a higher-dimensional feature space by evaluating the kernel function $k(\mathbf{x}_i, \mathbf{x}_j)$. 21 The resulting mapping then takes the place of the linear term $\mathbf{x}_i \boldsymbol{\beta}$ in Equation (1).
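To make the regularization point concrete, here is a minimal NumPy sketch comparing a plain OLS solution with a ridge (L2) solution on two nearly collinear features. The data, variable names, and numbers are illustrative only, not from the review:

```python
import numpy as np

# Toy data: two nearly collinear features (x2 is almost a copy of x1),
# which makes the plain OLS problem ill-conditioned.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 1e-6 * rng.normal(size=50)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=50)

def ols(X, y):
    """Ordinary least squares: minimize L = ||y - X beta||^2."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(X, y, lam):
    """L2-regularized (ridge) solution: beta = (X^T X + lam I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

beta_ols = ols(X, y)
beta_ridge = ridge(X, y, lam=1.0)
```

With two almost identical columns, the ridge solution stays numerically stable and splits the weight roughly evenly between the collinear features, while their sum still recovers the true slope.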
KRR is a typical ridge-regularized linear algorithm combined with the kernel method. SVR is similar to KRR, but its loss function is completely different. Kernel regression models handle multicollinear features well, but choosing a suitable kernel is difficult, and no single kernel is universally applicable. 22 Decision tree regression is another commonly used supervised algorithm in the ML field. 23 Decision tree methods include k-neighbor regression (KNR), 24 random forest regression (RFR), 25 extra trees regression (ETR), 26 and so forth. As shown in Figure 1C, a decision tree constructs a continuous decision boundary by answering numerous "if-else" questions at the branches of the tree until a terminal leaf node is reached.
These decisions divide the entire feature space into regions, with the decision thresholds chosen by minimizing a loss function. 27,28 In addition, the ensemble method, which combines multiple basic ML models into a single optimal model, is popular in the engineering field. Gradient boosting regression (GBR) is the most well-known ensemble method. 29,30 Decision tree algorithms such as random forest and extra trees can also improve the ML model by averaging the predictions of many individual learners.
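The boosting idea can be sketched in a few lines: each weak learner (here a depth-1 "stump") is fitted to the residuals of the current ensemble and added with a shrinkage factor. This is a toy illustration on synthetic data, not any particular GBR library:

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a depth-1 regression tree (stump): one threshold, two leaf means."""
    best = None
    for t in np.unique(x)[:-1]:            # skip max value (empty right side)
        left, right = residual[x <= t], residual[x > t]
        pred = np.where(x <= t, left.mean(), right.mean())
        err = float(((residual - pred) ** 2).sum())
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    return best[1:]

def gradient_boost(x, y, n_rounds=100, lr=0.3):
    """For squared loss, each new stump is fitted to the current residuals
    and added to the ensemble with shrinkage lr."""
    pred = np.full_like(y, y.mean())
    for _ in range(n_rounds):
        t, lv, rv = fit_stump(x, y - pred)
        pred = pred + lr * np.where(x <= t, lv, rv)
    return pred

x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x)                  # toy target
pred = gradient_boost(x, y)
```

After a hundred rounds the piecewise-constant ensemble fits the smooth target closely, illustrating how many weak learners combine into one strong model.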
The neural network (NN) algorithm is a rising star in the field of ML; its power stems from an architecture inspired by the behavior of the brain. 31 The feedforward neural network (FNN) is one of the strongest and most commonly used NN models. 32 As Figure 1D shows, the training data are passed to nodes (perceptrons), which are linked to each other by edges with defined weight parameters and thresholds.
Each individual node acts as its own linear regression model, with input data, weights, a threshold, and an output. Once the input layer is determined, weights are assigned; the weights express the significance of each input to the output relative to the other inputs. The nodes are grouped into separate hidden layers, in which the weighted features are summed and the sum is passed through a nonlinear activation function (such as ReLU or sigmoid). 33 If the output of an individual node exceeds the threshold value, the node is activated and its data are sent to the next layer; otherwise, no data are passed along. The last layer, called the output layer, outputs the target values for the practical problem at hand. NN models are usually trained through a cyclic process called back-propagation, which iteratively optimizes the weights of the hidden layers. NN algorithms offer high applicability and excellent performance, but the weight parameters are difficult to interpret, and problems with few training samples are not well suited to neural network models. 31 Table 1 briefly presents the categories of the aforementioned algorithms, along with the appropriate data set size for each model. For few-shot learning (10-200 samples), linear and kernel regression models are the most suitable, while for medium-sized data sets (e.g., 100-500), decision tree models are the best. Because the performance of NN models improves with increasing data set size, NN models are especially appropriate for large data sets (>500).
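The forward pass and back-propagation described above can be written out directly in NumPy. This is a deliberately minimal one-hidden-layer sketch, with illustrative sizes, learning rate, and toy target, not a production training loop:

```python
import numpy as np

rng = np.random.default_rng(1)

# One hidden layer with ReLU activation, squared loss, trained by plain
# gradient descent (back-propagation).  All sizes here are illustrative.
W1 = rng.normal(scale=0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)

X = np.linspace(-1, 1, 64).reshape(-1, 1)
y = X ** 2                                    # toy target function
lr = 0.1

for _ in range(5000):
    h = np.maximum(X @ W1 + b1, 0.0)          # hidden layer (ReLU)
    out = h @ W2 + b2                         # output layer
    grad_out = 2 * (out - y) / len(X)         # d(mean squared loss)/d(out)
    gW2 = h.T @ grad_out; gb2 = grad_out.sum(0)
    grad_h = (grad_out @ W2.T) * (h > 0)      # back-propagate through ReLU
    gW1 = X.T @ grad_h; gb1 = grad_h.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

h = np.maximum(X @ W1 + b1, 0.0)
mse = float(np.mean((h @ W2 + b2 - y) ** 2))
```

Each update step is exactly the cycle described in the text: forward pass through the layers, loss evaluation at the output layer, and gradients propagated backward to adjust the hidden-layer weights.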

| EMERGING METHODS TO GENERATE INTERATOMIC POTENTIAL
ML interatomic potentials have shown accuracy comparable to ab initio MD methods, while the accessible length and time scales of MD simulations are improved by orders of magnitude. 34 In this section, we briefly introduce some commonly used methods to generate ML interatomic potentials, including NN- and kernel-based regression models. 35

FIGURE 1 Sketch map of (A) linear regression algorithms, (B) kernel regression algorithms, (C) decision tree algorithms, and (D) neural network algorithms in the machine learning framework. Reproduced under terms of the CC-BY license. 16 Copyright 2021, The Authors, published by AIP

| Gaussian approximation potential (GAP)
GAP 36 is a kernel-based method that generates interatomic potentials automatically from atomic configurations and the corresponding energies and forces calculated with DFT-based methods. 37 In this method, the atomic neighborhood is represented in a bispectrum space, obtained by expanding the neighbor density on the 4D sphere. The bispectrum is given by 36

$$B_{j_1 j_2 j} = \sum_{m_1, m'_1} \sum_{m_2, m'_2} \sum_{m, m'} \left( u^{j}_{m m'} \right)^{*} C^{j m}_{j_1 m_1 j_2 m_2}\, C^{j m'}_{j_1 m'_1 j_2 m'_2}\, u^{j_1}_{m_1 m'_1}\, u^{j_2}_{m_2 m'_2},$$

where the $u^{j}_{m m'}$ are the expansion coefficients of the neighbor density and $C$ denotes the ordinary Clebsch-Gordan coefficients. 38 The local atomic environment is described by the truncated bispectrum with $j_1, j_2, j \le J_{\max}$. Then, nonparametric Gaussian process (GP) regression 39,40 is used to predict the atomic energy by 36

$$\varepsilon(\mathbf{b}) = \sum_{n} \alpha_n \exp\!\left( -\frac{1}{2} \sum_{l} \frac{(b_l - b_{n,l})^2}{\theta_l^2} \right),$$

where $n$ runs over the reference configurations, $l$ over the bispectrum components, and the $\theta_l$ are hyperparameters. The local atomic energy function is thus generated by a kernel function that measures the similarity between local environments and directly determines the success of the resulting GAP. The configurations of two adjacent atoms, or of one atom at two sequential MD steps, are very similar, leading to strong correlation within the data set. A sparsification procedure addresses this problem by randomly sampling configurations from the data set.
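At inference time, the GP prediction above reduces to a weighted sum of kernel evaluations against the reference configurations. Here is a self-contained sketch in which made-up 3-component descriptors stand in for bispectrum components; the data, kernel widths, and target function are all illustrative:

```python
import numpy as np

def kernel(B1, B2, theta):
    """Squared-exponential kernel over descriptor vectors, with one
    length-scale theta_l per component (as in the GAP energy kernel)."""
    d = (B1[:, None, :] - B2[None, :, :]) / theta
    return np.exp(-0.5 * (d ** 2).sum(axis=-1))

rng = np.random.default_rng(2)
# Hypothetical descriptors standing in for bispectrum components, with a
# smooth toy "energy" instead of DFT reference data.
B_train = rng.uniform(-1.0, 1.0, size=(30, 3))
E_train = np.sin(B_train.sum(axis=1))

theta = np.ones(3)            # hyperparameters theta_l
sigma2 = 1e-6                 # jitter / noise regularization
K = kernel(B_train, B_train, theta)
alpha = np.linalg.solve(K + sigma2 * np.eye(len(K)), E_train)

def predict_energy(B_new):
    """GP mean prediction: eps(b) = sum_n alpha_n k(b, b_n)."""
    return kernel(B_new, B_train, theta) @ alpha

E_pred = predict_energy(B_train)
```

The coefficients alpha_n are computed once by solving a regularized linear system; each prediction then costs one kernel evaluation per reference configuration, which is exactly the cost that sparsification keeps small.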
A widely used kernel function to train the GAP is the smooth overlap of atomic positions (SOAP) kernel, 41 which is computationally efficient. The computation cost can be greatly reduced by about one order of magnitude, with the prediction accuracy unchanged. The SOAP-GAP 42 model can also be applied to predict the molecular properties.

| Spectral neighbor analysis potential
As in GAP, the local environments in SNAP are represented by bispectrum components. 43 However, unlike GAP, the potential energy is assumed to depend linearly on the bispectrum components in SNAP. This assumption makes it possible to describe the total energy, atomic forces, and stress tensor as linear functions of the SNAP coefficients. The total energy can be written as 43

$$E_{\mathrm{SNAP}} = \sum_{i=1}^{N} \left( \beta_0 + \boldsymbol{\beta} \cdot \mathbf{B}_i \right),$$

where $\beta_0$ is a constant giving the element-specific contribution of each atom $i$, and $\boldsymbol{\beta}$ and $\mathbf{B}_i$ are the $k$-dimensional vectors of SNAP coefficients and bispectrum components, respectively.
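Because the SNAP energy is linear in the coefficients, fitting reduces to ordinary least squares over per-configuration features. The sketch below recovers known coefficients from synthetic bispectrum components; all data here are random stand-ins, not real LAMMPS/DFT output:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical setup: n_conf configurations, each with n_atom atoms and
# k bispectrum components per atom.
n_conf, n_atom, k = 40, 8, 5
B = rng.normal(size=(n_conf, n_atom, k))        # bispectrum components B_i

beta_true = np.array([0.2, -1.0, 0.5, 0.3, 0.8, -0.4])   # [beta0, beta]

def snap_energy(B_conf, coeffs):
    """E = sum_i (beta0 + beta . B_i) over the atoms of one configuration."""
    beta0, beta = coeffs[0], coeffs[1:]
    return float((beta0 + B_conf @ beta).sum())

E = np.array([snap_energy(b, beta_true) for b in B])

# Fitting is plain linear least squares: the per-configuration features
# are the atom count and the summed bispectrum components.
A = np.column_stack([np.full(n_conf, float(n_atom)), B.sum(axis=1)])
coeffs, *_ = np.linalg.lstsq(A, E, rcond=None)
```

With noiseless synthetic energies the least-squares fit recovers the coefficients exactly; in practice, the same linear system is simply augmented with force and stress rows.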
The specific values of the hyperparameters chosen in SNAP directly determine the performance of the SNAP potential. As shown in Figure 2, the errors between the energies predicted by SNAP and those calculated by LAMMPS are aggregated into a loss function, and the hyperparameters of SNAP are optimized with DAKOTA.

| Deep potential molecular dynamics (DPMD)
DPMD 45 is an NN-based interatomic potential method that models the relationship between the total energy, written as a sum of atomic energies $E = \sum_i E_i$, and the environments of all atoms using deep neural networks (DNNs). 46 The energy $E_i$ of atom $i$ is determined by the local environment within a cutoff radius $R_c$ around the atom. The construction preserves translation, rotation, and permutation invariance, as shown in Figure 3.
The coordinates of the atoms $j$ inside $R_c$ are used to generate descriptors $D_{ij}$ of the local environment of atom $i$, which form the input of a feed-forward DNN model. A $1/R_{ij}$ factor is introduced into $D_{ij}$ so that the weight of a neighbor decreases with its distance. $D_{ij}$ enters the DNN at the input layer and flows through multiple hidden layers, each applying a linear transformation followed by a nonlinear activation function. 47 The network is trained by minimizing the loss

$$L = p_\varepsilon\, \Delta E^2 + \frac{p_f}{3N} \sum_{i} \left| \Delta \mathbf{F}_i \right|^2 + \frac{p_\xi}{9N^2} \left\| \Delta \xi \right\|^2,$$

where $p_\varepsilon$, $p_f$, and $p_\xi$ are adjustable coefficients, $\Delta$ denotes the error between the DPMD prediction and the training data, $N$ is the number of atoms, $\mathbf{F}_i$ is the force on atom $i$, and $\xi$ is the virial tensor.
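A weighted loss over energy, force, and virial errors of this kind is easy to express as a small helper. Note that the exact prefactor conventions vary between implementations, so this is a sketch of the structure rather than the canonical DPMD formula:

```python
import numpy as np

def dpmd_loss(dE, dF, dXi, N, p_e=1.0, p_f=1.0, p_xi=0.1):
    """Weighted sum of energy, force, and virial errors, in the spirit of
    the DPMD training objective (prefactor conventions vary between
    implementations; treat this as a structural sketch)."""
    return (p_e * dE ** 2
            + p_f / (3 * N) * float(np.sum(dF ** 2))
            + p_xi / (9 * N ** 2) * float(np.sum(dXi ** 2)))

# Toy errors for a 4-atom system (illustrative numbers only)
N = 4
dE = 0.01                          # energy error
dF = np.full((N, 3), 0.02)         # per-atom force errors
dXi = np.zeros((3, 3))             # virial error
loss = dpmd_loss(dE, dF, dXi, N)   # 0.01**2 + (1/12) * 12 * 0.02**2 = 0.0005
```

The per-atom normalization of the force term and the per-atom-squared normalization of the virial term keep the three contributions comparable as the system size changes.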
The data generated by first-principles calculations are fed into this training process.

| SchNet
The deep learning architecture SchNet can model complex atomic interactions to predict the potential energies of atomistic systems. 49 SchNet is designed to learn representations of the local atomic environment that are invariant under all physical symmetries.
As shown in Figure 4, the network structure of SchNet is similar to that of deep tensor neural networks (DTNNs). 50 The atom-wise layers are fully connected layers applied to each atom's feature vector. In the interaction layers, SchNet applies continuous-filter convolutions with filter-generating networks to model the interaction term as 49

$$x_i^{l+1} = \sum_{j} x_j^{l} \circ W^{l}(\mathbf{r}_j - \mathbf{r}_i),$$

where "∘" denotes element-wise multiplication. 49 This operation reduces the dimension of the feature set to improve the computational efficiency. 51 The Adam optimizer is applied to train SchNet on the target energy E and forces F by minimizing a combined loss function 49

$$\ell = \rho\, \| E - \hat{E} \|^2 + \frac{1}{N} \sum_i \left\| \mathbf{F}_i + \frac{\partial \hat{E}}{\partial \mathbf{R}_i} \right\|^2,$$

where ρ is a tunable coefficient that trades off energy against force accuracy. The weights and biases of the SchNet model are trained using the mini-batch stochastic gradient descent method.
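A continuous-filter convolution differs from an ordinary convolution in that the filter is a continuous function of the interatomic displacement, so atoms need not sit on a grid. The sketch below uses a hypothetical radial filter in place of a learned filter-generating network; because it depends only on the distance, the output is invariant under rigid rotations and translations:

```python
import numpy as np

def cfconv(x, r, filter_net):
    """Continuous-filter convolution: update each atom's features by
    element-wise multiplying neighbor features with a filter evaluated at
    the displacement: out_i = sum_j x_j o W(r_j - r_i)."""
    out = np.zeros_like(x)
    n = len(x)
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i] += x[j] * filter_net(r[j] - r[i])
    return out

def filter_net(dr):
    """Hypothetical filter-generating network: a smooth radial function
    shared across the 4 feature channels."""
    d = np.linalg.norm(dr)
    return np.exp(-d ** 2) * np.ones(4)

rng = np.random.default_rng(4)
x = rng.normal(size=(3, 4))     # 3 atoms, 4 features per atom
r = rng.normal(size=(3, 3))     # atomic positions

y1 = cfconv(x, r, filter_net)

# Because the filter depends only on |dr|, the output is unchanged under
# any rigid rotation Q and translation of the positions:
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
y2 = cfconv(x, r @ Q.T + 1.0, filter_net)
```

In SchNet itself the filter is produced by a trainable network acting on expanded distances, but the symmetry argument sketched here is the same.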

| Hierarchically interacting particle neural network (HIP-NN)
HIP-NN 52 is an NN-based method that models the potential energy from ab initio quantum mechanical calculations. The total energy E is a sum of local contributions from each atom, $E = \sum_i E_i$. Moreover, each atomic energy is further decomposed as a sum over hierarchical terms, each generated by a corresponding NN layer, 52

$$E_i = \sum_{n} E_i^{(n)},$$

where the layer outputs depend on the learned parameters $W_{ab}^{l}$, $M_{ab}^{l}$, and $B_{a}^{l}$.
In Figure 5, the green boxes denote interaction layers, in which spatial sensitivity functions collect information from the atoms within the cutoff radius; 52 in the blue boxes, linear (atom-wise) layers transform the atomic features while preserving the atomic information.

| Fast learning of atomistic rare events (FLARE)

FLARE builds a Gaussian process force field on the fly during the simulation. As shown in Figure 6A, the local energy $E_i$ considers only the contributions of atoms within a cutoff distance $R_{\mathrm{cut}}^{(n)}$ from atom $i$. Figure 6B shows that the number of DFT calls decreases as this on-the-fly training proceeds.
After training on a sufficient amount of data, the DFT calculation is no longer needed. The trained GP model will be mapped to highly efficient tabulated force fields.
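Mapping a trained model to a tabulated force field amounts to evaluating it once on a fine grid and answering all later queries by cheap interpolation. A toy sketch, where an inexpensive function stands in for the trained GP:

```python
import numpy as np

def model_energy(r):
    """Stand-in for an expensive trained model (e.g., a GP regressor)."""
    return np.exp(-r) * np.cos(3 * r)

# Evaluate the model once on a fine grid ...
grid = np.linspace(0.5, 5.0, 2000)
table = model_energy(grid)

def tabulated_energy(r):
    """... then answer all later queries by interpolation: this lookup
    plays the role of the 'mapped' tabulated force field."""
    return np.interp(r, grid, table)

r_test = np.linspace(0.6, 4.9, 7)
err = float(np.max(np.abs(tabulated_energy(r_test) - model_energy(r_test))))
```

The lookup cost is independent of the training set size, which is why the mapped force field can run at near-classical-potential speed; a finer grid (or spline interpolation) trades memory for accuracy.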
Most machine-learning-based interatomic potential approaches consist of three main ingredients. The first is the training data, which provide the structural information (atomic coordinates, atom types, number of atoms, etc.) and the reference potential energies. The reference energies are generated by first-principles calculations using quantum electronic structure software packages (VASP, Quantum ESPRESSO, CP2K, etc.). The second is a representation of the atomic structure that maintains translation, rotation, and permutation invariance. The third is a regression algorithm, which determines the relationship between the representation and the potential energy.
Once the input structures from the training data are fed in, the regression model outputs the potential energy.
The loss function is designed based on the difference between the output potential energy and the training data. The interatomic potential is then obtained by optimizing the parameters of the regression model with optimizers such as gradient descent (GD), Nesterov accelerated gradient (NAG), AdaGrad, Adam, and so forth.
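The three ingredients can be wired together in a toy end-to-end fit: an invariant descriptor (sorted pair distances), synthetic reference energies standing in for DFT data, and a linear model trained by gradient descent. Everything here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Ingredient 1: training data -- structures and reference energies.
# Both are synthetic stand-ins for first-principles data here.
structures = rng.uniform(0.0, 1.0, size=(60, 6, 3))   # 60 structures, 6 atoms

def descriptor(coords):
    """Ingredient 2: an invariant representation.  Sorted pair distances
    are unchanged by translation, rotation, and atom permutation."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    iu = np.triu_indices(len(coords), k=1)
    return np.sort(d[iu])

X = np.array([descriptor(s) for s in structures])
w_true = rng.normal(size=X.shape[1])
E_ref = X @ w_true                 # synthetic "reference" energies

# Ingredient 3: a regression model, trained by gradient descent on the
# squared error between predicted and reference energies.
w = np.zeros(X.shape[1])
lr = 0.01
for _ in range(5000):
    grad = 2.0 * X.T @ (X @ w - E_ref) / len(X)
    w -= lr * grad

mse = float(np.mean((X @ w - E_ref) ** 2))
```

Real ML potentials replace each piece with something far richer (DFT data, bispectrum/SOAP/NN descriptors, kernel or deep-network regressors), but the data-representation-regression structure is the same.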

| APPLICATIONS
The applications of ML-generated interatomic potentials have been reported for selected problems in materials science: alloys, phase change materials (PCMs), and thermal performance calculations. An active-learning workflow for training ML interatomic potentials on-the-fly is illustrated in Figure 7. Interatomic potentials for the bulk and multilayer structures of hexagonal boron nitride have been generated with the GAP ML algorithm. 64 An ML data-set construction method in which the model is regularly retrained has been applied to elemental aluminum (ANI-Al). 65 The physically informed neural network (PINN) method has been used to obtain a potential function for Al that accurately predicts several physical properties. 66 The grain boundary energies of fcc elemental metals (e.g., Al, Cu, Ag, Au, Pd, and Pt) have been predicted by ML potentials. 67 A semi-empirical force field potential for titanium has been trained with an ANN. 68 A set of ML interatomic potentials for the bcc transition metals Mo, Nb, Ta, V, and W, built within the GAP framework, is used to study melting and liquid properties. 69 Several ML methods suitable for constructing the potential energy surfaces (PESs) of high-dimensional systems have also been summarized. 70

| Alloy
Alloying is one of the most commonly used methods to tune the properties of materials, but the complex local environment of an alloy hinders the traditional fitting of interatomic potentials. For example, a high-entropy alloy, usually containing at least 5 different metals, requires a far more complicated interatomic potential: 5 elements alone require at least 10 distinct pair potentials, which is far beyond the capacity of traditional fitting.
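The pair count is simple combinatorics: with n species there are C(n, 2) unordered pairs of distinct elements, and C(n+1, 2) pairs if same-element interactions are counted as well:

```python
from math import comb

n = 5                          # number of chemical species
cross_pairs = comb(n, 2)       # pairs of distinct elements -> 10
all_pairs = comb(n + 1, 2)     # including same-element pairs -> 15
```

The counts grow quadratically with the number of species, which is why multi-element systems quickly overwhelm hand-fitted pairwise forms.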
Therefore, the application of ML for alloys was one of the first in the field. Several groups have proposed different methods for various high-entropy alloy systems.
It has been reported that linearized pairwise and angular-dependent MLIPs are robust for 31 elemental metals, and that angular-dependent descriptors are important for transition metals. 71 An ML framework has been applied to predict the segregation energy of more than 250 metal-based binary alloys. 72 Deep potential generators (DP-GENs) can produce potential energy surface (PES) models for Mg, Al, and Mg-Al alloys. 73 The potential generated by an NN-ML approach for the Pd-Si system can describe both liquid and crystal structures. 74 The Gaussian approximation potential framework has been used to train a potential for W_x Mo_{1-x} random alloys. 75 A multi-fidelity (MF) ML framework leveraging GP regression has been used to calculate the bulk modulus across the aluminum-niobium-titanium (Al-Nb-Ti) ternary composition space. 76 Moment tensor potentials (MTPs) are used to predict the high-temperature elastic properties of GUM Ti-based alloys, 77 and bulk, interface, and defect characteristics of Ti-based alloys have been calculated from low temperatures up to near the melting point. 78 The ML framework is also applicable to high-entropy alloy systems. The short-range ordering of an equiatomic Co-Cr-Fe-Ni high-entropy alloy has been calculated, 79 as shown in Figure 8. For multi-principal element alloys (MPEAs) and high-entropy alloys (HEAs), calculated local lattice distortions affect the elastic properties. 82 The phase stability of bcc Al-Nb-Ti-V refractory high-entropy alloys has also been studied. 83

| Phase change materials
PCMs are extensively used in thermal management and in microelectronic storage devices such as optical discs and non-volatile memory cells. Although some transition-metal oxides such as VO2 are widely used for thermal/optical applications, most PCMs are chalcogenides with three to four elements and amorphous structures, which lie far beyond the calculation capacity of DFT.
Unlike alloys, where the metallic bonding can be treated within a free-electron approximation and the crystal structure is simple and clear, the covalent bonding in chalcogenides is orientation dependent and can form a complex network. Therefore, a more precise interatomic potential is necessary.
The applications and potential of ML technology for determining the functional characteristics of PCMs are summarized in this study. 84 The phase transition between the incommensurate host-guest structure (K-III) and the close-packed fcc phase of alkali metal potassium is reported to occur in a complex and diffusive manner. 85 The electronic structures of the mid-gap states of the prototypical phase-change alloy Ge2Sb2Te5 were identified and characterized with an ML potential, 86 as shown in Figure 9. The phase transition and plastic properties of iron were investigated with a classical potential trained within the GAP framework. 87

FIGURE 7 Scheme of ML interatomic potentials actively learning on-the-fly. An active learning approach determines the degree of extrapolation for each sampled configuration. If the extrapolation grade is sufficiently high, the configuration is learned; otherwise, the energy, forces, and stresses are calculated using the moment tensor potential (MTP) model. Reproduced with permission. 63 Copyright 2019, American Physical Society
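The decision rule in the Figure 7 caption can be sketched as a loop: score each sampled configuration with a novelty ("extrapolation grade") measure, call the expensive reference method only when the score is high, and trust the current ML potential otherwise. The grade below is a toy nearest-neighbor distance, not the MaxVol-based grade used by MTPs:

```python
import numpy as np

def extrapolation_grade(x, train_X):
    """Toy novelty score: distance to the nearest training configuration.
    (MTP active learning uses a MaxVol-based grade instead.)"""
    if len(train_X) == 0:
        return np.inf
    return float(np.min(np.linalg.norm(train_X - x, axis=1)))

def reference_energy(x):
    """Stand-in for an expensive DFT call."""
    return float(np.sin(x).sum())

rng = np.random.default_rng(6)
threshold = 0.5
train_X, train_E = [], []
dft_calls = 0

for _ in range(200):
    x = rng.uniform(0, 2 * np.pi, size=2)            # sampled configuration
    grade = extrapolation_grade(x, np.array(train_X).reshape(-1, 2))
    if grade > threshold:
        train_X.append(x)                            # learn this configuration
        train_E.append(reference_energy(x))          # expensive reference call
        dft_calls += 1
    # otherwise: the current ML potential is trusted for this configuration
```

As the training set covers the sampled region, new configurations stop triggering reference calls, which is the mechanism behind the decreasing number of DFT calls in on-the-fly schemes.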

| Amorphous material
An ML model constructed using the GAP methodology has been obtained for amorphous carbon. 88 Deformation of a representative volume element of amorphous carbon is simulated by melt quenching using bond-order and ML interatomic potentials, 89 as shown in Figure 10. The local structural order in Al90Tb10 metallic glass is studied using an interatomic potential trained by a deep ANN. 96 The validity of the potential for ZrB2 ultra-high-temperature ceramic (UHTC) materials is extended from room temperature to the ultra-high-temperature region. 97

| Heat transfer
Heat transfer is one of the most important processes in advanced manufacturing, and ML interatomic potentials for predicting thermal transport have been proposed. 106 A potential from the Behler-Parrinello NN framework is used to predict metallurgically critical dislocations and cracks in pure hcp magnesium. 107

| Damage
Damage is the key process governing a material's mechanical properties, such as cracking and creep, and requires large space and time scales for simulation. The ML interatomic potential of tungsten is calculated with the GAP framework, which enables simulation of radiation damage in tungsten, 109,110 as shown in Figure 11. Similarly, the evolution of primary damage and collision cascades in silicon has been analyzed. 113 Neutron bombardment is studied via a primary knock-on atom (PKA) in crystalline molybdenum samples by classical MD using a GAP-based potential. 114 A tungsten-beryllium potential developed with SNAP is used to simulate high-energy Be atom implantation onto the (001) surface of solid tungsten. 115 Extensive atomistic simulations of the deposition and growth of amorphous carbon (a-C) thin films were performed using a GAP model to describe the interatomic interactions. 116 Neural-network potentials (NNPs) are applied to simulate early-stage nucleation and growth in the Al-Cu system. 117 A charge-transfer ionic potential (CTIP) model for the Cu/Hf/O system accurately predicts the surface properties of both oxides and metals. 118 The GAP for silicon can be used to simulate dynamic brittle fracture, 119 as shown in Figure 12. ML potentials are also applied to the formation energy of interface phases. 120

FIGURE 11 An ML interatomic potential for tungsten using the GAP framework is used to calculate the binding energies of a divacancy in bcc W at different nearest-neighbor separations, compared with DFT from Refs., our own DFT, and EAM potentials. Reproduced with permission. 109,110 Copyright 2020, American Physical Society. DFT, density functional theory; EAM, embedded atom method; GAP, Gaussian approximation potential

| CONCLUSIONS
The use of ML in the generation of interatomic potential can considerably increase the efficiency of MD simulation and drastically transform advanced manufacturing by incorporating atomic simulation early into the design process. We have dis-