Quantifying Chemical Structure and Machine‐Learned Atomic Energies in Amorphous and Liquid Silicon

Abstract Amorphous materials are being described by increasingly powerful computer simulations, but new approaches are still needed to fully understand their intricate atomic structures. Here, we show how machine‐learning‐based techniques can give new, quantitative chemical insight into the atomic‐scale structure of amorphous silicon (a‐Si). We combine a quantitative description of the nearest‐ and next‐nearest‐neighbor structure with a quantitative description of local stability. The analysis is applied to an ensemble of a‐Si networks in which we tailor the degree of ordering by varying the quench rates down to 1010 K s−1. Our approach associates coordination defects in a‐Si with distinct stability regions and it has also been applied to liquid Si, where it traces a clear‐cut transition in local energies during vitrification. The method is straightforward and inexpensive to apply, and therefore expected to have more general significance for developing a quantitative understanding of liquid and amorphous states of matter.

The structure of amorphous silicon (a-Si)i sw idely approximated as ac ontinuous random network with tetrahedral coordination, [1] but its details are much more intricate: defective environments,s uch as threefold-bonded "dangling bonds", as well as the degree of medium-range order,h ave been discussed. [2] Together with experimental probes, [3] atomistic computer simulations have been giving useful insight into a-Si for decades, [4] and large-scale simulation models now contain up to hundreds of thousands of atoms. [5] With the recent emergence of linear-scaling machine-learning(ML)based interatomic potentials reaching accuracy levels close to quantum mechanics, [6] materials modeling is promising to become even more realistic-especially in describing amorphous solids, [7] as recently shown for a-Si. [8] Still, there remains the more fundamental challenge of not only to describe amorphous structures but to truly understand them. Simple criteria are widely used, including atomic coordination numbers (here denoted as N)a nd bond angles, which both give information about short-range order (SRO), [9] or ring statistics as ar epresentative for mediumrange order (MRO). [10] However,w ed on ot know of ap revious simple and general numerical approach that can quantify SRO and MRO at the same time.A nd even more critically,t hese purely structurally-based indicators cannot give information about the energetic stability of individual environments.
Here,w ed escribe ag eneral, ML-based approach that quantifies local structures and local energies of all individual atoms in models of a-Si. We first introduce as tructural coordinate that unifies the description of SROa nd MRO environments and then combine this structural information with asecond, stability coordinate in atwo-dimensional plot. Both analyses rely on the "learning" of local structure, manifested in am athematically well-defined framework without parametric terms.T he ability to "machine-learn" local chemical knowledge is an emerging research theme throughout the discipline:M L-predicted atomic energies have been used to understand the stability and chemical nature of molecules [11] and crystal structures, [12] and to accelerate structural optimization. [13] Here,w et ransfer such analyses to the amorphous and liquid states,where there is an even more dire need for information about atomically resolved stabilities and properties.
Our object of study is an ensemble of a-Si networks that we created in parallel ML-driven molecular-dynamics (MD) simulations:5 12-atom models of liquid Si were cooled to solidify into a-Si (Figure 1a). [8] Slower cooling yields more ordered networks; [8] hence,c hanging the cooling rate allows us to tailor the degree of order in the structures and to probe its influence on the properties.Remarkably,the most ordered structures we obtained (for quench rates of 10 11 and 10 10 Ks À1 ), albeit still containing % 1% defects,a re energetically more favorable by 0.02 eV/at. (at the DFT-PBE level) than af ully tetrahedral-like relaxed Wooten-Winer-Weaire (WWW) model, [1] which is currently considered ag oldstandard model for a-Si (see Supporting Information).
We start by illustrating the current state of the art. An established indicator for SRO in a-Si measures how similar the atomic environments are to ideal tetrahedra, as probed via the bond angles. [9] This order parameter increases as expected with slower quenching (increasing ordering) and then converges in aw ay that the median results for our 10 11 and 10 10 Ks À1 structures are very similar (Figure 1b). We also look at MROv ia the count of six-membered rings (Figure 1c)-which is 100 %i nd iamond-type c-Si, where all atoms are in cyclohexane-like rings,but smaller in a-Si due to the presence of disorder.W es tress again that these are two disjoint measures:bond angles cannot quantify MROand the ring count does not give information about SRO.
To progress further, we now turn to the smooth overlap of atomic positions (SOAP) kernel, [14] amathematical approach that has been used with success to fit ML potentials [15] and to analyze structures. [16] In the SOAP formalism, the neighborhood density of the i-th atom, smoothed by Gaussian functions with width s at and truncated by ac utoff function f c ,isexpanded into an atom-centered basis of radial parts R n and spherical harmonics Y lm , [14] similar to how electronic wave functions are expanded in quantum chemistry.B ased on the resulting combination coefficients,asimilarity function or kernel can then be constructed, which provides aquantitative similarity measure on as cale from 0t o1.H owever,the absolute value depends on the chosen cutoff radius and on the Gaussians that are placed on the atomic neighbors.T oa nalyze both nearestneighbor (NN) and next-nearest neighbor (NNN) environments,w eh ere propose to combine two different SOAP kernels:t hat for the NN shell, making as harp distinction between environments,a nd that for the NNN shell, being more tolerant to small structural changes.T herefore,w e calibrate the fuzziness (via s at )u sing c-Si at T > 0K as ar eference by requiring that the NN and NNN range of the SOAP values are similar in an ordered network with only thermal fluctuations (shaded area in Figure 1d). We then apply the same pair of kernels to a-Si:t here,t he increasing order with decreasing quench rate,t he convergence of SRO, and the further increase of MROa re all correctly described within the same conceptual framework. Thesimplicity and power of this NN-NNN kernel pair can also be shown by applying it to crystalline allotropes of Si (Table 1). In the lonsdaleite(hexagonald iamond)-type form, the NN environments are as in the cubic diamond type (namely,i deal tetrahedra, with as imilarity of 1.000), but the NNN environments differ,d ue to the rings being in boat rather than chair conformations,and hence the similarity to c-Si drops to 0.974. We next look at an open-framework Si allotrope, oS24, which was synthesized from Na 4 Si 24 by sodium de-intercalation. [18] In oS24, the atoms are tetrahedrally coordinated, too,but with strong local distortions,and so the resulting NN values are comparable to those in a-Si; the NNN values drop further, because the open framework structure is remarkably different from c-Si. Finally,for b-tintype Si with 4+ +2-type coordination, the NN environments are Figure 1. Progressively ordered a-Si networks from melt-quench simulations with an ML-based interatomic potentialo fquantum-mechanical quality.a)Scale of cooling rates and associated required simulation times (1 ps requires 1000 MD time steps). Each tick corresponds to one independentM Dsimulation.Between 10 14 and 10 11 Ks À1 ,we cooled at the respective constant rate;f or the much more demanding 10 10 Ks À1 simulation,wevaried the rate during the run (see Supporting Information). Twosimulation cells are shown as examples and coordination defects are highlighted by coloring (green:over-coordinated "floating-bond" environments;b lue:u nder-coordinated "dangling-bond" environments). b) Increasing short-range order (SRO) in these systems, quantified using an established order parameter that returns unity for ideal tetrahedral environments. [9] c) Increasing medium-range order (MRO), assessed by counting 6-membered rings. [10] d) Unified description of both length scales using SOAP analysis. We first calibrated the SOAP kernel parameters (Table 1) for NNs (red) and NNNs (blue) using samples of thermalized c-Si and then applied the method to our a-Si networks. Median values over all atoms in the cells are given for each system. Error bars are shown for the SOAP values at 10 11 Ks À1 to estimate the scatteringoft he results; they indicate the threefold standard deviationf or five additional, independentruns (see Supporting Information).
clearly dissimilar to those in c-Si, and the NNNs even more so (Table 1).
We now consider the energies of the individual atoms, ac rucial piece of information that cannot easily be obtained from DFT computations,which yield the total energy for the entire cell. In contrast, atomic energies are directly included in many ML-based interatomic potentials by construction. [6,18] In the Gaussian approximation potential (GAP) framework we use here,t he total energy is as um of machine-learned atomic energies that generally read [18] where the sum runs over as et of reference environments in the training database (index j)a nd two environments are compared using the similarity (kernel) function K ij ,for which we use the SOAP formalism here.H ence,o ur structural analysis,G AP-MD,a nd local energies all build on the same mathematical framework. Figure 2a shows that, indeed, machine-learned atomic energies reveal the stability trends intuitively expected for a-Si, the interpretation being qualitative for now.I nt he structural fragment shown in Figure 2a,t he dangling-bond defect (red)h as ah igh local energy,w hereas the two tetrahedral-like central atoms (white,b lue) are more energetically favorable,d epending on how strongly they are distorted. Histograms of these data, collected for adisordered, rapidly quenched structure (Figure 2b)a nd am ore ordered, slowly-quenched structure (Figure 2c)r eveal that the energetic center of gravity for the more ordered a-Si network does coincide with the experimentally determined stability. [19] We , and amore favorable tetrahedral environment with only low distortion (blue). Atoms are color-coded according to their GAP atomic energy, e i ,g iven relative to that in diamond-type c-Si. b) Histogram of atomic energies in the structure quenched at 10 14 Ks À1 .T he experimental enthalpy of relaxed a-Si (from Ref. [19],a lso relative to c-Si)i sindicated by yellow shading. The Gaussian-process(GP)-predicted error is indicated in the inset, showing akernel-density estimate for all atoms in the structure (see SupportingInformation for details). c) Same analysis as in (b) for the more ordered 10 11 Ks À1 structure. d) 2D plot revealing the connection between structural order (NN similarity to diamond-like c-Si;horizontala xis) and GAP local energy for the individuala toms (vertical axis). Results are collected for all 14 systems, that is, for all quench rates from 10 14 to 10 10 Ks À1 (see Figure 1). Kernel-density estimates (smoothed histograms) are given for projections on both axes. e), f) Local electronic DOS for astructure quenched at 10 11 Ks À1 from Ref. [8], illustratingthe very different electronic fingerprints of three-and fivefold-bonded coordination defects. DOS plots are normalized per atom;for comparison,the average local DOS for all fourfold bonded atoms in the same structure is given by dashed lines. The red arrow in (e) highlights the mid-gap states associated with dangling-bond defects. Table 1: SOAP parameters [14] for the pair of kernels defined in this work and results for atomic sites in crystalline Si allotropes as obtained from both kernels.

NN kernel NNN kernel
Basis set size (n max , l max ) (  note that such analyses are,i np rinciple,p ossible with empirical interatomic potentials, [20] but can be limited by the parametric shape of the potential. In contrast, our approach depends only on the input data, is readily generalized, and combines accurate DFT input data with ahigh-fidelity ML fit whose uncertainty can be quantified [15d] to be in the meV range (insets of Figure 2b,c). A2 Dp lot combining both quantities,t he SOAP-based diamond-similarity and the local energy,i sp erhaps the most revealing form (Figure 2d). Distinct energy regions at around + 0.4 and + 0.6 eV above that of ideal c-Si are found for floating-bond (N = 5) and dangling-bond (N = 3) environments,respectively.The floating bonds show awide structural variation within the NN shell, indicated by alarge spread over the SOAP (x-axis) coordinate,whereas dangling bonds clearly peak around as imilarity value of 0.92. In this respect, the dangling-bond atoms are clearly structurally closer to c-Si than the 4+ +2-type-coordinated atoms in the b-tin allotrope ( Table 1), but they are considerably less diamond-like than the median result for any of our a-Si structures (Figure 1d), which are dominated by fourfold-bonded atoms.L ooking at Figure 2d again, it appears that the data points for dangling bonds (N = 3) lie in the tail of ac ontinuation of the plot for tetrahedral-like environments (N = 4);this is not the case for floating bonds (N = 5). Finally,w en ote that the energies for N = 4a toms reach up to high values:t heir median is + 0.14 eV,b ut the 98th percentile is at + 0.42 eV and thus the remaining 2% fourfold-bonded atoms have energies that are higher than the median result for N = 5d efects (+ 0.42 eV). This explains how our 10 10 Ks À1 structure, albeit having defects,c an be lower in energy than the defect-free WWW model. [1] Thehigher GAP atomic energy (that is,larger instability) of dangling bonds compared to floating bonds is not only in line with previous theories, [2a,20] but it can also be confirmed by the electronic densities of states (DOS). Forthe energetically unfavorable dangling bonds (N = 3), alarge peak at the Fermi level, within the band gap,i ss een from an atomresolved projection of the DOS (Figure 2e). In contrast, the energetically more favorable floating bonds (N = 5) make no pronounced mid-gap contributions to the DOS (Figure 2f).
This approach is expected to be general and to have wider significance.W es how,a sa ne xample,a ne xtension from the amorphous to the even more disordered liquid phase of silicon ( Figure 3) which has been widely studied by simulations [22] and experiments. [23] Remarkably,the liquid appears to consist of atoms with aw ell-defined normal distribution of GAP local energies,a round + 0.6 eV/at. above that of c-Si. This distribution stays almost unchanged all the way until an arrow temperature window of approximately 1175 to 1195 K (Figure 3). There is adistinct transition with abimodal distribution of atoms (center right panel in Figure 3) that our analysis suggests to be clearly either liquid-like or amorphous-like.Established empirical potentials [21] fail to capture the nature of this transition, presumably because they have not been fitted to include the diverse structures in the liquid state.Further work will deal with detailed mechanistic studies of liquid Si using our new approach and with the extension to other systems:t he ideas described here could be transferred to other tetrahedral networks such as the homologous material germanium or the highly complex amorphous forms of carbon [7b] (on which work is ongoing), or even to crystalline,amorphous,and liquid states of water. [9b, 24]