Using Feature‐Assisted Machine Learning Algorithms to Boost Polarity in Lead‐Free Multicomponent Niobate Alloys for High‐Performance Ferroelectrics

Abstract To expand the unchartered materials space of lead‐free ferroelectrics for smart digital technologies, tuning their compositional complexity via multicomponent alloying allows access to enhanced polar properties. The role of isovalent A‐site in binary potassium niobate alloys, (K,A)NbO3 using first‐principles calculations is investigated. Specifically, various alloy compositions of (K,A)NbO3 are considered and their mixing thermodynamics and associated polar properties are examined. To establish structure‐property design rules for high‐performance ferroelectrics, the sure independence screening sparsifying operator (SISSO) method is employed to extract key features to explain the A‐site driven polarization in (K,A)NbO3. Using a new metric of agreement via feature‐assisted regression and classification, the SISSO model is further extended to predict A‐site driven polarization in multicomponent systems as a function of alloy composition, reducing the prediction errors to less than 1%. With the machine learning model outlined in this work, a polarity‐composition map is established to aid the development of new multicomponent lead‐free polar oxides which can offer up to 25% boosting in A‐site driven polarization and achieving more than 150% of the total polarization in pristine KNbO3. This study offers a design‐based rational route to develop lead‐free multicomponent ferroelectric oxides for niche information technologies.


S2. SPECIAL QUASI-RANDOM STRUCTURE (SQS)
In this work, we use the special quasi-random structures (SQSs) [4,5] method to model the solid solution of K 1−x A x NbO 3 (A = Li, Na, Rb, Cs). SQSs method (first developed by Zunger et al. [4]) allows us to mimic random solutions without infinitely expanding the supercell and determine the most disordered structure among all inequivalent configurations for a given composition.
Due to our primary interest in A cation substitution, the structure was treated as a pseudo-binary system. Based on the parent structure of KNbO 3 tetrahedral, and orthogonal primitive cells, the integrated cluster expansion toolkit (ICET) code [6] are utilized to search for the optimal supercell that best depicts the random structure. Different super-cell configurations are considered and tested from 2×2×2 to 4×4×4. From these diverse configurations, the representative structures are chosen regarding two factors: the total number of atoms (for computational limits) and the Atomic Correlation Function.
Atomic pair correlation functions (APCF) are used to characterize the statistical arrangement of atoms. The most disordered structure by using atomic pair correlation functions [4,5] can be found, because APCF allows us to quantify the structural degree of random ordering. From the lattice algebra each site is assigned a pseudo-spin variable of S i = −1/ + 1 depending on which atom (A/B) is occupying the site. In addition, these sites are grouped into figures, f (k, m). Where k = 1, 2, 3, ., is the vertices and m = 1, 2, 3, ., is the spanning maximum distances (first, second, and third-nearest neighbors, etc). The sum of the products of spin variables over all sites related to the figure of the lattice gives the correlation function Π k,m .
Basically, APCF of fully random binary alloys considering 2-atom interactions is approximated to 2,m = (2x − 1) 2 = ideal where m indicates the mth shell of nearest neighbor atoms and x is the substituent composition. APCF are calculated based on Equation S1 shown below, where N is the number of cation lattice sites, D m is the number of pairs among atoms i and j within the mth shell, r i,j is the distance between atoms i and j and R m is the radius of the mth shell of sphere. The pair parameter S i takes the value +1 when i is an atom of type A and −1 for an atom of type B.  neighbor. Though this result may be considered short ranged, it is well known from the work of Zunger et al. [4] that interatomic electronic interactions only take place in short range.

S2
The calculated properties such as enthalpy of mixing, charge transfer, etc. depend mostly on local atomic arrangements. Thus, even with the correlation function of the generated structure matching to the second neighbor, we can assume that the structure represents a highly randomized solid solution.
Also, we note that for different A cation substitutes, modeled alloy systems were not separately generated but the representative SQSs are used. Since SQSs only consider the correlation of different types of atoms, regardless of chemical characteristics, in theory, creating a A 1−x B x NbO 3 pseudo-binary system structure then replacing the B site with Li, Na, Rb, Cs should all represent the random structure identically.  Generally, the introduction of cations with varying ionic radii into the KNbO 3 results in lattice distortions, which is linked directly to the formability of the structures. Hence to assess the possibility of solid solution perovskites forming with regard to different size S5 cations (radius r Li < r Na < r K < r Rb < r Cs ), we have investigated the Goldschmidt tolerance factor [7] as well as a recently proposed the new tolerance factor [8], employing the Shannon ionic radii [9].

S3. GOLDSCHMIDT AND NEW TOLERANCE FACTOR
Goldschmidt tolerance factor [7], defined as the following: where r A , r B , and r O are the ionic radii of A, B, and O ions, respectively. Empirical results from various data suggest that a perovskite is likely to form in the range of 0.76 < t < 1.11 [10]. Here, we note that coordination number of 12 and 6 are used for r A and r B respectively. Interpolated values of the radii are used for unobtainable radii data corresponding to the correct coordination numbers. Prior papers also follow the same process as in the Reference 11 and are tabulated in the Table S4. We also note that an arithmetic average of the two cations radius is used in case of the A-site radii (r A ) as the following equation: Form this classic descriptor, the new tolerance factor is suggested based on SISSO [8] to predict the stability. For this newly proposed factor, the likelihood of solid solution being a perovskite increases as t decreases. While t = 4.18 being the absolute limit for a perovskite to be formed. It is calculated using the following equation: where r A , r B , and r O , are the ionic radii of A, B, O ions respectively and n A is the oxidation state of A. Figure S2. Helmholtz free energy for mixing for tetragonal and orthorhombic K 1−x A x NbO 3 in the first and second row, respectively. Here, the skyblue colors represent a positive value meaning that the solution is hard to mix, while the red represents a negative value meaning the mixability is high.

S4. SOLID-SOLUTION THERMODYNAMICS
To assess the thermodynamic stability of K 1−x A x NbO 3 cation solid solution (A = Li, Na, Rb, Cs), we calculate the Helmholtz free energy of mixing for each composition according where ∆U and ∆S are the internal energy and entropy of mixing and T is the absolute temperature. The internal energy of mixing is then calculated using where E K 1−x AxNbO 3 , E KNbO 3 , and E ANbO 3 are the total energies of K 1−x A x NbO 3 , KNbO 3 , and ANbO 3 (A = Li, Na, Rb, and Cs), respectively. The configurational entropy is calculated by the approximated entropy of ideal mixing according to where k B is the Boltzmann constant.

S7
To understand thermodynamic stability of constructed solid solution models [12], based on the optimized structures, we calculated the Helmholtz free energy of mixing for 0 K, 150 K, 300 K, 450 K, and 600 K following equation S5. The calculated Helmholtz free energy is plotted as a function of temperature in Figure S2.  [14,15]. This is further corroborated in our spontaneous polarization analysis which will be described and discussed in a later section.

S5. STRUCTURAL ANALYSIS
Based on these SQSs, we performed constrained optimization calculations and represent the lattice constants and volume changes in Figure S3 and Figure 1c in the main text, respectively. Additionally, the vectorial distortion of Nb in NbO 6 octahedra are investigated with Bond-elongation following Reference16 and represented in Figure   x. From this figure we note that our SQS generated structures follow the Vegard's law adequately.
Show the generated structures are random alloys.  To quantitatively determine the extent of intra-octahedral distortion (i) within a single polyhedron for each cartesian direction in a given crystal structure, one can define a bondlength-based index (i.e., the bond elongation index, λ i ) as a generic measurement index for polyhedral distortions. Mathematically, the bond elongation index, λ i can be expressed as where ∆x i , ∆y i , and ∆z i are vector components of ⃗ l i , which is a vector from i atom at polyhedron vertex to the center of the polyhedron, in the formal x, y, and z Cartesian axes. S11  Absolute values of PCCs are reported.

S7. GENERATION OF VALIDATION SETS FOR THE BINARY K 1−x A x NBO 3 AL-LOYS
Data augmentation is a technique used to increase the numbers of data via modifying existing data or newly creating synthetic data from existing data. An important deviation from the actual data augmentation method is that while most data augmentation is actually carried out for increasing the training size in hopes of preventing overfitting, we have utilized the interpolated data as a test set to assess the accuracy of the SISSO generated descriptors for the binary K 1−x A x NbO 3 alloys only.
Hence, to evaluate the performance of the identified SISSO descriptor based on the 42 DFT data points, an additional 2,004 interpolated data points per phase (by assuming an almost linear behavior -Vegard-like (VL) relation -between the DFT data points in Figure 2 in the main text) have been generated using data augmentation concept.  Figure S7.