Building Materials Genome from Ground-State Configuration to Engineering Advance

Individual phases are commonly considered the building blocks of materials. However, accurate theoretical prediction of the properties of individual phases remains elusive, and the top-down approach of decoding the genomic building blocks of individual phases from experimental observations is non-unique. Density functional theory (DFT), the state-of-the-art solution of quantum mechanics, prescribes the existence of a ground-state configuration at zero K for a given system. The ground-state configuration alone is insufficient to describe a phase at finite temperatures, because symmetry-breaking non-ground-state configurations are statistically excited at temperatures above zero K. Our multiscale entropy approach (recently termed the zentropy theory) postulates that the entropy of a phase is the sum of the entropy of each configuration weighted by its probability plus the configurational entropy among all configurations. Consequently, the partition function of each configuration in statistical mechanics must be evaluated from its free energy rather than its total energy. Together, the ground- and non-ground-state configurations represent the building blocks of materials: with their free energies predicted from DFT, they can quantitatively predict the free energy of individual phases, plus all properties derived from it.

The author's "Materials Genome"® was originally used to denote individual phases as the building blocks of materials, in accordance with the CALPHAD method. The input data for the CALPHAD method include thermochemical data and phase equilibrium data. In principle, thermodynamic modeling could be performed with thermochemical data alone, since these data are derivatives of the free energy and their integration gives the free energy of each phase. However, most thermochemical data are derived from heat measurements with large uncertainties, so the resulting Gibbs energies of individual phases are not accurate enough to predict transition temperatures between phases or the compositions of phases in equilibrium with each other. Consequently, the Gibbs energy model parameters of all phases must be refined simultaneously using experimentally measured phase transition data, which limits the predictive power of CALPHAD databases for the discovery of new materials.
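As an illustration of the integration mentioned above, the standard constant-pressure relations below (with a reference temperature $T_0$ and heat capacity $C_p$, symbols introduced here for illustration only) show how heat-capacity and enthalpy data yield the Gibbs energy of a phase, and why errors in those data propagate into $G$:

$$H(T)=H(T_0)+\int_{T_0}^{T}C_p(T')\,dT',\qquad S(T)=S(T_0)+\int_{T_0}^{T}\frac{C_p(T')}{T'}\,dT',\qquad G(T)=H(T)-T\,S(T)$$

An error $\delta S(T_0)$ in the reference entropy, for example, shifts $G(T)$ by $-T\,\delta S(T_0)$, an error that grows linearly with temperature and therefore directly displaces predicted transition temperatures.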
The natural next step is to seek approaches that accurately predict the free energies of individual phases, so that phase equilibrium data, which are unavailable for new materials yet to be discovered or designed, are no longer needed. In this brief overview, the efforts of the author's group over the last fifteen years in searching for the building blocks of individual phases are discussed 4 . From the viewpoint of the statistical mechanics developed by Gibbs 39 , an individual phase can be considered a statistical mixture of the various configurations that the phase, as a system, experiences. Starting from the ground-state configuration of a system at zero K in terms of DFT, the system experiences non-ground-state configurations at temperatures above zero K. With the free energies of these configurations predicted from DFT-based calculations, it has been shown that phase transitions and related property anomalies, including the singularity at critical points, can be calculated accurately, in remarkable agreement with experimental observations for magnetic [40][41][42] and ferroelectric materials 43 . Consequently, those configurations can be defined as the building blocks and used to predict unknown properties of individual phases in terms of statistical mechanics. Furthermore, because those configurations are not pure quantum configurations, their free energies should be used in calculating their partition functions. The key challenge in this approach is to sample all configurations that the phase experiences under the experimental conditions.

Effective Hamiltonian approaches in the literature
DFT 44,45 is the state-of-the-art solution of the many-body Schrödinger equation 46,47 , with several approximations 13,18 . It represents the outcome of the many-body interactions involving both the nuclei and the electrons by a set of one-electron Schrödinger equations, one for each valence electron. It articulates that each system has a ground-state configuration at zero K with the lowest energy, defined by a unique electron density. The first set of internal degrees of freedom (DOFs) of the system is the deviation of the electron density from the ground-state electron density, as discussed by Kohn and Sham 45 in calculating the free energy of the system using the Mermin formalism. The second set of internal DOFs is the phonons arising from the displacement of atomic nuclei, i.e., lattice vibrations, which can be calculated by either the supercell method or linear response theory 11,48,49 . Both electronic and phonon DOFs preserve the symmetry of the ground-state configuration.
Building on the local density approximation (LDA), Perdew and co-workers developed the generalized gradient approximation (GGA) [50][51][52] , in which the exchange-correlation energy is treated as a functional of both the local electron density and its gradient, resulting in more accurate predictions of the electronic structure and the energy of the ground-state configuration. Their more recent strongly constrained and appropriately normed (SCAN) meta-GGA 53,54 yields quantitatively correct ground-state results and captures symmetry breaking in some systems regarded as strongly correlated 55,56 .
At finite temperatures, a system experiences both symmetry-preserving and symmetry-breaking configurations, as stipulated by statistical mechanics 39 , while an experimental measurement samples the response of the statistical mixture of all configurations to external stimuli. The common approach in the literature to treating internal symmetry-breaking DOFs is to construct an effective Hamiltonian (eH), evaluate its model parameters by fitting to DFT-based calculations of the ground-state configuration and selected symmetry-breaking configurations, and then perform Monte Carlo (MC) or molecular dynamics (MD) simulations to sample the statistical mixture of configurations and average their properties.
These approaches inevitably introduce errors through the choice of eH formalism and through the truncation and fitting of the eH parameters. In most cases, the microscopic DOF is evaluated for a given configuration in terms of local occupation with the Ising model, magnetic spins with the Heisenberg model, or electric dipoles with the Landau theory. Coupling between different types of DOFs is not included automatically but must be accommodated through additional, specialized terms 42 . Furthermore, each snapshot in MC/MD simulations represents one statistical mixture of ground-state and non-ground-state configurations under given external constraints, and not all statistical mixtures of configurations may be sampled within the limited simulation time scale. These limitations prevent quantitative predictions in comparison with experimental observations. Approaches that directly couple DFT with MD and MC, such as ab initio molecular dynamics (AIMD) [57][58][59] and quantum Monte Carlo (QMC) [60][61][62][63][64] , reduce these errors but are even more limited in simulation time scale.
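A minimal sketch of the eH-plus-MC workflow described above, assuming a nearest-neighbour two-dimensional Ising Hamiltonian $H = -J\sum_{\langle ij\rangle}s_is_j$ with a hypothetical coupling $J$ (in practice fitted to DFT energies of selected configurations) and energies in units where $k_B = 1$. This is a generic textbook Metropolis scheme, not code from the cited works; the small lattice and short run also illustrate the finite sampling noted above.

```python
import math
import random


def ising_energy(spins, J):
    """Total energy of a periodic L x L Ising lattice, H = -J * sum_<ij> s_i s_j."""
    L = len(spins)
    e = 0.0
    for i in range(L):
        for j in range(L):
            s = spins[i][j]
            # Count each bond once: right and down neighbours with periodic wrap.
            e -= J * s * (spins[i][(j + 1) % L] + spins[(i + 1) % L][j])
    return e


def metropolis(L=8, J=1.0, kT=2.0, sweeps=200, seed=0):
    """Single-spin-flip Metropolis MC on the Ising eH; returns (mean |m| per spin, final spins)."""
    rng = random.Random(seed)
    spins = [[1] * L for _ in range(L)]  # start from the ordered (ground-state) configuration
    mags = []
    for sweep in range(sweeps):
        for _ in range(L * L):
            i, j = rng.randrange(L), rng.randrange(L)
            nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                  + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
            dE = 2.0 * J * spins[i][j] * nb  # energy change if spin (i, j) is flipped
            if dE <= 0 or rng.random() < math.exp(-dE / kT):
                spins[i][j] = -spins[i][j]
        if sweep >= sweeps // 2:  # discard the first half as equilibration
            mags.append(abs(sum(map(sum, spins))) / (L * L))
    return sum(mags) / len(mags), spins


if __name__ == "__main__":
    for kT in (1.5, 3.5):
        m, _ = metropolis(kT=kT)
        print(f"kT = {kT}: <|m|> = {m:.3f}")
```

Below the 2D Ising critical temperature ($k_BT_c \approx 2.269J$) the sampled magnetization stays close to 1, while well above it the magnetization fluctuates near zero. Every physical effect outside the chosen Hamiltonian (here, anything beyond nearest-neighbour collinear spins) is absent by construction, which is the limitation the paragraph above describes.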

Zentropy theory: Coarse graining of entropy as a quantitative predictive approach
To improve the agreement between theoretical predictions and experimental observations, the author's team developed a multiscale entropy approach (recently termed the zentropy theory) that considers both ground-state and non-ground-state configurations of a system [65][66][67] .
Similar to the discrete pure quantum configurations in quantum statistical mechanics 68 , the zentropy theory considers a phase to be statistically composed of discrete ground-state and non-ground-state configurations, which, however, are not pure quantum configurations. The zentropy theory has two key features. The first is that the free energies of all configurations are predicted from DFT-based first-principles calculations. This is necessary to include the quantum contributions to entropy, which represent the intrinsic properties of each configuration. It also allows DOFs related to thermal electronic distribution, phonon vibration, local occupation, magnetic spin, and electric dipole to be included coherently as internal DOFs of individual configurations. The main challenge here is the limited supercell size in DFT-based calculations imposed by current computing power; a minor challenge is the ergodicity of configurations. Both need to be tested systematically so that the predicted free energy of the phase is converged.
The second feature of the zentropy theory is the use of the free energy of each configuration in calculating its partition function, instead of its total energy as in the statistical mechanics derived by Gibbs 39 and commonly used in the literature. This enables a complete accounting of the total entropy of the phase from the quantum scale to the experimental scale and retains the quantum contributions at all scales. It originates from the fact that the ground-state and non-ground-state configurations used in DFT-based calculations are not pure quantum configurations; their entropies are not zero at finite temperatures and must be included to evaluate the total entropy of the phase accurately. As shown below, this affects not only the total entropy of the phase but also the probability of each configuration.
The zentropy theory has been successfully applied to predict phase transitions in magnetic materials over the last decade 40,41 and, more recently, in ferroelectrics 43 . In magnetic materials, the symmetry-breaking non-ground-state configurations can be constructed through spin flipping.
Considering collinear magnetic configurations, the total number of ergodic configurations is $m = 2^N$ in a supercell with $N$ magnetic atoms. The zentropy theory stipulates that the entropy of the system is the probability-weighted sum of the entropies of the individual configurations plus the statistical entropy among configurations 18,41,42,[65][66][67]69,70 :

$$S=\sum_{k=1}^{m}p^{k}S^{k}-k_{B}\sum_{k=1}^{m}p^{k}\ln p^{k}\qquad(1)$$

where $p^k$ and $S^k$ are the probability and entropy of configuration $k$, and $k_B$ is the Boltzmann constant. Eq. 1 integrates the bottom-up approach from individual configurations (the first summation) with the top-down approach among individual configurations (the second summation), as shown schematically in Figure 1.
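A minimal sketch of Eq. 1 for collinear spins, working in units of $k_B$. The probabilities $p^k$ and configuration entropies $S^k$ passed in are hypothetical inputs supplied by the user (in the zentropy theory they would come from DFT-based free energies); only the $2^N$ configuration count is taken directly from the text.

```python
import itertools
import math


def collinear_configurations(n_magnetic):
    """Enumerate all collinear spin configurations (+1 = up, -1 = down); there are 2**N of them."""
    return list(itertools.product((1, -1), repeat=n_magnetic))


def zentropy_entropy(p, s_conf):
    """Eq. 1 in units of k_B: S = sum_k p_k S_k - sum_k p_k ln p_k."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    intra = sum(pk * sk for pk, sk in zip(p, s_conf))       # bottom-up: within configurations
    inter = -sum(pk * math.log(pk) for pk in p if pk > 0.0)  # top-down: among configurations
    return intra + inter


configs = collinear_configurations(3)
print(len(configs))  # 8 = 2**3

# Hypothetical example: two equally likely configurations, each with S_k = 1.0 k_B.
print(zentropy_entropy([0.5, 0.5], [1.0, 1.0]))  # 1.0 + ln 2, about 1.693
```

Setting every $S^k$ to zero reduces the function to the second summation alone, i.e., the Gibbs form that treats configurations as pure quantum states.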

[Figure 1 residue: schematic labels "System", "Entropy: S", "configuration 1", and "Quantum mechanics in terms of density functional theory".]
Note that the statistical mechanics derived by Gibbs 39 contains only the second summation in Eq. 1, and thus only part of the total entropy of the system unless $S^k = 0$, as for pure quantum configurations. Furthermore, it is important to point out that Gibbs considered "a great number of independent systems (states) of the same nature (of a system), but differing in the configurations and velocities which they have at a given instant, and differing not merely infinitesimally, but it may be so as to embrace every conceivable combination of configuration and velocities" 39 . He thus broadened early statistical mechanics from the consideration of the particles of a system to independent systems (configurations of a system); i.e., each configuration of the system must be under the same external conditions as the system. In a canonical ensemble, each configuration and the system thus have the same number of particles (N), volume (V), and temperature (T), i.e., the same NVT.
Based on Eq. 1, the general formulas of statistical mechanics under constant NVT can be written as

$$Z=\sum_{k=1}^{m}Z^{k}=\sum_{k=1}^{m}\exp\left(-\frac{F^{k}}{k_{B}T}\right)\qquad(3)$$

$$p^{k}=\frac{Z^{k}}{Z}=\exp\left(-\frac{F^{k}-F}{k_{B}T}\right)\qquad(4)$$

where $F$ and $Z$ are the Helmholtz energy and partition function of the system, and $E^k$, $F^k$, and $Z^k$ are the total energy, Helmholtz energy, and partition function of configuration $k$, respectively. The key difference from the Gibbs statistical mechanics 39 is the use of $F^k$ in Eq. 3 and Eq. 4 instead of $E^k$; using $E^k$ implies $S^k = 0$, i.e., pure quantum states, as mentioned above. Because the statistical combination of the configurations forms the system, the properties of all configurations must be evaluated at the same NVT as the system. A similar formula has been termed "coarse graining of the partition function" [71][72][73][74] , although those authors did not report actual calculations using it. Similar approaches were also used by Wentzcovitch's group 75,76 , as reviewed by the present author 18 .
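The sketch below applies Eq. 3 and Eq. 4, together with $F = -k_BT\ln Z$, to two hypothetical configurations, each characterized by an assumed internal energy $U_k$ and entropy $S_k$ at the given NVT so that $F_k = U_k - TS_k$ (units with $k_B = 1$; all numbers are illustrative, not DFT results). It contrasts zentropy weights built from $F_k$ with Gibbs-style weights built from energies alone, and checks that $F$ equals $U - TS$ with $S$ from Eq. 1.

```python
import math


def boltzmann_stats(levels, kT):
    """Z = sum_k exp(-x_k/kT), p_k = Z_k/Z, and -kT ln Z for a list of levels x_k (k_B = 1)."""
    x_min = min(levels)  # shift by the minimum to avoid overflow in exp
    weights = [math.exp(-(x - x_min) / kT) for x in levels]
    z_shifted = sum(weights)
    probs = [w / z_shifted for w in weights]
    F = x_min - kT * math.log(z_shifted)  # equals -kT ln Z of the unshifted levels
    return probs, F


kT = 1.0
U = [0.0, 0.5]  # hypothetical internal energies of two configurations
S = [0.2, 1.5]  # hypothetical configuration entropies (units of k_B)
F_conf = [u - kT * s for u, s in zip(U, S)]  # F_k = U_k - T S_k

p_zen, F_zen = boltzmann_stats(F_conf, kT)  # zentropy: weights from F_k (Eq. 3 and Eq. 4)
p_gibbs, _ = boltzmann_stats(U, kT)         # Gibbs-style: weights from energies only

# Consistency check: F = -kT ln Z equals U - T S, with S evaluated from Eq. 1.
U_sys = sum(p * u for p, u in zip(p_zen, U))
S_sys = sum(p * s for p, s in zip(p_zen, S)) - sum(p * math.log(p) for p in p_zen)
print(p_zen, p_gibbs)
print(F_zen, U_sys - kT * S_sys)
```

With these illustrative values the ordering of the probabilities is reversed: energy-only weights favour the low-energy configuration, whereas $F_k$-based weights favour the higher-entropy one. This is the sense in which including $S^k$ changes the probability of each configuration, and not only the total entropy.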
Remarkable agreement between zentropy-predicted and experimentally observed transition temperatures has been found for a number of magnetic materials, as reviewed previously 40,41 , plus the more recent study of YNiO3 with strongly correlated physics 18,42 . One unique outcome of the zentropy theory is the prediction of the free energy in unstable states of a system, which lie between its stable and metastable states, including the critical points predicted in Ce and Fe3Pt, where the system changes from stable to unstable and bifurcates into two inhomogeneous subsystems 40 . This represents the extreme of anharmonicity in a system, which is usually characterized by the deviation of entropy or heat capacity from quasiharmonic behavior 77 . Eq. 1 shows that the first summation is a linear combination of the entropies of individual configurations, so emergent behaviors, i.e., behaviors that none of the individual configurations possess, originate from the second summation.
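To illustrate how an unstable region can emerge from the second summation alone, consider a hypothetical model of two configurations with identical harmonic free energies $F^k(V) = \tfrac{1}{2}B(V - V_k)^2$ centred at different volumes; $B$, $V_1$, $V_2$, and the temperatures are illustrative assumptions, not parameters for Ce or Fe3Pt. Differentiating $F = -k_BT\ln\sum_k e^{-F^k/k_BT}$ twice gives $\partial^2F/\partial V^2 = \sum_k p^k\,\partial^2F^k/\partial V^2 - \mathrm{Var}_p(\partial F^k/\partial V)/k_BT$. At the midpoint this equals $B - B^2\Delta V^2/(4k_BT)$, which becomes negative below $k_BT_c = B\Delta V^2/4$, even though every configuration has positive curvature $B$.

```python
import math


def mixture_free_energy(V, kT, B=1.0, V1=-1.0, V2=1.0):
    """F(V) = -kT ln sum_k exp(-F_k(V)/kT) for two harmonic configurations F_k = B/2 (V - V_k)^2."""
    f = [0.5 * B * (V - Vk) ** 2 for Vk in (V1, V2)]
    f_min = min(f)  # shift to keep the exponentials well scaled
    return f_min - kT * math.log(sum(math.exp(-(fk - f_min) / kT) for fk in f))


def curvature(V, kT, h=1e-3, **params):
    """Central finite-difference estimate of d2F/dV2 of the mixture."""
    return (mixture_free_energy(V + h, kT, **params)
            - 2.0 * mixture_free_energy(V, kT, **params)
            + mixture_free_energy(V - h, kT, **params)) / h ** 2


B, V1, V2 = 1.0, -1.0, 1.0
kT_c = B * (V2 - V1) ** 2 / 4.0  # analytic threshold for this symmetric toy model
for kT in (0.5 * kT_c, 2.0 * kT_c):
    c_mid = curvature(0.0, kT, B=B, V1=V1, V2=V2)
    c_exact = B - B ** 2 * (V2 - V1) ** 2 / (4.0 * kT)
    print(f"kT/kT_c = {kT / kT_c:.1f}: d2F/dV2 at midpoint = {c_mid:.4f} (analytic {c_exact:.4f})")
```

Below $T_c$ the mixture free energy is concave near the midpoint, so the homogeneous state is unstable. Above $T_c$ it is convex everywhere. The negative curvature comes entirely from the variance term, i.e., from the statistical competition among configurations.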
For ferroelectric (FE) materials with spontaneous electric polarization, defining the configurations was more challenging, because strong dipole-dipole interactions rule out many of the configurations obtained by simply enumerating all polarization directions. We explored the configurations of ferroelectric materials using PbTiO3 by means of ab initio molecular dynamics (AIMD) simulations 58 . Based on X-ray and neutron diffraction data 78 , PbTiO3, one of the most extensively studied ferroelectric materials, has a polarized tetragonal ground-state configuration and a macroscopically non-polarized cubic paraelectric (PE) phase above the transition temperature. On the other hand, both experiments 79,80 and AIMD simulations 58 demonstrated that individual Ti-caged unit cells exhibit a polarized tetragonal configuration both below and above the FE-PE transition temperature. By following the trajectories of individual Ti atoms (see Fig. 17 in Ref. 40 ) and their motions (see the video in the Supplementary Materials of Ref. 43 ) in AIMD simulations using the experimentally determined macroscopic lattice parameters, it was observed that the polarized tetragonal Ti-caged unit cells switch their polarization directions more frequently as temperature increases 58 . This process creates more and more misoriented polarized tetragonal Ti-caged unit cells next to each other, resembling the well-known domain walls (DWs) in ferroelectric materials, but formed through dynamic switching between polarization directions due to thermal fluctuations rather than being statically frozen in.
DWs in ferroelectric materials are discussed extensively in the literature in terms of experimental and computational investigations 81,82 . Based on these results, there are two types of DWs in tetragonal PbTiO3, i.e., 90° and 180° DWs as twins on the (101) and (100) planes, respectively 83 , resulting in three unique configurations, including the one without a domain wall. Using the DW energies at zero K predicted by DFT-based first-principles calculations in the literature 83 , the macroscopic FE-PE transition temperature predicted by the zentropy theory shows remarkable agreement with experimental measurements 43 . The author's group is currently calculating the Helmholtz energy of each configuration, anticipating better agreement between predicted and measured values.

Configuration-based materials genome database
As discussed above, the significance of the statistical mechanics by Gibbs 39 lies in its consideration of independent configurations that the system experiences under the same NVT as the system. This differs substantially from the consideration of individual particles in the system, an approach that is not tractable for real materials systems. It is analogous to the parable of the blind men and the elephant, which illustrates the study of the same complex problem from different perspectives and the importance of integrating their insights 55 , i.e., seeing both the forest (top-down) and the trees in the forest (bottom-up). The key capability needed is thus to see the trees in the context of the forest rather than individual trees only, i.e., the possible configurations of a system and their properties, exactly as Gibbs envisioned when he created statistical mechanics 39 .
Historically, knowledge has been accumulated primarily through observation and experimentation, followed by mechanistic understanding and the development of fundamental laws. For complex phenomena, phenomenological and mechanistic mathematical models were then established, with model parameters fitted to experimental observations, and have been used to predict the macroscopic properties of systems, including materials. Because any model is intrinsically incomplete and generally cannot fully represent the complexity underlying observations, many different models inevitably coexist and are continuously improved as observations become more in-depth. Quantum mechanics, developed in the 1920s 46,47 , fundamentally changed our understanding of how nature works, and DFT, developed in the 1960s, enabled the digitization of quantum mechanics 44,45 . DFT starts from the opposite end of the temperature spectrum, i.e., zero K and the unique ground-state configuration of a given system at zero K. Current mathematical and computational approaches enable accurate prediction of the ground-state configuration of a system, its electronic structure, and associated properties, and more recently its free energy as a function of volume, temperature, and other internal variables.
The missing piece between observations, which involve multiple configurations, and DFT, which considers the ground-state configuration only, is thus the non-ground-state configurations that are present in observations but not considered in typical DFT calculations. Since those configurations are in principle observable, they are metastable rather than unstable, and their properties can thus be predicted by DFT in the same way as those of the ground-state configuration. The ground-state configuration alone is therefore not sufficient to reproduce observations that also depend on non-ground-state configurations, as stipulated by statistical mechanics.
One beauty of statistical mechanics is that the partition function of the system is a simple summation of the partition functions of independent configurations, as shown by Eq. 3. Consequently, the free energy of the system is obtained directly as

$$F=-k_{B}T\ln Z=\sum_{k=1}^{m}p^{k}F^{k}+k_{B}T\sum_{k=1}^{m}p^{k}\ln p^{k}\qquad(5)$$

and the probability of each configuration can be calculated arithmetically from Eq. 4.
Since the calculation of $F$ does not involve any minimization, the Helmholtz energy thus obtained represents the Helmholtz energy landscape of the system as a function of its internal and external variables, including apexes and valleys. Under given external constraints, minimizing the Helmholtz energy with respect to internal variables determines whether the system is in a single-phase state or a multiple-phase state in which the probabilities of configurations differ from phase to phase, commonly referred to as a miscibility gap in the literature 22,23 . The point between the single-phase and multiple-phase states is defined as the critical point, where the macroscopically homogeneous single-phase state loses its stability and becomes unstable as the derivative of a potential with respect to its conjugate molar quantity, i.e., the second derivative of the internal or free energy with respect to the molar quantity, decreases from positive values to zero. The physical properties of the system defined by the derivative of a molar quantity with respect to its conjugate potential, i.e., the inverse of the above stability derivative, which is also the second derivative of the free energy with respect to the potential, diverge to positive infinity. However, the divergence of properties defined between a molar quantity and a non-conjugate potential can be either positive or negative, as the stability criteria do not prescribe their signs. The predicted critical points in Ce and Fe3Pt showed remarkable agreement with experimental observations, including the positive and negative divergences of thermal expansion, i.e., the derivative of volume with respect to temperature, for Ce and Fe3Pt, respectively, i.e., the anti-INVAR and INVAR phenomena in these two materials 70 .
Another significance of the Helmholtz energy landscape is the prediction of the free energy of transition states along the pathways between neighboring states in the system, including the inflection points and the free energy of the unstable states between them. In particular, the free energy at the apex represents the free energy barrier of the transition, a critical value for its kinetics. It is important to point out that each configuration itself is stable, i.e., the derivatives between conjugate variables are positive for each configuration; it is the statistical competition among all configurations that makes these derivatives of the system zero at the inflection points and negative in the unstable states 18,23,40,41 . The free energy barrier of a transition between two states is known to be related to the interfacial and strain energies between them 84 . In the zentropy theory, the strain energy is taken into account by requiring that the free energies of all configurations in Eq. 3 be evaluated under the same NVT, while the interfaces are built into individual configurations. Defects such as vacancies, dislocations, grain boundaries, or grains need to be treated as additional internal DOFs, i.e., as independent internal variables of the free energy 18 . Such a free energy landscape can be used to predict transport properties in terms of the theory of cross phenomena newly developed by the present author, as shown in …

Based on the discussions above, the ground-state and non-ground-state configurations of a system are the fundamental building blocks of its individual phases, which in turn form the building blocks of the microstructures of materials. The collection of ground-state and non-ground-state configurations can thus be considered a materials genome database for predicting the properties of individual phases and of materials.

Summary and perspectives
This overview articulates that the ground-state configuration and the non-ground-state configurations derived from the internal DOFs of the ground-state configuration of a system can be considered the building blocks of the individual phases of the system. Determining the non-ground-state configurations in magnetic and ferroelectric phases is relatively simple, as demonstrated in our publications, owing to their straightforward internal DOFs; in particular, the number of DW configurations in ferroelectric materials is relatively small 81 . However, for multiferroic phases, doped phases, or phases containing defects, the number of configurations can be very large, requiring more efficient approaches to predict their free energies. This is where artificial intelligence (AI) can play a very important role 85,86 , enabling the free energies of configurations to be predicted efficiently. Consequently, this configuration-based materials database can be used to predict the properties of materials accurately based on the zentropy theory without experimental inputs, enabling more efficient discovery of new materials and improvement of existing materials for emergent behaviors. New knowledge and data can thus be created and accumulated theoretically and validated by experiments, ultimately enabling the full digitization of both the cyber and physical spaces of any system of interest as the core of the 4th industrial revolution 87 .

Figure 1: Schematic representation of the zentropy theory with the top-down statistical …

Table 1: Cross phenomenon coefficients represented by derivatives between potentials, symmetric due to the Maxwell relations, with off-diagonal terms that can be either positive or negative 18,41 .