Going Beyond the d‐Band Center to Describe CO2 Activation on Single‐Atom Alloys

The computational screening of single‐atom alloys (SAAs) is challenging owing to their complicated electronic structure compared with traditional metal or alloy catalysts and the consequent lack of an accurate and low‐cost activity descriptor to replace expensive adsorption energy calculations. Herein, a data‐driven approach involving consecutive classification and regression is explored to identify the descriptor in the form of algebraic operators representing atomic information for predicting the adsorption energies of CO2 molecules on SAAs. The best descriptor significantly outperforms the d‐band center model in terms of accuracy and computational overhead. This study provides a fundamental understanding of the bonding strength on SAA surfaces and also an effective approach for the high‐throughput screening of promising catalysts.


Introduction
Computational screening for novel and improved catalyst materials requires precise, low-cost prediction of key parameters, such as the adsorption energies of reactive species, which are crucial for understanding and simulating surface reactions. [1] However, adsorption energy calculations are both time-consuming and method-dependent, and hence may not meet the requirements of high-throughput material design. This has necessitated the development of approaches that predict adsorption energies with comparable accuracy but from simpler parameters. For traditional transition metals (TMs) or alloy catalysts, the d-band center model, [2] standard thermodynamic scaling relations, [3] and Brønsted–Evans–Polanyi (BEP) relationships [4] have revealed linear correlations of the d-band center, reaction energies, and activation energies with the bonding strength of reaction intermediates; these findings have significantly improved the efficiency of modern catalyst design. [5] These approaches express catalytic performance in terms of the simple electronic structure of the TM surface, allowing the adsorption energy to be predicted rapidly from just one base parameter: the energy level of the d-band. Even though the d-band center of the active site has proved extensively useful in the rational design of catalysts for chemical and electrochemical reactions, [6] it is still not an accurate reactivity descriptor for metal surfaces [7] and other complex material systems that deviate from traditional scaling relations.
Single-atom alloys (SAAs) are a specific class of alloys and also a type of single-site catalyst, in which a few guest atoms are dispersed across the surface of the host matrix such that no bonds form between neighboring guest atoms; this allows one to regulate the selectivity of the guest atom and the activity of the host atom. [6] In principle, the site isolation in SAAs may also break the known limitations of traditional scaling relations, so that the aforementioned d-band center is no longer an exact descriptor. In addition, the unusual electronic structure, with free-atom-like d-states on the guest atom, may give SAAs unprecedented catalytic properties. [8] Toward the rational design of SAAs, Monasterial et al. revealed that the guest atom's p-band center is superior to the d-band center for describing the hydrogen adsorption strength. [9] Fung et al. proposed an analytical expression for the methane adsorption energy in terms of the number of occupied and unoccupied s-, p-, and d-states using a statistical learning algorithm. [10] Nevertheless, these descriptors and features all rely on theoretical simulations, which are time-consuming and method-dependent. Recently, Dasgupta et al. proposed a data-driven screening technique, combined with outlier detection methods, to identify potentially optimal SAAs that deviate from thermodynamic scaling relations. [11] Even though this method may reveal unexpected and promising SAA catalysts, the complexity and limited interpretability of the model may hinder its use in high-throughput screening, particularly by experimentalists. Hence, it is essential to develop a data-driven approach that identifies the best-performing descriptor in the form of algebraic functions of just base parameters. [12]
DOI: 10.1002/aesr.202100152
As a representative example, we focus on the activation of CO2 molecules from a linear free state to a bent chemisorption configuration, the first step of CO2 reduction. CO2 uptake is of critical importance to the subsequent catalytic conversion process. [13] Thus far, no accurate and simple parameter has been proposed to describe CO2 activation on SAAs.
In this work, a descriptor for CO2 activation on SAAs was developed via the sure independence screening and sparsifying operator (SISSO) algorithm, a compressed sensing approach with first-principles-based inputs. [14] To identify a CO2 adsorption energy descriptor that uses only atomic information as primary features, a two-step classification and regression methodology was constructed. In the classification step, a 2D descriptor is adequate to separate the physical and chemical adsorption configurations. Then, on the basis of the chemical adsorption energy database, we propose an explicit, analytical expression that quantitatively describes the correlation of the CO2 adsorption energy with the primary features. The validation root-mean-squared error (RMSE) and maximum absolute error (MaxAE) are 0.12 and 0.30 eV for the 1D descriptor and 0.05 and 0.21 eV for the best descriptor; both are superior to the traditional d-band center model (0.21 and 0.57 eV, respectively) in terms of accuracy and computational overhead. Our study demonstrates the advantage of data analysis and offers an accurate, low-cost approach to describing the binding energies of reactive species, which will aid the computational screening of promising catalysts.
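For reference, the two error metrics quoted above are taken here to follow their standard definitions. The symbols N (number of evaluated data points), E_i^DFT (reference adsorption energy), and E_i^pred (descriptor-predicted adsorption energy) are introduced only for illustration and are not notation from the original text.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(E_i^{\mathrm{pred}} - E_i^{\mathrm{DFT}}\right)^{2}},
\qquad
\mathrm{MaxAE} = \max_{1 \le i \le N}\left|E_i^{\mathrm{pred}} - E_i^{\mathrm{DFT}}\right|
```

Because MaxAE tracks the single worst prediction, it is never smaller than the RMSE evaluated on the same data.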

Results and Discussion
The adsorption energies of the CO2 molecule (E_ads) on Co-, Ni-, and Cu-based SAAs are computed within the framework of density functional theory (DFT). These TMs commonly serve as active catalyst materials in CO2 hydrogenation reactions. [15] The corresponding close-packed surfaces, Co (001), Ni (111), and Cu (111), are selected as the host matrixes of the SAAs. Twenty-eight TM elements from the periodic table, excluding Hg (which is toxic), serve as the guest atoms. Considering all nonequivalent high-symmetry sites close to the guest atom, namely the top, bridge, face-centered cubic (fcc) hollow, and hexagonal-close-packed (hcp) hollow sites, the C═O bond of the CO2 molecule is placed at each of these locations (Figure S1, Supporting Information). Examination of all possible adsorption configurations shows that six types of most stable adsorption model occur among the 84 data points, as marked in Figure 1; details are shown in Figures S2-S4, Supporting Information. The five bent CO2 models (②-⑥) are denoted as chemisorption models, that is, "chem," and the remaining quasilinear CO2 model (①) is denoted as the physisorption model, that is, "phys." Figure 1b shows the most favorable adsorption configuration for each SAA, indicating that the adsorption model changes from the "chem" form to the free-state-like "phys" form with increasing φ (the occupancy number of d-orbitals) of the guest atom from left to right. This trend can be qualitatively understood from classical chemical bond theory: the variation in bonding strength arises mainly from the antibonding state, and a lower d-band energy level (i.e., more occupied d-orbital electrons) of the active site lowers the energy of the antibonding state and increases its occupancy. This weakens the adsorption, increasing the likelihood of a "phys" form, and vice versa.
In addition, the dissimilar "chem" configurations of CO2 (O^{δ−}–C^{2δ+}–O^{δ−}) can be partially attributed to the difference in electronegativity between the guest ("g") and host ("h") atoms. If the electronegativity of "g" is smaller than that of "h," obvious charge transfer from "g" to "h" may occur, resulting in a positively charged guest atom and a negatively charged host atom. Therefore, an O atom of CO2 may bond to the "g" atom and the C atom to an "h" atom, as in models ③, ④, ⑤, and ⑥ of Figure 1b. Interestingly, in model ⑥ the guest atom deviates noticeably from the basal surface after capturing the CO2 molecule, as in the Sc@Co, Y@Co, Zr@Co, and Y@Ni systems. This may be attributed to the larger covalent radius of the guest atom relative to the host (Figure S5, Supporting Information), where the guest atom of the SAA surface also deviates noticeably from the basal surface, further enhancing the coordination unsaturation of its d-orbitals. In all, besides the occupancy number of d-orbitals φ, the electronegativity and covalent radius of the guest and host atoms may also synergistically regulate the adsorption configurations.
Figure 1. a) Optimal adsorption models (①-⑥) of CO2 on different Co-, Ni-, and Cu-based SAAs. The blue, yellow, gray, and red balls represent the host, guest, carbon, and oxygen atoms, respectively. b) Stable adsorption configuration of CO2 on different SAA surfaces, including the three different host matrixes. The Roman numerals IB to VIII indicate the subgroup number of the guest atoms. Each Roman-numeral column contains three subcolumns, representing the three host matrixes (Co, Ni, and Cu). If the guest atom has the same adsorption model in all three SAAs, only one label is used. The notations 3d-5d represent the d-orbital shell of the guest atom. "/" indicates nonexistence.
www.advancedsciencenews.com www.advenergysustres.com
Figure 2 shows a heat map of E_ads for the most favorable site, which also includes the pristine surfaces (where the guest atom is the same as the host matrix); the same data are shown in histogram form in Figure S6, Supporting Information. For Cu-based SAAs, our results are further supported by those of Lu et al. [13c] On SAAs with an identical host matrix, the absolute value of E_ads decreases from ≈1.30 to 0.30 eV with increasing φ (φ ≤ 6) of the guest atom. Furthermore, as the period number (P) of the guest atom increases, the bonding strength of CO2 is also enhanced. Just as different guest atoms embedded in the same host surface produce different adsorption characteristics, the same guest atom on host matrixes with different φ also leads to a variation in E_ads, for example, Y@Cu with an E_ads of −0.98 eV and Y@Co with an E_ads of −1.28 eV. This implies that the CO2 adsorption properties are related not only to the P and φ values of the guest atom but also to the φ of the host atom.
For a guest atom from the late TM series (φ > 6), the adsorption strength of the CO2 molecule is very low, with an E_ads of ≈ −0.20 eV. Consequently, CO2 on these SAAs is more likely to retain the linear form of the free state, indicating that a late-TM guest has little effect on CO2 activation. In all, there are 51 "chem" and 33 "phys" models based on the above calculations. In addition, the relationship between E_ads and the O-C-O bond angle of the CO2 molecule is considered in Figure S7, Supporting Information. All "chem" models exhibit a near-linear relation, particularly those for Cu-based SAAs.
To better elucidate the variation in E_ads for different SAAs, we first investigate the correlation between E_ads and the d-band center of the guest atom (E_d) in the entire database (Figure 3a). Compared with previous work, calculating the d-band center projected onto the guest atom alone provides better correlations with other properties than calculating the d-band center projected on 1) the guest atom and its first adjacent host atoms or 2) the entire slab. [16] When E_d is less than ≈ −2.0 eV, the "phys" model is prevalent in all SAAs, with a quasilinear form and an E_ads of ≈ −0.2 eV, whereas in the region above an E_d of −0.5 eV, the SAAs are likely to capture CO2 molecules in the bent configuration. However, the SAAs inside the green-highlighted region of Figure 3a, with E_d values in the range of −2.0 to −0.5 eV, exhibit various "phys" models as well as a few "chem" models. Therefore, it is difficult to accurately distinguish the "phys" and "chem" configurations using E_d alone. If we consider the entire chemical adsorption database, the trend of E_ads with E_d matches the d-band center theory only qualitatively. The linear fit between E_ads and E_d yields E_ads = −0.26 E_d − 0.76, with an R² of 0.62. The RMSE and MaxAE are 0.21 and 0.59 eV, respectively.
Figure 3. a) E_ads versus the d-band center of the guest atom (E_d). The SAAs inside the green-highlighted region include "phys" and "chem" models. The inset shows the distribution of the prediction error for the chemical adsorption data using the d-band center model. b) Classification into "chem" (red) and "phys" (blue) using CV5. The black line represents the separation of convex domains using the support-vector machine with a linear kernel. The hollow and solid circles represent the training and test data, respectively.
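The linear d-band fit and its error metrics reported above can be reproduced in outline with an ordinary least-squares fit. The following is a minimal sketch, not the authors' workflow: the function names `linear_fit` and `fit_metrics` are hypothetical, and the (E_d, E_ads) pairs are invented for illustration, so the printed coefficients will not match the values in the text.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_metrics(x, y, slope, intercept):
    """Return R^2, RMSE, and maximum absolute error of the linear model."""
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    mean_y = sum(y) / len(y)
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / len(y))
    max_ae = max(abs(r) for r in residuals)
    return r2, rmse, max_ae


# Hypothetical (E_d, E_ads) pairs in eV, invented for illustration only.
e_d = [-1.8, -1.2, -0.6, 0.0, 0.5, 1.1]
e_ads = [-0.35, -0.40, -0.70, -0.65, -0.95, -1.05]

slope, intercept = linear_fit(e_d, e_ads)
r2, rmse, max_ae = fit_metrics(e_d, e_ads, slope, intercept)
print(f"E_ads = {slope:.2f} * E_d + {intercept:.2f}")
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f} eV, MaxAE = {max_ae:.2f} eV")
```

With real data, `e_d` and `e_ads` would hold the DFT-derived d-band centers and adsorption energies of the "chem" systems.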
The distribution of the prediction error is also shown in the inset of Figure 3a; the errors are somewhat large and could hinder precise material design. According to the d-band center model, the higher the energy level of E_d, the stronger the binding of the CO2 molecule. Nevertheless, the linear correlation predicted by the d-band center model evidently fails for CO2 uptake on SAAs, and such model relations also do not hold for the adsorption of other species on SAAs. [16a] Moreover, calculating E_d for each SAA is computationally demanding, considering the numerous candidates. This necessitates the development of a novel, accurate, and low-cost descriptor for the high-throughput screening of SAAs.
SISSO is a state-of-the-art method for identifying descriptors and can yield explicit, analytical functions relating material parameters to target properties. Table 1 and Table S1, Supporting Information, list the elementary properties of the host (h) and guest (g) atoms of the SAAs that serve as primary features for the SISSO method: the occupancy number of d-orbitals (φ), period number (P), Pauling electronegativity (χ), first ionization energy (IE), electron affinity (EA), covalent radius (R), and dipole polarizability (DP); in all, there are 14 base parameters. We first determine the Pearson correlation of some simple descriptors with E_ads as the target property, based on the entire database, and subsequently discuss the best-performing, more complicated descriptors and their predictive power for data points not contained in the training dataset. Table S2, Supporting Information, shows the correlation of the primary features (Φ_0) with the E_ads of our database.
The top seven base parameters are all elementary properties of the guest atom; the most relevant is φ_g. This implies that the guest atom plays a crucial role in determining E_ads, especially through φ_g, which has also proved to be a significant factor for the bonding strength. [17] Tables S3 and S4, Supporting Information, show the correlations for the top ten 1D descriptors identified by the first and second iterations of feature construction (Φ_1 and Φ_2), respectively. Notably, these descriptors correlate better with E_ads than the best primary feature (φ_g) and can be obtained easily via simple algebraic operations on the base parameters. For example, the best descriptor in the Φ_2 feature construction is |ln R_g φ_g|, with a correlation coefficient of 0.9295, higher than the 0.8875 for φ_g. The increase in correlation leads to a lower RMSE and MaxAE, which fall from 0.18 and 0.41 eV to 0.14 and 0.35 eV, respectively. This demonstrates that a descriptor with just one base parameter can hardly capture the material property of interest entirely, and a combination of primary features is required.
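The idea of building candidate descriptors by algebraic operations on base parameters and ranking them by their Pearson correlation with E_ads can be illustrated with a small sketch. This is a simplified stand-in, not the SISSO code: the operator set (+, *, /, ln), the function names `expand` and `rank`, and all numerical values are assumptions made for illustration.

```python
import itertools
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def expand(features):
    """One round of feature construction on named columns.

    Binary operators are applied to each unordered pair (division in one
    direction only, to keep the sketch short); ln is applied to positive columns.
    """
    new = dict(features)
    for (name_a, a), (name_b, b) in itertools.combinations(features.items(), 2):
        new[f"({name_a}+{name_b})"] = [x + y for x, y in zip(a, b)]
        new[f"({name_a}*{name_b})"] = [x * y for x, y in zip(a, b)]
        if all(y != 0 for y in b):
            new[f"({name_a}/{name_b})"] = [x / y for x, y in zip(a, b)]
    for name, col in features.items():
        if all(v > 0 for v in col):
            new[f"ln({name})"] = [math.log(v) for v in col]
    return new


def rank(features, target, top=5):
    """Rank candidate features by |Pearson r| with the target, best first."""
    scored = []
    for name, col in features.items():
        if max(col) == min(col):  # constant column: correlation undefined
            continue
        scored.append((abs(pearson(col, target)), name))
    return sorted(scored, reverse=True)[:top]


# Hypothetical primary features and adsorption energies (illustration only).
primary = {
    "phi_g": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "R_g": [1.70, 1.60, 1.53, 1.39, 1.39, 1.32],
}
e_ads = [-1.25, -1.05, -0.90, -0.70, -0.55, -0.40]

for score, name in rank(expand(primary), e_ads):
    print(f"{name:>16s}  |r| = {score:.4f}")
```

Applying `expand` to its own output would give a second round of construction, which is why the number of candidates grows quickly with each iteration.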
To accurately describe E_ads using the provided elementary properties, the SISSO method is employed with the aim of proposing a precise yet low-cost descriptor via a two-step classification and regression procedure. First, the classification method is adopted to distinguish the "phys" and "chem" results using CV5 (fivefold cross-validation), as shown in Figure 3b. This figure also shows a black line, calculated by a support-vector machine with a linear kernel, to help visualize the separation of the convex domains. SISSO successfully distinguishes the "phys" (blue) and "chem" (red) configurations using a 2D convex map with the descriptors |EA_g − |EA_g − EA_h|| and φ_g(χ_g*R_g), obtained from the second feature construction (Φ_2). The fifth (test) subsample was also verified to be distributed on both sides of the black line and is sorted entirely correctly. As seen, the second descriptor, φ_g(χ_g*R_g), plays a crucial role in the classification, especially through the primary feature φ_g, owing to its wide range from 1 to 10. Consistent with the aforementioned analysis, the electronegativity χ and covalent radius R may also synergistically influence the "chem" and "phys" forms. If we use the entire database without CV5, we obtain the same descriptors, which classify the "phys" and "chem" results perfectly (Figure S8, Supporting Information). This indicates that the clear, analytical expression provided by the SISSO approach generalizes well. Thus, SISSO can accurately classify the CO2 adsorption models for SAAs.
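Whether two classes can be split by a straight line in a 2D descriptor plane can be checked with any linear classifier. The sketch below uses a plain perceptron as a lightweight stand-in for the linear-kernel support-vector machine mentioned above; unlike an SVM, it does not maximize the margin and only finds some separating line if one exists. The descriptor values, labels, and function names are hypothetical.

```python
def train_perceptron(points, labels, max_epochs=1000):
    """Fit w and b so that sign(w . x + b) matches the labels (+1 or -1).

    Returns (w, b, converged); converged is True once a full pass over the
    data produces no misclassification.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                # Misclassified (or on the boundary): move the line toward y.
                w[0] += y * x1
                w[1] += y * x2
                b += y
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False


def predict(w, b, point):
    """Return +1 ("chem") or -1 ("phys") for a 2D descriptor point."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1


# Hypothetical 2D descriptor values (d1, d2); +1 = "chem", -1 = "phys".
points = [
    (0.2, 0.5), (0.4, 1.0), (0.1, 1.5), (0.5, 2.0),  # "chem"
    (0.3, 4.0), (0.6, 5.0), (0.2, 6.0), (0.7, 4.5),  # "phys"
]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b, converged = train_perceptron(points, labels)
print("linearly separable:", converged)
print("decision line: %.2f*d1 + %.2f*d2 + %.2f = 0" % (w[0], w[1], b))
```

If `converged` comes back False within the epoch budget, the two classes are likely not linearly separable in the chosen descriptor plane.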
The SISSO method can be used not only to identify explicit, analytical descriptors for classifying a target property but also to quantitatively evaluate the E_ads of SAAs via regression analysis based on the "chem" database. This dataset is divided into a training set (80% of the database) and a test set (20% of the database), and descriptor identification is performed with CV5 on the training set. The validation error is defined as the average of the test errors obtained for each of the five subsamples of the training set, and the optimal descriptor is the formula with the lowest validation RMSE. Figure 4a and Table S5, Supporting Information, display the resulting training and validation RMSE for each combination of hyperparameters. Both the training and validation errors generally decrease at first; at higher dimensions and larger rungs, however, overfitting may occur, as evidenced by a flat or even slightly increasing RMSE. The optimal dimensionality of the descriptor is 6D, with a validation RMSE of 0.05 eV and a MaxAE of 0.21 eV in the Φ_2 feature space. The corresponding descriptor and its correlation with E_ads are shown in Table S5, Supporting Information. Although the 6D Φ_2 descriptor has the best accuracy, it is extremely complex and inconvenient for high-throughput screening, especially for experimentalists working on material design. In the Φ_2 feature space, the 1D descriptor has a validation RMSE of 0.12 eV and a MaxAE of 0.30 eV. This descriptor involves the period number of the guest atom (P_g) and the occupancy numbers of d-orbitals of the guest (φ_g) and host (φ_h) atoms. This 1D Φ_2 descriptor is readily interpretable, and its physical significance is convenient to derive. Interestingly, this explicit, analytical function is also obtained when all chemical adsorption energy data are used as training data (Figure S9, Supporting Information), indicating the generalization ability of this descriptor.
This descriptor can also support our fundamental understanding of the bonding strength on SAA surfaces.
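The 80/20 split followed by fivefold cross-validation on the training portion can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' pipeline: a single hypothetical 1D descriptor is fitted linearly to synthetic E_ads values, and the function names, the random seed, and the noise level are all invented for the example.

```python
import math
import random


def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(x, y, slope, intercept):
    """Root-mean-squared error of the linear model on (x, y)."""
    sq = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sq / len(x))


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_rmse(x, y, k=5):
    """Average held-out RMSE over k folds (the validation error)."""
    scores = []
    for held_out in k_fold_indices(len(x), k):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        scores.append(rmse([x[i] for i in held_out], [y[i] for i in held_out],
                           slope, intercept))
    return sum(scores) / k


# Synthetic data: a hypothetical 1D descriptor and noisy E_ads values (eV).
rng = random.Random(0)
descriptor = [rng.uniform(0.0, 3.0) for _ in range(50)]
e_ads = [-0.4 * d - 0.2 + rng.gauss(0.0, 0.05) for d in descriptor]

# Shuffle once, then keep 80% for training/validation and 20% for testing.
order = list(range(len(descriptor)))
rng.shuffle(order)
train_idx, test_idx = order[:40], order[40:]
x_train = [descriptor[i] for i in train_idx]
y_train = [e_ads[i] for i in train_idx]
x_test = [descriptor[i] for i in test_idx]
y_test = [e_ads[i] for i in test_idx]

validation_rmse = cross_validated_rmse(x_train, y_train, k=5)
slope, intercept = fit_line(x_train, y_train)
test_rmse = rmse(x_test, y_test, slope, intercept)
print(f"validation RMSE = {validation_rmse:.3f} eV, test RMSE = {test_rmse:.3f} eV")
```

Keeping the test set out of the cross-validation loop is what allows it to serve as an independent check on the descriptor selected by the validation RMSE.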
For the 6D and 1D Φ_2 descriptors, we also display a violin plot of the absolute error distribution on the remaining test set and compare it with that obtained using the traditional d-band center descriptor on the same dataset (Figure 4b). Most errors (75%) are below 0.14 eV for the 6D and 0.26 eV for the 1D descriptor, and the MaxAEs are only 0.20 and 0.27 eV, respectively. For comparison with E_d, we also perform the corresponding CV5 using 80% of the database as a training set to obtain the linear correlation of E_ads with E_d, which gives an average validation RMSE of 0.21 eV and a MaxAE of 0.57 eV, both higher than those of the 6D and 1D Φ_2 descriptors. In addition, most errors of the d-band center model on the test set are below 0.30 eV (upper blue density plot of Figure 4b), which is poorer than both the 6D and 1D Φ_2 descriptors. In Figure 4c-e and Figure S10, Supporting Information, we compare the DFT-calculated E_ads with the SISSO- or E_d-predicted results, demonstrating that the SISSO descriptors have a smaller bias than the traditional d-band center model for both the training and test datasets, especially the more complicated 6D Φ_2 descriptor. Given that the d-band center requires numerous calculations, whereas the SISSO approach uses only a few elementary atomic properties as primary features to identify the descriptor, without redundant calculations, the SISSO-generated descriptor is superior to the traditional E_d in terms of fidelity and computational overhead.
Figure 4. a) RMSE for descriptors identified using four-fifths of the data for training and the remainder for validation via CV5. b) Violin plot of the absolute error distribution for the 20% test database. The internal solid line represents the median; the internal dashed line, the mean; the black box, the 75% and 25% percentiles; the whiskers, the 95% and 5% percentiles; and the red and blue regions, the density plot. c-e) Comparison of DFT-calculated versus SISSO-predicted E_ads using Φ_2 6D and Φ_2 1D, and the d-band center model-predicted E_ads. The hollow and solid circles represent the training and test data, respectively.
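The quantities marked in a violin plot of this kind (mean, median, quartiles, 5% and 95% percentiles, and the maximum) can be computed directly from the absolute errors. The sketch below assumes linearly interpolated percentiles, which may differ from the convention used for the published figure; the function names and the sample energies are hypothetical.

```python
import math


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of pre-sorted data."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def summarize_abs_errors(reference, predicted):
    """Summary statistics of |predicted - reference|, as drawn in a violin plot."""
    errors = sorted(abs(p - r) for r, p in zip(reference, predicted))
    return {
        "mean": sum(errors) / len(errors),
        "median": percentile(errors, 50),
        "p25": percentile(errors, 25),
        "p75": percentile(errors, 75),
        "p5": percentile(errors, 5),
        "p95": percentile(errors, 95),
        "max": errors[-1],  # the MaxAE
    }


# Hypothetical reference and predicted adsorption energies in eV.
e_dft = [-1.20, -0.95, -0.80, -0.62, -0.55, -0.41]
e_pred = [-1.12, -1.01, -0.78, -0.70, -0.50, -0.44]

for key, value in summarize_abs_errors(e_dft, e_pred).items():
    print(f"{key:>6s}: {value:.3f} eV")
```

Comparing the `p75` and `max` entries of two models on the same test set corresponds to the comparison of the 75% error bound and MaxAE made in the text.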