Machine Learning Design of Perovskite Catalytic Properties

Discovering new materials that efficiently catalyze the oxygen reduction and evolution reactions is critical for facilitating the widespread adoption of solid oxide fuel cell and electrolyzer (SOFC/SOEC) technologies. Here, we develop machine learning (ML) models to predict perovskite catalytic properties critical for SOFC/SOEC applications, including oxygen surface exchange, oxygen diffusivity, and area specific resistance (ASR). The models are based on trivial-to-calculate elemental features and are more accurate and dramatically faster than the best models based on ab initio-derived features, potentially eliminating the need for ab initio calculations in descriptor-based screening. Our model of ASR enables temperature-dependent predictions, has well-calibrated uncertainty estimates, and is accessible online. Temporal cross-validation shows that our model is effective at identifying promising materials prior to their initial discovery, demonstrating that it can make meaningful predictions. Using the SHapley Additive exPlanations (SHAP) approach, we provide a detailed discussion of different approaches to model featurization for ML property prediction. Finally, we use our model to screen more than 19 million perovskites to develop a list of promising cheap, earth-abundant, stable, and high-performing materials, and find that some top materials contain mixtures of less-explored elements (e.g., K, Bi, Y, Ni, Cu) worth exploring in more detail.

operating temperatures of about 500 °C or even lower. 4,5,13 Perovskite oxides are the most popular and well-studied non-precious metal ORR/OER catalysts for current and next generation SOFC/SOEC technologies. Computational discovery of new perovskite electrodes has traditionally centered on the use of first-principles based descriptors of catalytic activity, such as the O p-band center descriptor obtained from density functional theory (DFT) calculations. 14-24 However, even descriptors like the O p-band center rely on modestly expensive DFT calculations, which constrains the speed with which one can propose new materials or understand trends in materials properties. Data-driven machine learning (ML) approaches provide a promising avenue for accelerating both the understanding and the discovery of promising new materials.
The use of ML approaches in materials science has seen a meteoric rise in recent years. 25-29,35-37 The O p-band center property correlations mentioned above can be considered a primitive ML model, where the model has a single feature, the O p-band center, and the model type is often a basic univariate linear regressor. It is likely that more sophisticated data-driven techniques can be utilized for understanding and discovering new high-performing SOFC/SOEC materials. Recently, Xin highlighted the opportunity that ML techniques have in advancing the state-of-the-art of general catalyst design. 38 More specifically for SOFC/SOECs, Zhai et al. 34 recently used a database of 85 area specific resistance (ASR) values, a neural network ML model, and elemental features together with a newly-proposed ionic Lewis acid descriptor, to screen for promising new perovskite materials. Zhai et al. enumerated a set of 6871 possible compositions, selected four promising materials based on low predicted ASR values, and experimentally confirmed that these materials are indeed high performing, with ASR values on par with or lower than the benchmark high-performing material Ba0.5Sr0.5Co0.8Fe0.2O3 (BSCF). In addition, recent work from Xu et al. 35 used a weighted voting regression approach combining multiple ML models to predict the oxygen conductivity in perovskite oxide electrolytes and suggested a number of doped gallates as promising electrolytes.
In this work, we take a purely data-centric ML approach to predicting perovskite catalytic properties and have three major results. The first major result is the development of ML models for perovskite catalytic property prediction and the demonstration of their superiority to the O p-band center descriptor approach for such predictions. We show that ML models based on elemental features can deliver better average accuracy in predicting key properties than O p-band center correlations, at least within our test set, and can be used to make predictions many orders of magnitude faster than O p-band center correlations, as the ML requires no DFT calculation. The second key result of this work is the development and assessment of an ML model for ASR predictions. We show that our ASR model can realize low errors through a novel featurization scheme using a combination of elemental features, one-hot encoding of electrolyte type, and a separate ML model predicting the ASR activation energy barrier, whose value is then used as a feature in the ASR prediction model. The inclusion of the ASR energy barrier enables temperature-dependent ASR predictions. Further, our ASR model shows an ability to predict future promising materials based on temporal cross validation and provides calibrated uncertainty estimates (i.e., error bars). The third and final major result of this work is the application of the ASR ML model to search a large space of approximately 19 million (19M) perovskite compositions and identify new promising cheap, earth-abundant, stable, and highly catalytically active perovskite materials.

Machine learning replaces electronic structure descriptors for catalytic property predictions
We use the database of perovskite catalytic properties from the work of Jacobs et al. 39 This database consists of 749 data points spanning 299 unique perovskite compositions. The data consist of oxygen surface exchange, diffusivity, and ASR values. In our previous work and similar descriptor-based studies, the O p-band center performance assessments are full fits to all values in a database. In ML studies, it is more common to assess model performance through cross-validation (CV), where certain subsets of the data are used to train the model, while the remaining data are held out to test the model performance (e.g., train on 80% and test on 20% of the data). In this section, we construct random forest ML models for each catalytic property at T = 500 °C and use random 5-fold CV to assess model errors. All ML model fits and evaluations were performed using the MAterials Simulation Toolkit for Machine Learning (MAST-ML) package. 40 All models use combinations of elemental properties as their features, which are nearly instantaneous to determine for a new compound. More details of the ML model, features, and fitting are given in Section S1 of the Supplementary Information (SI). To compare to the performance of the O p-band center discussed in our previous work, we also assess model errors of a linear model fit to the O p-band center using random 5-fold CV.
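The cross-validation protocol described above can be illustrated with a minimal sketch of repeated random 5-fold CV for a univariate linear model, which is the model type used for the O p-band center baseline. This is not the MAST-ML workflow: the data are synthetic, the function names are hypothetical, and reading "25 splits" as 5 repeats of 5 folds is an assumption.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def repeated_kfold_mae(xs, ys, k=5, repeats=5, seed=0):
    """Return (mean MAE, standard error of the mean, number of splits)."""
    rng = random.Random(seed)
    maes = []
    for _ in range(repeats):
        idx = list(range(len(xs)))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k disjoint test folds
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            slope, intercept = fit_line([xs[i] for i in train],
                                        [ys[i] for i in train])
            errors = [abs(ys[i] - (slope * xs[i] + intercept)) for i in test]
            maes.append(statistics.fmean(errors))
    sem = statistics.stdev(maes) / len(maes) ** 0.5
    return statistics.fmean(maes), sem, len(maes)


# Synthetic stand-in for ~50 (descriptor, log property) pairs; not real data.
rng = random.Random(1)
descriptor = [rng.uniform(-3.0, -1.0) for _ in range(50)]
log_property = [1.5 * x + rng.gauss(0.0, 0.3) for x in descriptor]

mean_mae, sem, n_splits = repeated_kfold_mae(descriptor, log_property)
print(f"MAE = {mean_mae:.3f} +/- {sem:.3f} over {n_splits} splits")
```

Swapping `fit_line` for any other regressor leaves the splitting and error bookkeeping unchanged, which is what allows the two model types to be compared on equal footing.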
center and linear regression are 34.7%, 29.1%, -6.6%, -1.5%, and 27.0% for k*, D*, kchem, Dchem, and ASR, respectively. Therefore, for all properties, the average MAE is within the uncertainty or reduced for the case of ML with elemental features compared to O p-band center with linear regression. This analysis demonstrates that even with limited dataset sizes of order 50 data points, ML can produce prediction errors on par with or lower than those obtained using a physically-motivated descriptor such as the O p-band center. In addition, the use of ML here has the advantage that predicting properties of new materials is orders of magnitude faster than performing DFT calculations to obtain the O p-band center or conducting experiments.

ML model of ASR with low errors, well-calibrated uncertainties, and effective time-dependent materials predictions
Figure 2 details our ML model performance for predicting ASR. We found that the most effective ASR model is featurized using elemental features, one-hot encoding of electrolyte type, and a feature consisting of a separate ML model prediction of the Arrhenius energy barrier for ASR, which itself uses only elemental features and one-hot encoding of electrolyte type (see Section S1 of the SI for more details). Figure 2A and Figure 2B contain parity plots of a random forest ML model fit to all the data (full fit) and assessed by random 5-fold CV, respectively. In these parity plots, the blue points are materials which contain 4 or fewer independent experimental measurements, and the green points are designated "well-studied" materials with greater than 4 measurements. The separation of well-studied materials is done because our previous work 39 showed that the properties for these materials are much more amenable to fitting, likely due to their having reduced noise from averaging the multiple measurements. The listed metrics in black text are taken over all the data, while the metrics listed in green text are for the subset of well-studied materials only. The error bars on the points in the parity plots are calibrated uncertainty estimates from the ML predictions. Because our random forest model is an ensemble of decision trees, we can obtain an uncertainty on each prediction by calculating the standard deviation of the predictions of the individual trees. This approach provides a simple ensemble estimate of the prediction uncertainty, but one cannot tell a priori if this uncertainty estimate itself is accurate. We follow the approach of Palmer et al. 41 to develop calibrated uncertainty estimates and demonstrate that these calibrated uncertainty estimates are quite accurate. More information on the error bar assessment and calibration is in Section S3 of the SI.
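As an illustration of the ensemble uncertainty idea, the sketch below takes the spread of per-tree predictions as the raw error bar and then rescales it. The linear form sigma_cal = a * sigma + b, fitted by minimizing a Gaussian negative log-likelihood on held-out residuals, is an assumption made here for illustration and may differ in detail from the calibration used in this work. The function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def ensemble_prediction(tree_preds):
    """Mean and spread of the per-tree predictions for one material."""
    return statistics.fmean(tree_preds), statistics.pstdev(tree_preds)


def neg_log_likelihood(residuals, sigmas, a, b):
    """Gaussian negative log-likelihood (up to a constant) for a*sigma + b."""
    total = 0.0
    for r, s in zip(residuals, sigmas):
        s_cal = a * s + b
        if s_cal <= 0.0:
            return math.inf  # non-physical error bar, reject this (a, b)
        total += math.log(s_cal) + 0.5 * (r / s_cal) ** 2
    return total


def calibrate(residuals, sigmas, step=0.05, upper=3.0):
    """Coarse grid search over (a, b); a real workflow would use an optimizer."""
    grid = [i * step for i in range(int(round(upper / step)) + 1)]
    return min(((a, b) for a in grid for b in grid),
               key=lambda ab: neg_log_likelihood(residuals, sigmas, ab[0], ab[1]))


# Synthetic check: the raw ensemble spread is too small by a factor of two.
rng = random.Random(0)
raw_sigma = [rng.uniform(0.1, 0.5) for _ in range(400)]
residual = [rng.gauss(0.0, 2.0 * s) for s in raw_sigma]

a, b = calibrate(residual, raw_sigma)
mean_pred, spread = ensemble_prediction([1.0, 2.0, 3.0])
print(f"a = {a:.2f}, b = {b:.2f}; calibrated sigma at 0.3 = {a * 0.3 + b:.2f}")
```

In practice the residuals used for calibration should come from cross-validation rather than from a full fit, since full-fit residuals understate the true prediction error.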
From our full fit model in Figure 2A, our ML model is able to accurately fit our ASR database, and the calibrated error bars tend to intersect the y = x line, which represents perfect prediction. From our 5-fold CV results in Figure 2B, the quantitative prediction quality is reduced compared to the full fit, with some overestimation (underestimation) of the true ASR values for the lowest (highest) ASR values in our database. The overestimation of ASR at the lowest range of values does not present an issue from the standpoint of materials screening, as it suggests the ASR predictions may be conservatively higher than the true values, thus minimizing the likelihood of undesirable false positive predictions. It is worth noting that the ML prediction metrics for the subset of well-studied materials are significantly improved compared to predicting on all of the data. This result is consistent with the O p-band center trends observed in our previous work and suggests that the ML model fit is sensitive to the quality and noise level of the experimental data used. 39
In Figure 2C, we consider the problem of using our ML regression model for the purpose of classifying whether a particular material is expected to have a log ASR value below a given threshold. Using the 5-fold CV results in Figure 2B, we find that our ML regression model correctly finds 92% of materials that should have a log ASR < 1 Ohm-cm² at 500 °C, which represents a relatively high-performing material. This result is encouraging, as we will use our ML model to screen for new promising perovskite catalysts. Going beyond basic 5-fold cross validation, our model also shows effectiveness at predicting future promising materials. This is shown through temporal cross validation in Figure 2D (52 unique compositions in the training set through 2008). We find similar results for an ASR model trained at 800 °C with a log ASR classification threshold of -1 Ohm-cm² (see Section S2 of the SI). We note that using our 800 °C ASR model trained on only the 23 data points (14 unique compositions) from the first tranche of 1998-2003 and predicting materials that are cheap (< $10/kg) and reasonably stable (< 125 meV/atom) (see discussion below for more details) suggests promising compositions in the Ba-Sr-Fe-Co space, indicating this initial model could have been used to predict materials similar to BSCF to be promising prior to its initial discovery in 2004. Notably, this model also suggests that doping Zr on the B-site should result in stable and highly active materials, long before the initial publication of BFCZ20 in 2013.
Here, we compare the performance of our ML model for predicting ASR with that from recent work by Zhai et al., 34 who used a database of 85 ASR values and a neural network ML model employing similar elemental features as used here, together with a newly-proposed ionic Lewis acid descriptor, to screen for new perovskite materials based on low predicted ASR values.
Zhai et al. obtained a log ASR RMSE of 0.336 Ohm-cm² for data at 700 °C. We note that this RMSE of 0.336 Ohm-cm² is lower than that of our current random forest model on our ASR database, which yielded an RMSE of 0.590 +/- 0.080 Ohm-cm² for data at 500 °C. However, this comparison uses different ASR data, a different model, and a different feature set, so it cannot be used to assess the relative merits of the two approaches. To allow a more direct comparison, we use a random forest model and our feature generation scheme (e.g., the ionic Lewis acid descriptor used in the work of Zhai et al. is not included) and perform 5-fold CV on the same ASR dataset used in the work of Zhai et al. From this analysis, we obtain an RMSE of 0.367 +/- 0.049 Ohm-cm². The RMSE of 0.336 Ohm-cm²
The second approach is more data-centric and involves providing a full suite of trivial-to-calculate elemental features and letting the ML model select the most important features. There are pros and cons to both approaches. The first approach utilizes physical understanding of how the features relate to the target property and therefore may be better able to represent target data with fewer features than the data-centric methods. This property of physics-based features is likely to be particularly advantageous for smaller datasets, where there may not be enough data for the ML model to select the most effective features in the more data-centric approaches. Furthermore, physics-based features often provide physical insight, e.g., by examining how the target depends on the feature. However, physics-based features are often harder to develop, as they require significant domain expertise, and often harder to implement, as they can require more work to calculate for new test data (e.g., the Lewis acid descriptor described above requires knowledge of the oxidation state of each element in the material, which must be estimated based on additional chemical models). In contrast, the data-centric approach uses a large suite of tabulated elemental features that can be developed with no domain knowledge, are easy to generate, and are easy to implement on new test data. The large number of features may require a relatively larger dataset than the physics-based approaches to extract the important features and their relationships to the target. However, a large set of simple elemental features may be advantageous compared to physics-based features on large datasets, as they provide enormous flexibility and do not presuppose any particular physical mechanism. Even if the elemental features produce an excellent ML model, it may be difficult to establish physical understanding of why the features influence the target property in certain ways or to develop useful mechanistic understanding from the ML model. Furthermore, having more features will slow down model fits, which may present an issue for some use cases.
We tested the addition of the Lewis acid descriptor in our present approach and found that it did not significantly impact the results of our random forest model (i.e., it had a low feature importance), so the model performance was unchanged. We attribute the low feature importance of the Lewis acid descriptor in our case to the large set of elemental features available to the model, which effectively contribute the same information as the Lewis acid descriptor. Therefore, this result does not imply that the Lewis acid descriptor does not contribute important physical information. In fact, we performed a test using a feature set consisting of only the ML-predicted ASR barrier, A- and B-site Lewis acid descriptors, ML-predicted O p-band center, and one-hot electrolyte encoding, for a total of just 8 features. We find a 5-fold CV MAE, RMSE, and RMSE/σ of 0.463 +/- 0.047 Ohm-cm², 0.610 +/- 0.061 Ohm-cm², and 0.575 +/- 0.079, respectively (the +/- is the standard deviation over 25 splits). These 5-fold CV average values are slightly higher than our best
model shown in Figure 2, but are within the CV sampling standard deviation and demonstrate the power of only a handful of physically-motivated features to create a simple and quite accurate model. It is worth noting that if the A- and B-site Lewis acid descriptors are removed, the 5-fold CV model error increases modestly, with MAE, RMSE, and RMSE/σ of 0.509 +/- 0.042 Ohm-cm², 0.656 +/- 0.062 Ohm-cm², and 0.615 +/- 0.057, respectively (the +/- is the standard deviation over 25 splits), indicating the addition of the Lewis acid descriptor is helpful for forming an accurate ASR model if one is interested in using a small, physically-motivated feature set. Overall, these results suggest that the simple elemental features used here are as good as or better than the Lewis acid feature from Zhai et al. for prediction of ASR, although the Lewis acid feature represents a much more physically intuitive and economical featurization.
To extract some understanding of how the features explored in this study relate to the target and develop more physical intuition from the ML models, we here examine the relative importance of the input features in predicting ASR using the SHapley Additive exPlanation (SHAP) approach 42 with a random forest model, as shown in Figure 3. Briefly, SHAP is a mathematical method of explainable machine learning which uses game theoretic principles to assess the contributions of each feature in a model prediction, thus providing both ranked feature importances and trends of the target variable with each feature.
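The game-theoretic idea behind SHAP can be made concrete with a brute-force sketch that computes exact Shapley values for a three-feature toy model by averaging marginal contributions over all feature orderings. This is not the SHAP library and not the algorithm used for the random forest in this work; the toy model, its coefficients, and the feature values are hypothetical and chosen only so that the signs match the trends discussed in the text.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features absent from a coalition are held at their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = model(current)
        for j in order:
            current[j] = x[j]  # feature j joins the coalition
            updated = model(current)
            phi[j] += updated - previous
            previous = updated
    n_orderings = factorial(n)
    return [p / n_orderings for p in phi]


def toy_log_asr(features):
    """Hypothetical toy model: (barrier in eV, O p-band center in eV, zirconia flag)."""
    barrier, p_band, zirconia = features
    return 2.0 * barrier - 0.5 * p_band + 0.3 * zirconia + 0.4 * barrier * zirconia


x = [1.2, -1.8, 1.0]         # material being explained (made-up values)
baseline = [1.0, -2.0, 0.0]  # reference point (made-up values)
phi = shapley_values(toy_log_asr, x, baseline)
print([round(p, 3) for p in phi])
```

The attributions sum to the difference between the prediction and the baseline prediction, which is the property that lets SHAP values be read as additive contributions to an individual prediction. The cost grows factorially with the number of features, which is why practical tools rely on model-specific or sampling-based approximations.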
In Figure 3A, we examine the SHAP ranking for the model discussed in the preceding paragraph, which integrated the Lewis acid descriptor with elemental feature-based models of ASR energy barrier and O p-band center, and one-hot electrolyte encoding. There are clear physical trends of these features with the resulting ASR values. For example, higher values of activation energy tend to coincide with materials with higher ASR values. This trend makes intuitive sense, because for ASR the Arrhenius scaling relationship is ASR ∝ exp(Ea/kT), where Ea is the activation barrier, k is the Boltzmann constant, and T is the absolute temperature. Under this relationship ASR increases with decreasing T, and a larger Ea produces a larger ASR. In addition, a lower (higher) A-site (B-site) Lewis acid value is indicative of a material with weaker metal-oxygen bonding, promoting lower ASR. For example, La3+ and Ba2+ have Lewis acid values of 0.343 and 0.194, respectively, and broadly, A-site Ba-based perovskites have lower ASR than A-site La-based perovskites. As another example for the B-site, Mn3+ and Co4+ have Lewis acid values of 0.513 and 0.666, respectively, and Co-based perovskites generally have lower ASR than Mn-based perovskites. It is worth noting that recent work from Xu et al. used knowledge of the low Lewis acid strength of the Cs+ cation on the A-site to suggest PrBa0.9Cs0.1Co2O5+δ as a promising PCFC electrode material. 43 In addition, higher values of O p-band center correspond to lower ASR values, again consistent with the fact that a higher O p-band center corresponds to weaker oxygen binding in a perovskite material. Finally, the electrolyte encoding offers some useful insights as well. For example, materials with a zirconia or perovskite electrolyte tend to correspond to higher ASR values, while materials with a ceria electrolyte tend to correspond to lower ASR values.
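The Arrhenius argument can be written out explicitly. The text gives only the proportionality, so the prefactor A below is an assumption introduced to turn it into an equality, and it is taken to be independent of both T and Ea:

```latex
\mathrm{ASR}(T) = A \exp\!\left(\frac{E_a}{kT}\right)
\quad\Longrightarrow\quad
\ln \mathrm{ASR}(T) = \ln A + \frac{E_a}{kT}

\frac{\partial \ln \mathrm{ASR}}{\partial E_a} = \frac{1}{kT} > 0,
\qquad
\frac{\partial \ln \mathrm{ASR}}{\partial T} = -\frac{E_a}{kT^{2}} < 0
\quad (E_a > 0)
```

The first derivative shows that, at fixed temperature and prefactor, a larger barrier gives a larger ASR. The second shows that ASR rises as the temperature is lowered, and that it does so faster for larger barriers.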
In Figure 3B  to extract physical meaning from the trends with basic elemental features. However, as noted above, their simplicity and accuracy make them a very appealing option.

Screening new promising perovskite catalysts with machine learning
In this section, we use our ML model for ASR discussed above, together with calculations of material cost (calculated using the pymatgen 44 package) and separate ML model predictions of perovskite stability, to screen for new promising perovskite catalysts. The stability model is a random forest model that uses elemental features to predict the stability, measured as energy above the convex hull, of 2844 perovskite oxides using the database from Ma et al. 21 (who expanded and updated the database from Li et al. 45; more details are given in Section S4 of the SI). In total, we enumerate a large search space consisting of up to 3 elements on the A-site and 4 elements on the B-site, covering 50 elements and totaling just over 19M (19,072,821) materials (see Section S5 of the SI for more details). To find new promising materials, we set our screening criteria to be values below a target threshold of cost, stability, and ASR activity. We set the threshold value of cost to coincide with the value of the commercial material La0.6Sr0.4Co0.2Fe0.8O3 (LSCF), which is $133.67 per kg. For activity, we consider two thresholds: the log ASR of LSCF (at 500 °C), which is 1.33 Ohm-cm², and the log ASR (at 500 °C) of the representative top-performing material Ba0.5Sr0.5Co0.8Fe0.2O3 (BSCF), which is 0.21 Ohm-cm². The threshold for stability is informed by the work of Zhai et al. In their study, their top-performing material, Sr0.9Cs0.1Co0.9Nb0.1O3 (SCCN), was demonstrated to have stable operation at 550 °C for over 800 hours without any observed loss in performance. Our stability model predicts a value of 93.3 meV/atom for SCCN at 500 °C, which we use as the stability threshold for screening. The order in which each property is screened can impact the resulting lists at different stages of the screening (although the resulting list when all criteria are applied does not depend on the order of their application), and multiple orders are explored below. Throughout this analysis, we make predictions of ASR assuming a ceria electrolyte is used.
We can use our ASR model and our list of predictions to make numerous material assessments and comparisons. First, we can examine the most favorable screened materials from each of our criteria of cost, stability, and ASR value. From inspecting this list of screened promising materials, we find that the materials that are the cheapest, most stable, and most highly active are, respectively, BaFe0.75Cu0.125Zr0.125O3 ($1.15/kg, log ASR at 500 °C = 0.12 Ohm-cm²), BaFe0.5Co0.25Mo0.25O3 (18.0 meV/atom, log ASR at 500 °C = -0.02 Ohm-cm²), and SrCo0.75Nb0.125Ta0.125O3 (SCNT) (log ASR at 500 °C = -0.
Since our ASR model uses the ML-predicted activation energy as a feature, we can predict ASR as a function of temperature, as shown in Figure 5.
For the fits to our full ASR database in Figure 2 of the main text, we focused on using random forest models and evaluated the 5-fold CV fit quality of different feature sets. We compared the performance using (i) only elemental features, (ii) elemental features plus one-hot encoding of electrolyte type (as was done in the analysis of Figure 1 of the main text), and (iii) elemental features, one-hot encoding of electrolyte type, and the ML-predicted ASR activation barrier. The ASR barrier was predicted using a separate random forest model trained using elemental features plus one-hot encoding. We found that adding the one-hot encoding of electrolyte and the ML-predicted ASR barrier gave about a 10% improvement in average 5-fold CV metrics, as summarized in Table S1.
of our ML model using what we call a "residual vs. error" (RvE) plot. 26,41 In order for the uncertainty estimates (i.e., predicted errors) to be accurate, the distribution of their values should match that of normalized residuals (i.e., actual errors), meaning the slope of the fit line in

S4. Additional details of stability database
To formulate a model for perovskite stability, we use the dataset of perovskite oxides from the work of Ma et al. 21 This database contains DFT-calculated total energies of 2935 perovskites. After removing duplicate entries (the duplicate entry with higher DFT total energy was removed) and those entries with very high DFT energies (indicating the run did not converge), the number of entries amounts to 2844 materials. The stability (as energy above the convex hull) of each perovskite was calculated using pymatgen. 44 More specifically, we calculate

Figure 1 provides a summary of ML model assessment for each catalytic property at T = 500 °C, with comparisons to the linear model fit to the O p-band center. The corresponding plot for T = 800 °C is shown in Section S2 of the SI and shows the same qualitative conclusions. In Figure 1, the bars (error bars) denote the average (standard error in the mean) MAE over 25 splits of 5-fold CV. The blue bars are average MAE values for the case where the DFT-calculated O p-band center is used as a single feature with a linear regression model. The green bars are average MAE values for the case of using trivial-to-calculate elemental features with the ML model. The percentage reductions in average MAE by using ML elemental features compared to O p-band

Figure 1. Machine learning model random cross validation assessment comparing performance of the DFT-calculated O p-band center descriptor with a linear model and a random forest model using elemental features. These assessments are for T = 500 °C. The units of k* and kchem are cm/s, the units of D* and Dchem are cm²/s, and the units of ASR are Ohm-cm². The error bars are standard errors in the mean of the calculated MAE over 25 splits of 5-fold CV. The ML models of k*, D*, kchem and Dchem use only elemental features, while the ML model of ASR uses elemental features and one-hot encoding of the electrolyte type.

In Figure 2D, the classification metrics denote the ability of the model to correctly classify materials as having log ASR < 1 Ohm-cm², where the training sets are tranches of materials grouped by year of their initial study, and the corresponding test sets are all materials occurring after the most recent year in the training data. For example, the data point for "1998-2008" means the training data are all materials first studied from 1998-2008, and the test data are all materials first studied from 2009-2021. Note that all future instances of a material initially studied in a given year are placed in the tranche corresponding to the initial year of study, so repeat compositions are not present in both the train and test sets. The results of Figure 2D show that our model reaches an F1 classification score of about 0.8 for predicting materials first discovered between 2009-2021 after training only on materials first discovered by 2008, corresponding to a training set of just 76 data points (52 unique compositions).
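The temporal split and the threshold classification described above can be sketched as follows. The record layout (`composition`, `year`, `log_asr` keys), the function names, and the example entries are hypothetical placeholders, not values from the database.

```python
def temporal_split(records, last_train_year):
    """Split records so each composition lands in the tranche of its first study year."""
    first_year = {}
    for rec in records:
        comp = rec["composition"]
        first_year[comp] = min(rec["year"], first_year.get(comp, rec["year"]))
    train = [r for r in records if first_year[r["composition"]] <= last_train_year]
    test = [r for r in records if first_year[r["composition"]] > last_train_year]
    return train, test


def f1_below_threshold(y_true, y_pred, threshold=1.0):
    """F1 score for the class 'log ASR < threshold' from regression outputs."""
    tp = fp = fn = 0
    for true_value, predicted_value in zip(y_true, y_pred):
        actual = true_value < threshold
        predicted = predicted_value < threshold
        tp += actual and predicted
        fp += predicted and not actual
        fn += actual and not predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Placeholder records: "X1" is re-measured in 2012 but was first studied in 2005,
# so both of its entries stay in the training tranche.
records = [
    {"composition": "X1", "year": 2005, "log_asr": 0.9},
    {"composition": "X1", "year": 2012, "log_asr": 0.7},
    {"composition": "X2", "year": 2010, "log_asr": 1.4},
]
train, test = temporal_split(records, last_train_year=2008)

example_f1 = f1_below_threshold([0.5, 0.8, 1.5, 2.0], [0.7, 1.2, 0.9, 1.8])
print(len(train), len(test), example_f1)
```

Assigning every measurement of a composition to its first year of study is what keeps repeat compositions out of the test set, so the score reflects performance on materials that were new at the time.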

Figure 2. Summary of ML model performance for predicting log ASR at T = 500 °C. (A) Parity plot of full fit to all of the data, (B) 5-fold CV assessment, (C) ASR model classification accuracy for predicting materials with log ASR < 1 Ohm-cm², (D) temporal cross validation classification assessment. In (A) and (B), the error bars are the recalibrated random forest ensemble error bars. In (A) and (B), the metrics listed in black are assessments on all of the data, where the +/- values are the standard deviation over all CV splits, and the metrics listed in green are assessments on the subset of 18 well-studied materials with greater than 4 experimental measurements.
quoted in Zhai et al. is within the CV sampling error bar of our present result, suggesting that the main cause of the different RMSE values between their study and the present work is likely the ASR dataset used for train/test, as opposed to the model type or feature set used. It is helpful to compare the reduced RMSE/σ (σ = dataset standard deviation) instead of just RMSE values, in order to normalize the predictive performance by the spread of data used in the model. The standard deviation of the log ASR values used by Zhai et al. is 0.505 Ohm-cm², producing an RMSE/σ value of 0.665. The log ASR database used for the fit in Figure 2 has a standard deviation of 1.083 Ohm-cm², producing, from 5-fold CV, an RMSE/σ = 0.556 +/- 0.056. By this measure, our model has a modestly lower average reduced RMSE than Zhai et al.'s model.

Insights into role of featurization and feature importances of the ASR ML model
The work of Zhai et al. and the present work highlight two different featurization approaches for composition-based materials property prediction. The first approach involves the use of human effort and physical intuition to craft physics-based features like the Lewis acid descriptor or O p-band center.
The present work and work of Zhai et al. do not use purely physics-based or data-centric features, but instead leverage some combination of the two.The work of Zhai et al. relies predominantly on the use of physics-based features, though they also include some additional elemental features motivated by physical intuition.The present work relies predominantly on the use of data-centric elemental features, though we also include the one-hot electrolyte encoding and ML-predicted ASR barrier as a feature (note the ASR barrier model uses elemental features and one-hot electrolyte encoding as features), which means we also have some physics-based features in our work.
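To make the data-centric featurization concrete, the sketch below builds a few site-resolved elemental features and a one-hot electrolyte encoding for BSCF. It is a minimal illustration rather than the feature generator used in this work: the function names are hypothetical, atomic number stands in for the full suite of tabulated elemental properties, and the electrolyte list contains only the three types named in the text, so the real category list is an assumption.

```python
# Atomic numbers are standard values; any tabulated elemental property could be used.
ATOMIC_NUMBER = {"Ba": 56, "Sr": 38, "Co": 27, "Fe": 26}

# Assumed category list, limited to the electrolyte types named in the text.
ELECTROLYTES = ["ceria", "zirconia", "perovskite"]


def site_features(site, prop_name, table, prefix):
    """Composition-weighted average, max, and min of one property on one site."""
    total = sum(site.values())
    fractions = {element: amount / total for element, amount in site.items()}
    values = [table[element] for element in fractions]
    average = sum(frac * table[element] for element, frac in fractions.items())
    return {
        f"{prefix}_{prop_name}_composition_average": average,
        f"{prefix}_{prop_name}_max": max(values),
        f"{prefix}_{prop_name}_min": min(values),
    }


def one_hot(electrolyte):
    """One-hot encode the electrolyte type; unknown types raise an error."""
    if electrolyte not in ELECTROLYTES:
        raise ValueError(f"unknown electrolyte: {electrolyte}")
    return {f"electrolyte_{name}": int(name == electrolyte) for name in ELECTROLYTES}


# Ba0.5Sr0.5Co0.8Fe0.2O3 (BSCF), split into A-site and B-site cations.
a_site = {"Ba": 0.5, "Sr": 0.5}
b_site = {"Co": 0.8, "Fe": 0.2}

features = {}
features.update(site_features(a_site, "AtomicNumber", ATOMIC_NUMBER, "Asite"))
features.update(site_features(b_site, "AtomicNumber", ATOMIC_NUMBER, "Bsite"))
features.update(one_hot("ceria"))
print(features)
```

Every entry is a table lookup plus simple arithmetic, which is why such features are essentially free to compute for a new composition, in contrast to a descriptor that requires a DFT calculation or an oxidation state assignment.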

In Figure 3B, we examine the SHAP ranking (showing the top 25 of 50 features) for our best model discussed in the context of Figure 2. Similar to the other model, the ASR activation barrier is the most important feature. The trends with electrolyte type are also the same between the two models. For this model, nearly all of the top elemental features (denoted here as "{property_name}_{math_operation}") generally fall into the following groups of properties: features related to size and/or volume (

Figure 3. Summary of feature importances for our ASR model using the SHAP method.(A) SHAP feature ranking for ASR model constructed using physically-motivated features.(B) SHAP feature ranking of the top 25 features of our best performing elemental feature-based model from Figure 2.

Figure 4 contains violin plots showing the distributions of cost (Figure 4A), stability (Figure 4B), and predicted log ASR at 500 °C (Figure 4C) as the screening criteria are successively applied, starting with the criterion being plotted in each case. From Figure 4, we can see that 2,453,872, 1,393,424, and 2,135,396 materials separately pass the screening criteria of cost, stability, and ASR, respectively, which translates into 12.9%, 7.3%, and 11.2% of the original 19,072,821 considered materials, with stability being the most stringent screening criterion. A total of 57,579 materials (0.30%) pass both the cost and stability screening, while 53,210 (0.28%) materials pass both the stability and ASR screening. Finally, 9135 (0.05%) materials pass all screening criteria. A spreadsheet containing the compositions, calculated costs, and predicted stability and ASR values for these 9135 materials is provided as part of the SI.
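The screening arithmetic and its order independence can be checked with a short sketch. The counts are those quoted above. The three candidate entries are made-up placeholders, the `screen` function and property keys are hypothetical, and pairing the LSCF log ASR value of 1.33 with the cost and stability thresholds is an assumption made for illustration.

```python
TOTAL = 19_072_821
PASSING = {
    "cost": 2_453_872,
    "stability": 1_393_424,
    "ASR": 2_135_396,
    "cost+stability": 57_579,
    "stability+ASR": 53_210,
    "all": 9_135,
}
percent = {name: 100.0 * count / TOTAL for name, count in PASSING.items()}


def screen(materials, criteria):
    """Keep materials below each (property, upper bound), applied in order."""
    kept = list(materials)
    for prop, bound in criteria:
        kept = [m for m in kept if m[prop] < bound]
    return kept


# Thresholds quoted in the text; using the LSCF ASR value here is an assumption.
COST = ("cost_per_kg", 133.67)
STABILITY = ("e_hull_mev", 93.3)
ACTIVITY = ("log_asr_500C", 1.33)

# Made-up candidates for illustration only.
candidates = [
    {"name": "candidate_A", "cost_per_kg": 12.0, "e_hull_mev": 40.0, "log_asr_500C": 0.3},
    {"name": "candidate_B", "cost_per_kg": 250.0, "e_hull_mev": 30.0, "log_asr_500C": 0.1},
    {"name": "candidate_C", "cost_per_kg": 8.0, "e_hull_mev": 140.0, "log_asr_500C": 0.9},
]

order_1 = screen(candidates, [COST, STABILITY, ACTIVITY])
order_2 = screen(candidates, [ACTIVITY, STABILITY, COST])
print({name: round(value, 2) for name, value in percent.items()})
print([m["name"] for m in order_1], [m["name"] for m in order_2])
```

Because each criterion is an independent upper bound, the final list is the intersection of the three passing sets and is the same for any ordering. Only the intermediate lists, and hence the intermediate distributions plotted in each panel, depend on which criterion is applied first.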

Figure 4. Violin plots showing distributions of screened materials where the first screening is (A) screened materials cost, (B) screened materials stability, and (C) screened ASR.The numbers above each distribution denote the number of materials passing the given screening combination.The high, middle, and low colored ticks denote the maximum, median and minimum of the distribution, respectively, while the black ticks denote the mean of the distribution.

Figure 5 shows predicted ASR values (solid lines) with calibrated error bars as a function of temperature together with experimental data points for the commercial material LSCF, benchmark high performing material BSCF, the best material from Zhai et al., SCCN, and the top selected materials from our screening: SZNCCu, KSmSCNT, and BiSYNC. In the low ASR regime, our ASR model has a tendency to predict higher than true ASR values (a slight conservative bias), where this bias is about 0.3 log units for LSCF, BSCF and SCCN averaged together. However, the data points are within the calibrated uncertainties of the ML model. Our screened materials SZNCCu, KSmSCNT, and BiSYNC are predicted to outperform BSCF and SCCN at 500 °C. In addition, they have lower activation barriers than BSCF, implying their performance will continue to outpace BSCF for temperatures below 500 °C, shown in Figure 5 via extrapolation to 400 °C.

Figure 5. ML-predicted ASR temperature dependence for key materials. The solid lines are ML predictions using our predicted log ASR value at 500 °C together with the ML model of predicted ASR barrier to scale the prediction to other temperatures. The error bars are the calibrated one standard deviation error bars from the ML model. Data points are experimental ASR values extracted from the database from Jacobs et al. 39 (LSCF, BSCF) and from Zhai et al. 34 (SCCN).
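As an illustration of how a log ASR value at 500 °C and an activation barrier can be combined to give predictions at other temperatures, the sketch below assumes a simple Arrhenius form, ASR ∝ exp(Ea / kB T). The functional form, the function name, and the 1.0 eV barrier are assumptions made for illustration and are not taken from the original work; only the reference value of -0.37 log units is quoted in the text.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def log_asr_at_temperature(log_asr_ref, barrier_ev, temp_c, ref_temp_c=500.0):
    """Scale a base-10 log ASR from a reference temperature to temp_c.

    Assumes Arrhenius behavior, ASR ~ exp(Ea / (kB * T)), so that
    log10 ASR(T) = log10 ASR(T_ref) + Ea / (kB * ln 10) * (1/T - 1/T_ref).
    """
    temp_k = temp_c + 273.15
    ref_temp_k = ref_temp_c + 273.15
    slope = barrier_ev / (K_B_EV * math.log(10))
    return log_asr_ref + slope * (1.0 / temp_k - 1.0 / ref_temp_k)


# Reference value quoted in the text for a top screened material at 500 C;
# the barrier below is a hypothetical placeholder, not a reported value.
log_asr_500 = -0.37
hypothetical_barrier_ev = 1.0

for temp_c in (400, 450, 500, 550, 600):
    value = log_asr_at_temperature(log_asr_500, hypothetical_barrier_ev, temp_c)
    print(f"{temp_c} C: log ASR = {value:.2f}")
```

Under this assumed form, a lower barrier gives a shallower rise in log ASR on cooling, which is the reasoning behind the extrapolation to 400 °C.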

Figure S6. Machine learning model random cross validation assessment comparing performance of the DFT-calculated O p-band center descriptor with a linear model and a random forest model using elemental features. These assessments are for T = 800 °C. The units of k* and kchem are cm/s, the units of D* and Dchem are cm²/s, and the units of ASR are Ohm-cm². The error bars are standard errors in the mean of the calculated MAE over 25 splits of 5-fold CV.

Figure S7. Temporal cross validation results for ASR model fit at 800 °C.
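For readers unfamiliar with temporal cross validation, the following minimal sketch shows one way such splits can be built: the model is trained only on entries reported up to a cutoff year and tested on entries reported afterwards. The record layout, cutoff years, and material labels are synthetic placeholders, and the splitting rule is an assumption for illustration rather than the exact procedure used for Figure S7.

```python
def temporal_splits(records, cutoff_years):
    """Yield (cutoff, train, test) tuples for temporal cross validation.

    Training data contains only records with year <= cutoff; test data
    contains only records with year > cutoff, so no future information
    leaks into training.
    """
    for cutoff in cutoff_years:
        train = [r for r in records if r["year"] <= cutoff]
        test = [r for r in records if r["year"] > cutoff]
        if train and test:  # skip cutoffs that leave either side empty
            yield cutoff, train, test


# Synthetic placeholder records (not real data).
records = [
    {"material": "material_A", "year": 2001},
    {"material": "material_B", "year": 2005},
    {"material": "material_C", "year": 2010},
    {"material": "material_D", "year": 2016},
    {"material": "material_E", "year": 2021},
]

splits = list(temporal_splits(records, cutoff_years=[2000, 2005, 2010, 2016, 2021]))

for cutoff, train, test in splits:
    print(cutoff, [r["material"] for r in train], "->", [r["material"] for r in test])
```

Each split mimics the question of whether a model fit at the cutoff year would have flagged materials that were only reported later.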

Figure S8 should be one and the intercept should be zero. In Figure S8, we can see that the uncalibrated uncertainty estimates (grey points and fit line) significantly underestimate the true error for small residuals, and significantly overestimate the error for large residuals, resulting in a low slope of 0.50 and high intercept of 0.28. The data after calibration (blue points and fit line) show significant improvement, with a slope of 0.98 and intercept of 0.01. This result shows the error bars are more accurate after recalibration, though they are not perfect, and may be improved as additional data becomes available or new uncertainty estimate approaches are used.

Figure S8. Summary of the random forest uncertainty estimates and their recalibration. This model is for predicting ASR at 500 °C and uses elemental features, one-hot electrolyte encoding and ML-predicted ASR barrier as features. The average (± standard deviation) recalibration parameters are a = 0.42824 ± 0.10199 and b = 0.36342 ± 0.05895.
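To make the role of the recalibration parameters concrete, the sketch below assumes the commonly used linear rescaling, sigma_calibrated = a * sigma_uncalibrated + b. This functional form is an assumption, since the caption reports only the values of a and b; the function names and the example uncertainties are illustrative.

```python
# Average recalibration parameters reported in the Figure S8 caption.
A = 0.42824
B = 0.36342


def recalibrate(sigmas, a=A, b=B):
    """Apply an assumed linear recalibration to uncalibrated error bars."""
    return [a * s + b for s in sigmas]


def crossover_sigma(a=A, b=B):
    """Uncalibrated sigma left unchanged by recalibration: a*s + b = s."""
    return b / (1.0 - a)


# Illustrative uncalibrated uncertainties (in log ASR units), not real data.
uncalibrated = [0.05, 0.2, 0.6, 1.0, 1.5]
calibrated = recalibrate(uncalibrated)

for s_raw, s_cal in zip(uncalibrated, calibrated):
    print(f"uncalibrated {s_raw:.2f} -> calibrated {s_cal:.2f}")

print(f"crossover sigma: {crossover_sigma():.3f}")
```

With a < 1 and b > 0, this assumed form widens small error bars and narrows large ones, matching the qualitative behavior described for the uncalibrated estimates.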
ABB'B''O3 materials and resulted in 78,780 compositions. The third set consisted of AA'A''BB'B''O3 materials and resulted in 15,356,796 compositions. The final set consisted of ABB'B''B'''O3 materials and resulted in 378,924 compositions. After removing duplicates obtained after combining the various subsets, the final list consists of 19,072,821 compositions.
feature generation, where the present approach using many elemental features can well represent complex physical relationships, but at some cost of interpretability. While we don't seek to have a deep understanding of the physical trends of each elemental feature selected by our model, some useful trends can still be extracted. For example, the higher values of the

producing a larger volume and more cubic perovskite lattices. While the trend of larger AtomicVolume_arithmetic_average corresponding to larger ASR may appear to be an exception to this, we observe this trend is consistent with the fact that moving from early to late transition metals (e.g., Ti to Co) results in smaller transition metal atomic volume, which is consistent with lower ASR values. Finally, the larger values of phi_arithmetic_average and phi_difference (note, here "phi" means work function) being indicative of lower ASR values makes sense again through the lens of O binding strength and our previous studies on correlating work function with O p-band center. Namely, materials with high work functions are easier to electrochemically reduce than those with low work function, where the ease of reduction implies weaker metal-oxygen bonding, hence resulting in lower ASR values. Consistent with the above discussion, it is difficult

BiSYNC) have very low predicted log ASR values at 500 °C of just -0.37, -0.33 and -0.25 Ohm-cm², respectively. Third, we can use our model to evaluate other recently proposed promising materials from the work of Zhai et al. and compare them with the materials proposed here. Our model predicts all four promising materials from the work of Zhai et al. to have log ASR values lower than BSCF at 500 °C, consistent with experimental validation by Zhai et al. finding all four of these materials are indeed more active than BSCF. These 4 materials are SCCN,

Table S1. Summary of random forest ML model 5-fold CV performance for predicting log ASR at 500 °C using different feature sets. The quoted values are averages from 5-fold CV ± standard deviation over 25 folds.
Section S2: Additional details of machine learning analysis