Discovery of stable surfaces with extreme work functions by high-throughput density functional theory and machine learning

The work function is the key surface property that determines how much energy is required for an electron to escape the surface of a material. This property is crucial for thermionic energy conversion, band alignment in heterostructures, and electron emission devices. Here, we present a high-throughput workflow using density functional theory (DFT) to calculate the work function and cleavage energy of 33,631 slabs (58,332 work functions) that we created from 3,716 bulk materials, including up to ternary compounds. The number of materials for which we calculated surface properties surpasses the previously largest database, the Materials Project, by a factor of $\sim$27. On the tail ends of the work function distribution we identify 34 and 56 surfaces with an ultra-low (<2 eV) and ultra-high (>7 eV) work function, respectively. Further, we discover that the $(100)$-Ba-O surface of BaMoO$_3$ and the $(001)$-F surface of Ag$_2$F have record-low (1.25 eV) and record-high (9.06 eV) steady-state work functions without requiring coatings, respectively. Based on this database we develop a physics-based approach to featurize surfaces and use supervised machine learning to predict the work function. We find that physical choice of features improves prediction performance far more than choice of model. Our random forest model achieves a mean absolute test error of 0.09 eV, which is more than 6 times better than the baseline and comparable to the accuracy of DFT. This surrogate model enables rapid predictions of the work function ($\sim 10^5$ faster than DFT) across a vast chemical space and facilitates the discovery of material surfaces with extreme work functions for energy conversion, electronic applications, and contacts in 2-dimensional devices.


Introduction
The work function is a fundamental surface parameter of a material that determines how much energy is required to extract an electron to a field-free region outside the surface; lower work functions facilitate electron emission.Work functions play a key role in technologies that require precise control of contact barriers such as printed and organic electronics.[1][2][3] Materials with low work functions are crucial for electron emission devices (THz sources [4,5] and fluorescent light bulbs [6]), electron sources for scientific instruments, [7,8] and highbrightness photocathodes.[9] Especially for thermionic energy converters (TECs), [10][11][12] discovery of thermally stable, ultra-low work function materials (less than 1.5 eV) would allow thermionic conversion of heat (> 1500 • C) directly to electricity with efficiencies exceeding 30% (typical thermionic and thermoelectric converters have efficiencies around 10%). Materials with high work functions play a key role in engineering the electron tunneling barrier in electronics (for example in dynamic RAM applications [13] and for contacts in modern 2D-based transistors [14]) as well as selective contacts in solar cells [15].The most commonly used materials for low work function applications that are chemically and thermally stable are compounds like lanthanum hexaboride [16,17] and thoriated tungsten [18][19][20][21] with a work function around 2.5 * p.schindler@northeastern.edu eV.For thermionic converters, extremely low work functions are required, which are achieved by sub-monolayer coatings of alkali metals (most commonly cesium) on metal surfaces.The resulting work functions are much lower than the work function of either metal or coating individually.This effect is due to the partial transfer of electron charge from the adsorbate to the substrate and the resulting formation of surface dipoles that lower the vacuum energy level near the surface.[22] Perovskite oxides containing transition metals have also shown promise for low work function applications down to 1.6 eV.[23] Alkaline vanadates (e.g., SrVO 3 ) have also shown promise in thermionic emission devices.[24] Coatings using cesium or barium combined with oxygen are well known to achieve ∼ 1 eV work functions in photocathode applications, for instance on III-V semiconductors or silver.[25][26][27] The anti-fluorite X 2 O family (where X is an alkali metal) are known to exhibit work functions down to 1.2 eV.[28][29][30] Diamondoids [31] and phosphorous-doped diamond thin films have shown similarly low work functions.[32] More recently, a work function of 1.01 eV has been achieved by electrostatically gating cesium/oxygen covered graphene, [33] which resulted in enhanced TEC efficiency.[34] The lowest experimentally measured work function of 0.7 eV has been obtained by inducing a surface photovoltage on Cs/O 2 coated Gallium Arsenide.[35] The lowest theoretically predicted steadystate value to date is 0.7-0.8eV for potassium adsorbed on monolayers of MoTe 2 or WTe 2 .[36] Large work function materials are required for application in hole extraction/injection layers in organic solar cells and light emitting diodes.Thermally evaporated transition metal oxides are commonly used for this purpose, specifically MoO 3 , WO 3 , V 2 O 5 , NiO, and CrO 3 , with work functions ranging between 5 and 7 eV.[37][38][39] In many cases, organic coatings are used to increase the work function such as poly-styrene sulfonate, [40][41][42], conducting polyaniline polymers, [43] fluorinated self assembled monolayers [44], and benzoic acid trifluoromethyl [45].The largest increase in work function was reported for silver incorporating L-cysteine and Zn(OH) 2 with a boost from 4.3 to 7.4 eV.[46] Transparent conductive oxides such as indium tinoxide and similar derivatives have work functions reported up to 6.1 eV.[47][48][49][50][51] In metal-insulator-metal devices high work functions are crucial to increase the Schottky barrier-height and hence reduce leakage currents.Ruthenium oxide with a work function of 6.1 eV has been established in these devices.[52] Recently, high work function materials such as MoO x (x < 3) and RuCl 3 have garnered attention for establishing low resistance contacts in 2D devices, with reported work functions of 6.6 and 6.1 eV, respectively.[53][54][55] In recent years, data-driven approaches based on highthroughput ab-initio calculations have emerged as a new paradigm to facilitate the search through vast chemical spaces for new materials with tuned properties or novel behavior.Due to the continued increase in computing power and improvements of theoretical methods, the accuracy of predicted material properties has reached a reliability comparable to experiments while in some cases greatly surpassing them in terms of speed and cost.The rapid increase in available computational data structured in open source material databases such as Materials Project (MP), [56] AFLOW, [57] and NOMAD [58] has opened an avenue towards material discovery using data-mining and statistically driven machine learning approaches.However, most big material databases largely lack to report surface properties like the work function as each bulk material typically has dozens of distinct low-index crystalline surfaces and terminations.Each unique surface must be generated and calculated separately increasing the complexity and computational effort required.In the MP database this has only been done for ∼ 100 polymorphs of elemental crystals.[59] The largest computational surface catalysis database (>1 million surface adsorption calculations), the OC20, does not report on the work function or surface energy.[60] Based on several thousand newly predicted 2D materials [61] there are two other databases [62][63][64] that report ab-initio work functions.It is straight-forward to calculate the work function for 2D materials as they typically have only one relevant surface.The JARVIS-DFT and the C2DB databases have work functions calculated for about 700 and 4000 2D surfaces, respectively.The work function distributions for these 2D material databases are plotted in Figure S1 and the distribution metrics and a list of the 2D materials with the lowest and highest work functions is compiled in Table S1.Some statistical analyses have been carried out in literature showing that the electronegativity is linearly correlated with the work function both for elemental crystals and binary compounds.[59,65] Additionally, for elemental crystals an inverse correlation with the atomic radius is pointed out.The work function of elemental crystals ranges between 2 and 6 eV (for Cesium and Selenium, respectively).The statistical analyses of about 30 binary compounds shows that a correlation between the electronegativity of the atom with the lower electronegativity is the strongest (better than arithmetic or geometric mean of the individual electronegativities). Density functional theory has been a well-established approach (using a slab configuration) to calculate the work function, similar to the more simplistic Jellium model.[66] Also a phenemonolgical model has been developed that is able to estimate the work function fairly accurately for metals and alkaline-metal coated surfaces.[67] This phenomenological equation is a function of the atomic radius and the number of atomic sites per unit cell area.However, it relies on a single parameter (loosely related to the number of electrons that an atom can donate to the surface) that is not clearly defined for more complex surfaces and takes on nonphysical values in the case of alkaline coatings.In recent work, Hashimoto et al. [68] attempted to screen for low and high work function materials using a Bayesian optimization approach.However, they assume the work function to be approximated solely as a bulk property neglecting any surface contributions during screening.For the highest and lowest "bulk work function" material candidates the actual surface contributions have then been included which rendered most of their top candidate materials to exhibit average work functions between 3 and 6 eV.Unsurprisingly, among their top candidate materials, they have found that the (110) surface of elemental Cesium has a low work function of 2.0 eV and that the (111) surface of KEuO 2 has a relatively high work function of 8.4 eV.The approximated bulk work function of some of the screened work function candidates differs as much as 7 eV from the actual work function when including the surface contributions.This clearly shows that, while for simple structures (such as elemental metals) the work function can theoretically be predicted from bulk properties alone, [69] it is important to consider surface contributions to quantitatively predict the work function of a material.The surface termination, atom adsorption (most commonly oxygen and hydrogen), contamination, and reconstructions can affect the surface dipole and hence the effective work function.While a crystal graph convolutional neural network has been used successfully to predict the cleavage energies of intermetallic slabs, [70] there has been no reports on featurizing slabs to predict the work function (except for the MXene 2D-material class [71]).In this paper, we use high-throughput density functional theory (DFT) to calculate 58,332 surface work functions and 33,631 cleavage energies based on 3,716 bulk crys-tal structures (up to ternary compounds with a band gap less than 0.1 eV).The created database gives insight into work function trends observed across a large chemical space.Based on the database we develop a machine learning model with a low mean absolute testerror of 0.09 eV which is more than 6 times lower than the baseline (i.e., predicting every surface to have the database-average work function) and about 4 to 5 times better than state-of-the-art benchmarking machine learning models (automatminer and Coulomb matrix).The database and machine learning model enable us to identify several promising, stable material surfaces with extremely low (< 2 eV) and extremely high (> 7 eV) work functions.Further, the work function model established in this work enabled the discovery of new ultra-bright photocathode materials reported in [72,73].

Methods
Materials Selection and High-throughput Workflow.The workflow of the work function database's creation is illustrated in Figure 1.A total of 3,716 crystal structures were queried from the Materials Project (on 2/2/2023) using the REST framework.[56,74] Up to ternary materials with 10 or less atoms in the conventional unit cell were considered that have an energy above hull of less than 20 meV/atom, are metallic (E gap < 0.1 eV), and are tagged as an experimental structure (i.e., there exists at least one ICSD entry that reports the corresponding material as experimentally synthesized).Materials with an element present that is radioactive, a noble gas or from the actinide group were excluded.Further, materials that have experimental tags that indicate high pressure or low temperature conditions, as well as low-dimensional materials were excluded as well.The frequency with which each chemical species appears in the database (bulk compounds) is plotted as a heat-map on the periodic table in Figure S3.From this set of materials, surfaces up to a Miller index of 1 were generated using Pymatgen's surface module.[59,75,76] Each surface Miller index generally has more than one unique surface termination.The Pymatgen surface module has a built-in option to generate slabs with different terminations determined by possible shifts in the c-direction.We have developed an alternate algorithm to ensure that we generate all possible unique terminations based on the local environment of surface atoms, as summarized in the dashed block in Figure 1 and described in more detail in the Supplementary Material.According to n term we generate slabs that contain the appropriate number of atomic layers to expose each unique termination.Further, we minimize the number of slabs required for the DFT calculations by having two distinct terminations on either side of the slab, whenever possible.The initial slab thickness is minimized while still ensuring that after all necessary subtractions the final slab is at least 10 Å thick.Following this procedure, we create 37,163 slabs and 17,998 bulk structures of which 36,962 and 17,993 converged respectively during self-consistent field calculations with a total computational time of around 440,000 core-hours.The converged calculations after duplicate removal (described in the machine learning model training section) returned a total of 58,332 work functions and 33,631 cleavage energies of which computational details are described next.
First Principles Calculations.The total energies of the slab and bulk as well as the work functions are calculated by gradient-corrected DFT using the PBE exchange-correlation functional.[77] Self-consistent, periodic, total energy calculations are performed using Quantum Espresso (v.7.1).[78] Ultrasoft Vanderbilt pseudopotentials [79] are used to describe core electron interactions (f -electrons not included for lanthanides [80]) and the Kohn-Sham one-electron valence states are expanded in a plane wave basis set with a kinetic energy cutoff of 550 eV.The electron density is expanded up to ten times the energy of the plane wave energy cutoff.An extra 30 unoccupied bands are added for holding valence electrons to improve convergence.All slabs generated by the high-throughput procedure described above have a minimum thickness of 10 Å and 15 Å of vacuum between periodic slab repetitions in the c-direction to preclude interactions between periodic images.Brillouin zone sampling is performed under a grid spacing of less than 0.05 Å −1 with finite-temperature Gaussian smearing (σ = 0.1 eV).A dipole correction is applied in the c-direction.The work function, ϕ, is determined by the difference of the electrostatic energy in the vacuum region, E vac , and the Fermi energy, E F : ϕ = E vac − E F .The PBE exchange-correlation functional has previously been shown to give reliable work functions for elemental crystals in agreement with experimental values with a slight systematic underestimation.The errors with respect to experiment have been reported to be below 0.3 eV for elemental crystals [81] and ∼ 0.15 eV for elemental metals.[82] It is also worth noting that work functions measured experimentally can significantly vary based on the technique and specimen (single-vs.poly-crystalline) used.[83] The DFT calculation inputs for Quantum Espresso are automatically generated with the atomic simulation environment (ASE) [84] Python package and submitted into a high performance computing queuing system (SLURM) using job arrays.
To estimate the convergence accuracy of the DFTcalculated work functions we reran ∼ 1% of the database (randomly selected) with stricter convergence parameters (energy cutoff of 850 eV and Brillouin zone sampling with a grid spacing of ≤ 0.02 Å −1 ).The resulting mean absolute error and root-mean square error of the work function are 22 and 30 meV, respectively.The cleavage energies have been calculated via total energy calculations of the slab, E slab , and the respective bulk, E bulk , with the same unit cell orientation as the slab.This reorientation ensures more efficient convergence with slab thickness due to k-point matching of the two calculations.[85] The cleavage energy is defined as where n bulk is the number of bulk unit cells contained in the slab and A slab is the surface area of the slab.The surface energies are denoted as γ top and γ bottom for the top and bottom surface of the slab, respectively.Using this definition, the cleavage energy is equal to the surface energy, E cleavage = γ, for the case of symmetric slabs (i.e., there exists a mirror or glide plane parallel to the surface, or a 2-fold rotation axis normal to the surface).For non-symmetric slabs, the upper bound for both surface energies is 2 • E cleavage > γ top,bottom .This workflow has also been used to calculate adhesive properties in electrolyte interfaces for solid-state battery applications.[86] For the slabs that exhibit extreme work functions, ionic relaxation calculations of the topmost surface atoms (atoms within 3Å of the surface were relaxed, the remaining atoms fixed) of the 1 × 1 slabs were converged to a force less than 0.05 eV/Å.Only the surface of interest was ionically relaxed while the opposite side remained fixed.
Machine (and a difference in work function between the two surfaces of < 0.1 eV) were removed as duplicates from the dataset before training.For benchmarking purposes we used the automatminer testing suite [87] (200 features) and a conventional Coulomb matrix (topmost 5 surface atoms as input, matrix sorted by ℓ 2 -norm of columnsthe flattened matrix yields 25 features that are used in a random forest model).[88] For automatminer we use the "express" setting and for comparison we used the bulk unit cell and the topmost 5 atomic layers of the surface slabs as inputs.As a baseline model we predict the work function to be the average work function regardless of the surface.

Results and Discussion
Analysis of Surface Database.First, we analyze the surface property database created by high-throughput DFT (33,631 slabs based on 3,716 bulk crystals) in terms of its distribution and trends in the studied chemical space.A contour plot of the database distribution of DFTcalculated work functions vs. surface energies is shown in Figure 2a for ionically unrelaxed slabs.The calculated cleavage energy is exactly equal to the surface energy in  S10).This might also be the case for the low work function tail but appears to be less pronounced.This artifact can be mitigated by ionically relaxing the surface slabs (see next section) and we expect this to result in an overall slightly narrower distribution.Interestingly, the work function distributions of binary and ternary compounds (and to a certain extent also the elemental crystals) have similar averages, standard deviations, and ranges.This may be explained by the observation that the work function is primarily determined by the chemical species present in the topmost layer at the surface (as discussed in the next paragraph), and will largely not depend on the total number of chemical species present in the entire unit cell.Moreover, the average work function of the database is lower than the average work function for the JARVIS database and C2DB (4.91 and 5.43 eV, respectively, cf.Table S1) while the standard deviations are somewhat similar (1.22 and 1.08 eV, respectively).The average cleavage energy of all asymmetric slabs (103.4 meV/Å 2 ) is higher than the average for all symmetric slabs (88.0 meV/Å 2 ).This is expected be-cause this database is calculated for unrelaxed slabs and cleaving asymmetric slabs may lead to dangling atoms in nonphysical positions too far/close to the other surface atoms.
Trends in the work function based on which chemical species are present at the surface are shown in Figure 3.The fraction of surfaces with a low work function (< 2.5 eV, i.e., roughly 1.5 times the standard deviation below average) is especially high for surfaces with alkali or alkaline metals present in the topmost atomic layer.Conversely, the fraction of surfaces with a high work function (> 5.5 eV, i.e., roughly 1.5 times the standard deviation above average) is especially high for surfaces with halogens, carbon, nitrogen, sulfur, selenium or oxygen present in the topmost atomic layer (cf. Figure 3a).Surfaces with hydrogen present exhibit more of both low and high work functions than average, likely due to complex chemistries.The total number of surfaces (rather than fractions) are shown in Figure S4.(alkaline) monolayer-coating.Further, we identify the (100)-Cs-Cl surface of CsScCl 3 to exhibit an ultra-low work function of 1.42 eV in addition to an extremely low surface energy of 5.5 meV/Å  S2 and S3, respectively.From the curated lists of discovered surfaces, further analyses can be carried out in future research to assess viability for device applications, such as assessing stability in air/moisture and surface reconstructions.
Machine Learning Model for Work Function Predictions.The large database created by high-throughput DFT calculations forms the basis for a surrogate machine learning model that enables the prediction of the work function at a fraction of the computational cost.As a first step, we assess common models from the materials science machine learning community as a benchmark.For that, we employ the automatminer testing suite, [87] and a conventional Coulomb matrix (trained with a random forest model).[88] For automatminer we use the "express" setting and compare using the bulk unit cell and the topmost 5 atomic layers of the surface slabs as inputs.
As a baseline model we predict the work function to be the average work function regardless of the surface.The automatminer model performs only marginally better than the baseline model when bulk structures are used as an input.When the surface slabs are used as inputs the performance increases and is comparable to the performance of the Coulomb matrix.The mean absolute errors (MAEs) are shown for the training and test sets in Figure 4a (cf. Figure S9 for RMSEs).The baseline MAE is 0.60 eV and the DFT accuracy is indicated in the green-shaded area between 0.022 and 0.1 eV, corresponding to the convergence error (see Methods) and the error between PBE-calculated and experimental work functions, [81] respectively.It is not surprising that the model performance is poor when the bulk structure is used as an input as the database contains multiple surfaces of different work functions for any given bulk structure.While the performance of the benchmarking models improves when the surface slab is used as the input instead, the MAEs are still large and significant overfitting is observed.This is likely due to the fact, that the models cannot distinguish between top and bottom of the input slab (which in general are not symmetric) and the database consists of all unique terminations.In general, if one termination (located at the top surface) is labeled with the calculated work function, the same termination exists in another input structure at the bottom surface (whereas the calculated work function always refers to the top surface).Hence, the shortcomings of the automated benchmarking models does not come from the machine learning models used but rather the implementation of the featurization of surface slabs.We developed a custom featurization of the surface slabs by considering physically motivated features of the topmost three surface layers.Atoms are regarded to belong to the same layer if their c-coordinate lies within a tolerance of 0.4 Å (see effect of tolerance value on model performance in Figure S7).The considered atomic features are electronegativity χ, inverse atomic radius 1/r, first ionization energy E ion , and Mendeleev number n mend .Given that each layer may contain more than one atom-type, we consider the minimum, maximum, and average of each of these atomic features.This gives a total of 36 elemental features for the topmost 3 layers.Additionally, we add structural features: The packing fraction for each layer (number of atoms per unit cell area) A −1 pack and the distances between atomic layers 1 and 2, d 1−2 , and between layers 1 and 3, d 1−3 .Angle-based features are calculated by considering the angles between the surface normal vector and the vectors spanned between atoms in the topmost layer and their respective nearest neighbors.After all angles are calculated we consider the minimum and maximum values as two features, θ min and θ max .Out of this total 43 features the most significant features are selected with (backward) recursive feature elimination (RFE) using a random forest model, as plotted in Figure 4b.The top 8 features largely account for the model performance: and min (1/r 1 ).For the final model we use the best 15 features, which are the 8 features mentioned above and θ min , d 1−2 ⟨1/r 1 ⟩, min (E ion,2 ), ⟨E ion,3 ⟩, max (χ 2 ), and min (n mend,2 ).It is worth noting that the majority of features we selected were physics-motivated or based on correlations observed in literature.The work function has been shown to linearly correlate with electronegativity for elemental crystals and binary compounds, [59,65], and inversely correlate with the atomic radius.Another work has proposed a phenomenological equation for the work function that depends on the atomic radius and the number of atomic sites per unit cell area at the surface.[67] We chose the ionization energy as a feature based on physical intuition that low ionization energies lead to easy electron extraction.Lastly, the Mendeleev number has been shown to be a descriptive feature for many material property predictions.[89,90] Interestingly, the most predictive features (top 8) are features from the topmost layer (including the layer distance d 1−3 ) with the exception of ⟨E ion,2 ⟩.This is in agreement with the the fact that clear trends are observed considering only the topmost surface (cf. Figure 3a).The final 15 features contain only two features that relate to the third atomic layer: ⟨E ion,3 ⟩ and d 1−3 .Also, we note that we tried adding further elemental features without a clear physics-based motivation (e.g., polarizability) as well as further modes (e.g., geometric mean, range, variance) -however, this did not improve the model performance.Using this featurization approach (with 15 features) outperforms all benchmarking models (automatminer, in comparison, uses 200 features) even when a linear re-gression model is chosen, as seen in Figure 4a.This performance improvement is due to the superior implementation of how we featurize our slabs rather than the machine learning model itself.When a non-linear learning model is used (neural network or random forest model) the MAEs are significantly reduced.Our best model using random forests has a test-MAE of 0.09 eV, comparable to the accuracy of work function calculations employing DFT.This test performance is about 4-5 times better than the best benchmarking model and more than 6 times better than the baseline.Figure 5

Conclusions
In summary, we demonstrate a workflow to create a work function and cleavage energy database from highthroughput DFT calculations that enables us to gain insight into chemical trends of the work function and surface stability.The database reveals 34 ultra-low and 56 ultra-high work function surfaces many of which have low surface energies.The low surface energy candidates are expected to be the stable surface termination and orientation of the corresponding stable bulk material and are hence promising candidates for experimental verification.To our knowledge, we identify the lowest and highest computationally predicted work functions in literature (1.25 and 9.06 eV, respectively) for steady-state metallic surfaces without requiring any monolayer coatings.Further, we establish a surrogate machine learning model for rapid work function predictions.Our model has a MAE of 0.09 eV, comparable to the accuracy of DFT while being ∼ 10 5 times faster.Using this approach facilitates the probing of a vast chemical space for novel material surfaces with exceptionally low or high work functions.

1 )Figure 1 |
Figure 1 | Workflow of the creation of the surface property database and surrogate machine learning model.The illustration includes the steps for material selection, high-throughput DFT calculations, surface slab creation, and supervised machine learning predictions.The dashed block details the procedure of determining the unique terminations of all surfaces up to a Miller index of 1.

Figure 3 |
Figure 3 | Work function trends observed in the database.a Fraction of material surfaces that have a work function below 2.5 eV (purple) and above 5.5 eV (orange) is shown depending on which type of chemical species is present at the topmost surface.The dashed lines indicate the average fraction across the entire database regardless of chemical species at the surface.b Heat-map of the average work function plotted as a function of chemical species present at the topmost layer (vertical axis) and second atomic layer (horizontal axis).The color bar displays work functions below and above average as blue and red, respectively.Categories with a population of less than 10 surfaces have been left blank.

Figure 4 |
Figure 4 | Comparison of machine learning model performances.a Mean absolute errors (MAEs) of training and test sets are given for this paper's machine learning models: Linear regression, neural network, and random forest implementing 15 physically motivated features.The benchmarking models (automatminer with bulk unit cells and with surface slab of topmost 5 atomic layers as inputs, and a Coulomb matrix of the topmost 5 surface atoms, ℓ2-sorted, with a random forest model) are shown in comparison.The baseline model (always guessing the average work function) and the DFT accuracy are indicated by a dashed line and green-shaded area, respectively.The upper limit of the DFT accuracy (0.1 eV) is estimated from the accuracy of the work function calculations of elemental crystals.[81]b 5-fold cross-validated root mean square error (RMSE) as a function of number of input features is shown.Features were selected by recursive feature elimination for linear regression and random forest model.The top 15 most predictive features were selected for the final models.
shows the predicted work function for both the training and test sets in comparison to the DFT-calculated values.The kernel-density estimate distributions for both training and test sets are plotted for predicted and actual work functions showing that the predicted distribution is qualitatively faithful to the actual one.Notably, for the neural network and random forest models there is still a gap between training and test MAEs despite thorough hyperparameter tuning.The learning curves in Figure S8 indicate underfitting for the linear model and slight overfitting for the random forest model.The learning curve trend for the random forest model suggests that increasing the dataset size (by a factor of ∼ 10) could further improve the model performance and close the gap between training and cross-validated errors.The prediction of the work function using this model is roughly 10 5 times faster than DFT while having a MAE comparable to the accuracy of DFT.The database and model are available open-source (see data availability section) enabling other researchers to use this model for work function predictions and help experimentalists in their

Figure 5 -
Figure 5 -Predicted work functions vs. DFT-calculated work functions.The kernel-density estimate distributions for both training and test sets are plotted for predicted work functions and the ground-truth at the top and right, respectively.

Figure S2 :
Figure S2: Method section illustrations.Illustration of the determination of the effective number of unique terminations based on symmetry of the slab.The two slabs both have two unique local environment groups (A and B) and correspond to 2 and 4 unique terminations for a symmetric slab in c-direction and a non-symmetric slab, respectively.

Figure S4 :
Figure S4: Total number of surfaces with low and high work functions plotted as a function of chemical species present at the topmost layer.Analogous to Figure 3a of the main text.

Figure S5 :Figure S6 :Figure S7 :Figure S8 :
Figure S5: Work function averages plotted as a function of chemical species present at the topmost layer.Error bars indicate the standard deviation.

Table II -
Kernel density estimate plot illustrating the distribution of the work function vs. the upper bound of the surface energy.Both calculated by DFT for unrelaxed slabs.b Distribution of the work functions plotted as a stacked histogram.Outline corresponds to overall distribution under which the stacked, colored bars signify the number of surfaces based on elemental, binary, and ternary compounds.The average of the distribution is 3.92 eV with a standard deviation of 0.86 eV.Detailed cleavage energy distribution metrics for elemental, binary, and ternary compounds for unrelaxed symmetric and asymmetric slabs.All values in meV/Å 2 .
Sym. Number Average St.Dev.Min.Max.thecase of symmetric slabs that exhibit either a mirror or glide plane parallel to the surface, or a 2-fold rotation axis normal to the surface.For slabs lacking this symmetry, 2•E cleavage is used as an upper bound to the surface energy.tion of cleavage energies for elemental, binary, or ternary compounds are given in TableIIfor both symmetric and asymmetric slabs, and plotted as a stacked bar histogram in FigureS11.The average of the cleavage energy for all symmetric/asymmetric slabs is 88.0/103.4meV/Å 2 with a standard deviation of 46.0/45.8meV/Å 2 , ranging from a minimum to a maximum cleavage energy of 1.0/0.3 to 396.3/397.3meV/Å 2 , respectively.The observation that the distribution in work functions is near-Gaussian could indicate that the chemical space we chose was diverse enough to evenly sample work functions across possible values.The extended tail at the high work function end appears to be an artifact coming from ionically unrelaxed surfaces where a small, electronegative atom (e.g., oxygen or hydrogen) is cleaved at a large, unphysical distance (as discussed in the next section and corroborated by Figure Overall, 48.8% (29.5 %) of surfaces that exhibit a work function below 2.0 (2.5) eV contain either alkali or alkaline metals in the topmost atomic layer.Conversely, 54.1 % (38.0 %) of surfaces that exhibit a work function above 6.0 (5.5) eV contain either carbon, oxygen, or halogens in the topmost atomic layer.The average work function is plotted as a heat-map based on the chemical species present in the topmost two atomic layers.The trends observed in Figure3aare also seen in the average work function trend in Figure3b.However, one can also observe trends based on combinations of chemical species in the topmost and second atomic layers.For example, the work function average is high for surfaces where halogens are present in the first or second layer.In contrast, the work function average is low for surfaces with alkali or alkaline metals present in the first layer and sulfur or selenium present in second layer -however, the work function average is high for the reversed layers (i.e., sulfur/selenium in topmost layer).Surfaces with Extreme Work Functions.Material surfaces found on the tail ends of the work function distribution in Figure2are of interest for energy conversion and electronic device applications.However, the calculated ab-initio work functions are based on surfaces without considering ionic relaxations.Hence, to screen for viable ultra-low and ultra-high work functions we carry out ionic relaxation calculations for unrelaxed slabs that have a cleavage energy less than 200 meV/Å 2 and either surface with a work function that is lower than 2 eV or higher than 7 eV.From 340 slabs on the extreme ends of the distribution, 284 slabs converged during ionic relaxation to a force less than 0.05 eV/Å.Atoms within the top 3Å of the topmost atom were allowed to relax while the coordinates of the remaining atoms in the slab were fixed.The cleavage energy is generally reduced through ionic relaxation (cf.Fig.S12).As expected, the work functions for materials on the low-end shift towards larger values (mean signed deviation = +0.20 eV).The work functions for materials on the high-end shift towards lower values (mean signed deviation = −0.62 eV).Hence, this effect is much more pronounced for the high work function tailend, as can be observed in FigureS10.
tuition.This shows that for complex chemistries (e.g., hydrogen-containing compounds) basic chemical intuition is insufficient to predict the work function.3 at 1.25 eV with a cleavage energy of 67.2 meV/Å 2 .To our knowledge, this is the lowest steady-state work function computationally predicted in literature for a metallic surface without requiring any

Table III |
Surfaces with ultra-low and ultra-high work functions sorted by spacegroup (sg).Surfaces stemming from structurally equivalent materials (but occupied with different chemical species) are grouped together into families described by chemical formulas containing placeholder letters A and B. For each family the compound that resulted in the lowest/highest work function is listed alongside its work function value and cleavage energy.The Miller indices and the atom termination of the surfaces yielding extreme work functions are specified.Work functions are given in eV and cleavage energies in meV/Å 2 .The cleavage energy is equal to the surface energy for slabs which state y(es) in the right-most column (i.e., the bottom and top surfaces of these slabs are equivalent while the slab maintains stoichiometry), otherwise 2 • E cleavage is an upper bound for the surface energy.

Table S1 :
Detailed work function distribution metrics for 2D materials in JARVIS-DFT database and C2DB.All values in eV.Unique material IDs are listed for materials with work functions below 2 and above 8 eV.

Table S2 :
Surfaces with ultra-low work functions from previous version of the database.Similar materials are grouped together and the composition with the lowest work function is listed for each group.Work functions and bandgaps (PBE, from Materials Project) are in eV.

Table S3 :
Surfaces with ultra-high work functions from previous version of the database.Similar materials are grouped together and the composition with the lowest work function is listed for each group.Work functions and bandgaps (PBE, from Materials Project) are in eV.