Interpretation and Automatic Generation of Fermi‐Orbital Descriptors

We present an interpretation of Fermi‐orbital descriptors (FODs) and argue that these descriptors carry chemical bonding information. We show that a bond order derived from these FODs agrees well with reference values, and highlight that optimized FOD positions used within the Fermi‐Löwdin orbital self‐interaction correction (FLO‐SIC) method correspond to expectations from Linnett's double‐quartet theory, which is an extension of Lewis theory. This observation is independent of the underlying exchange‐correlation functional, which is shown using the local spin density approximation, the Perdew–Burke–Ernzerhof generalized gradient approximation (GGA), and the strongly constrained and appropriately normed meta‐GGA. To make FOD positions generally accessible, we propose and discuss four independent methods for the generation of Fermi‐orbital descriptors, their implementation as well as their advantages and drawbacks. In particular, we introduce a re‐implementation of the electron force field, an approach based on the centers of mass of orbital densities, a Monte Carlo‐based algorithm, and a method based on Lewis‐like bonding information. All results are summarized with respect to future developments of FLO‐SIC and related methods. © 2019 The Authors. Journal of Computational Chemistry published by Wiley Periodicals, Inc.


Introduction
Density functional theory (DFT) [1,2] is one of the most commonly used electronic structure methods. Its popularity comes from its efficiency to calculate properties with reasonable accuracy for a range of systems including atoms, molecules, [3][4][5][6] clusters, [7] and condensed matter. [8][9][10][11][12][13][14][15] While DFT is in principle exact, the exchange-correlation functional E xc needs to be approximated to solve the Kohn-Sham (KS) equations in practical calculations. These density functional approximations (DFAs) often work well, but are known to produce incorrect results for rather simple cases like the dissociation of the H 2 molecule [16] or band gaps in semiconductors that are far from experimental values. [17] The exchange hole [18] (also referred to as the Fermi hole) and the correlation hole (also referred to as the Coulomb hole) are important properties in electronic structure theories, which are often used to discuss exchange-correlation effects. [19][20][21][22][23][24] The exchange hole [25] determines the decrease in the probability of finding an electron at point r, given an electron of the same spin at r 0 . The exchange hole is negative definite everywhere and integrates to −1. In contrast, the correlation hole integrates to 0. The correlation hole corresponds to the decrease in the probability of finding an electron at point r, given an electron at r 0 , due to their charge repulsion. These are constraints an exact DFA should obey.
These holes have been extensively discussed within wave function theories, for example, Hartree-Fock (HF [26,27] ) theory. [25,[28][29][30] There, a separation of the exchange and correlation parts of the energy is possible so that the Fermi and Coulomb hole can be unambiguously defined. Please note that exchange-correlation functionals treat exchange and correlation effects together. Thus, one obtains a density including both exchange and correlation effects and an exchange-correlation hole where the separation into exchange and correlation holes is not straightforward.
To a large extent, the mentioned failures of DFAs can be linked to the so-called self-interaction error (SIE), describing the interaction of an electron with itself. This error occurs due to an incomplete cancellation of the Coulomb self-repulsion by all local or semi-local E xc like the local (spin) density approximation (LSDA), generalized gradient approximations (GGAs) or meta-GGAs. To remove this error, Perdew and Zunger (PZ) [31] proposed a self-interaction correction (SIC) scheme, Here, E xc n σ i ,0 Â Ã is the underlying exchange-correlation functional evaluated for the one-electron density n σ i and E H n σ i Â Ã is the corresponding Coulomb term. Considering the one-electron density limit, SIC calculations reproduce HF results, which are exact for this limit. [31,32] For many-electron systems, the application of the PZ scheme in any self-consistent form offered significant improvements with respect to the underlying density functional, for example, for ionization potentials [33][34][35][36] and (hyper)-polarizabilities. [36][37][38] It has been shown that localized orbitals deliver lower selfinteraction corrected total energies and, accordingly, play an essential role in a proper self-interaction correction in DFT. [31,39,40] As discussed several decades ago, the exchange hole for single-determinant cases can be expressed by means of socalled Fermi orbitals. [29] Luken [30] showed that there are regions in space where the Fermi hole resembles localized orbitals. In these regions, the Fermi orbital F σ j x;a σ j only weakly depends on the position of the reference electron a σ j . The corresponding Fermi hole can be approximated by the square of the localized orbital. In a different context, this was discussed by Baerends et al., [19,41] who used Fermi orbitals to approximate the exchange-correlation hole.
In the notation of Pederson et al., [42][43][44][45] reference electron positions are called Fermi-orbital descriptors (FODs) a σ j , leading to the following definition of an entire set of Fermi orbitals Here, ψ σ i represents a KS orbital and T σ ji denotes the transformation matrix. The number of FODs is equal to the number of electrons in the system. In FLO-SIC, the normalized but nonorthogonal Fermi orbitals are orthogonalized by Löwdin's method. [48] The optimal set of FODs is obtained by moving the FOD positions in order to minimize the SIC energy [42,43] in a variational way. In FLO-SIC, properties such as SIC total energies, eigenvalues, and the density are derived from the optimal Fermi-Löwdin orbitals.
In contrast to KS orbitals, Fermi orbitals are well-localized, comparable to Foster-Boys, [49] Edmiston-Ruedenberg, [50] or Pipek-Mezey [51] localized orbitals. Like other localized orbitals, it is easy to interpret Fermi-Löwdin orbitals as core, lone pair, or bonding orbitals. [29] As the Fermi orbitals are determined by the FODs for a given density, we argue that the optimized FODs carry bonding information.
First, we introduce a bond order based on the FODs. This concept is applied to the G2-1 benchmark set and the calculated bond orders are compared to reference values. Second, to make FOD positions generally accessible, we describe computational approaches to find reliable initial values for the FODs that can then be optimized. The performance of the generated descriptors is evaluated within FLO-SIC by comparing the starting values to fully optimized FODs. We optimized FODs using LSDA, [52] the Perdew-Burke-Ernzerhof (PBE) [53] GGA, and the strongly constrained and appropriately normed (SCAN) [54] meta-GGA. All optimizations with these different functionals deliver FOD bond orders which are in agreement with the well-known Lewis theory [55] or, more generally, with Linnett [56][57][58] theory.
While most of the employed basic ideas are known from mathematics or physics and used in other fields, we apply and extend these ideas to generate FODs. Four different approaches are discussed in detail-an approach using an electron force field, a quantum mechanical approach based on the centroids of localized one-body densities, a Monte Carlo-based algorithm, and an approach using information from Lewis [55] structural formulae. The advantages and drawbacks of each method will be discussed with respect to automation, speed, scope, extensibility, and insensitivity. These criteria will be introduced in detail later.
A short introduction to the concept of bond orders and to the nearly forgotten Linnett double-quartet (LDQ) theory [59] is given in the next section.

Relation Between FODs and Chemical Bonding Information
The adiabatic connection formula [60] relates the exchangecorrelation energy to a coupling-constant integrated exchangecorrelation hole. The exchange-correlation hole visualizes electron correlation in an inhomogeneous material by measuring the change in density from its mean value at each point in a system with respect to a particular reference point. The exchangecorrelation energy per particle can be interpreted as the interaction energy of the particle with its exchange-correlation hole. The exchange hole refers to a noninteracting ground state and contains only correlation due to the Pauli principle. It is often the dominant part of the total exchange-correlation hole. [61] In the case of covalent bonds, the exchange hole is not very dependent on the reference position and can be approximated by localized orbitals near the bond center. [61] This connection between chemical bonding and the exchange-correlation hole has also been discussed by Giesbertz et al. [62] The FODs are used to construct the Fermi orbitals, whose squares can be used to approximate the exchangecorrelation hole. [19,29,41] This indicates that the optimized positions of the FODs in FLO-SIC may carry bonding information. In this section, we set out to investigate this in more detail.
In the chemistry community, it is common practice to use the concept of bond order. [63,64] Localized orbitals (e.g., NBO [65] ) and other concepts [66,67] can be used to determine bond orders in a post-processing fashion. In FLO-SIC, one can define a simple FOD bond order (BO FOD ) as where m is the number of electrons involved in the bond and n is the number of atoms that belong to this bond. In this context, we attribute one FOD to one electron. The BO FOD is based on counting the FODs between bonded nuclear centers. A detailed example of the FOD bond order for N 2 is given in Figure 1. A BO FOD of 1 represents a single bond (e.g., C─C), 2 represents a double bond (e.g., C C) and 3 represents a triple bond (C≡C). All of the mentioned bond orders are represented well by Lewis theory, the conventional framework for describing bonding in chemistry that is based on the octet rule and popularly visualized in Lewis dot diagrams. However, there are other bond orders that are not properly described by Lewis theory. For example, 2.5 represents a two center-five electron bond and 1.5 represents a two center-three electron bond. In addition, we distinguish between the commonly known double bond (e.g., C C) consisting of two electrons in each spin channel and the uncommon double bond consisting of three electrons of one spin channel and one electron of the other spin channel. We have chosen a shorthand notation in which twoelectron bonds are represented by solid lines, whereas oneelectron bonds are represented by dotted lines. A systematic analysis of recently published FLO-SIC results for the G2-1 benchmark set [36] allows us to determine the FOD bond order for all molecules of this set. The results are summarized in the Supporting Information. FLO-SIC provides essentially the same bond order as is provided by Lewis structures for systems with straightforward bonding (51 molecules in total). For four systems (see Table 1), literature references give conflicting bond order values.
All of these systems can be described as radicals, that is, with a structure containing at least one unpaired electron. In such cases, the Lewis structure is not clear, and different sources (e.g., CCCBDB, [68] PubChem, [69] ChemSpider [70] ) give different structures and hence different bond orders. In view of this, it is unclear which reference value should be taken for comparison (see Supporting Information for a comparison between different literature values). There might be a need for more consistent bond order reference values, but this is beyond the scope of this work.
While FLO-SIC predicts, in agreement with Lewis theory, a bond order of two for the O 2 molecule (see Table 1), it is obvious that the FLO-SIC electronic geometry is not what would be expected from the conventional Lewis structure. FLO-SIC predicts a double bond consisting of three electrons of one spin channel and one electron of the other spin channel. We use this example to recall a nearly forgotten theory named LDQ theory. [56][57][58][59] Whereas Lewis never included the spin in his famous model, it plays a critical role in Linnett theory. For Linnett, the octet consists of two quartets, each only containing electrons of the same spin, giving the theory its name. LDQ does not contradict Lewis theory, but rather delivers additional insights where Lewis theory fails.
According to Linnett, the spatial distribution of electrons is determined by two different correlation effects. [57,58,71] One of them is the Pauli exclusion principle, which leads to the tendency of electrons having parallel spins to keep apart, and favors the spatial proximity of electrons with antiparallel spins. [57,58,71] The second correlation effect is that all electrons carry the same negative charge and therefore aim to keep apart from each other because of Coulomb repulsion. Consequently, four electrons of the same spin experience negative charge correlation on top of negative spin correlation. These electrons try to arrange themselves in a way that maximizes their distance from each other. [57,58,71] Geometrically, this is achieved by occupying the corners of a regular tetrahedron (4 sp 3 orbitals). [57,58,71] The overall correlation between the two spin quartets is assumed to be small, as the positive spin correlation partly cancels the negative charge correlation. Therefore, the quartets and their tetrahedra keep a certain degree of independence from each other concerning their orientations around the nuclei. [57,58,71] Linnett implicitly argues that a relatively small deviation from the strictly regular tetrahedra might be energetically favorable in certain cases. [57,58] In general, LDQ structures are constructed by arranging the electron tetrahedra in a way that minimizes inter-electronic repulsion by spin and charge, while still maintaining the octet rule. [57,58,71] Linnett structures can be written similarly to their Lewis counterparts. In contrast to Lewis, however, each simplified Linnett structure assigns every single electron to either a dot or a cross, depending on its spin. [57,58,71] Linnett and Lewis structures are expected to be the same in straightforward cases like CH 4 , [59] where the bonding arrangement dictates that the up and down spin quartets should be arranged in the same way. Various details and additional aspects are discussed in the bachelor thesis of Kraus. [64] Figure 1. FOD bond order BO FOD . The N 2 molecule is used as an example consisting of the nuclear centers (A, B) with their respective FODs (shown as small spheres). Region a describes where lone pair FODs are located. These FODs are associated with one center, but are not spatially close to the nucleus. Region b describes the core FODs. Core FODs are spatially very close to one center, in contrast to valence FODs. Here, spatially close means that the distance of the core FODs is much smaller than the covalent radius of the corresponding atom. Region c describes the bond region, which contains the valence FODs of connected centers (in this case A and B). Counting all m FODs (up, down) in region c for a specific n center arrangement gives the FOD bond order (see eq. (4)). For the N 2 example, we count two centers (n = 2) as well as three up FODs and three down FODs in region c (m = 6). The resulting FOD bond order is m/n = 6/2 = 3, which is interpreted as a triple bond. The three bond FODs form an equilateral triangle around the bond axis. In the following, we will call attention to analogies between Linnett's quasi-classical electron positions and the arrangements of optimized FOD positions. Our discussion will be qualitative in nature, with a focus on general structural motifs. Linnett proposed a LDQ structure for the O 2 molecule that is in excellent agreement with the FLO-SIC electronic geometry (see Supporting Information). For the CO 2 molecule, Linnett compared six different structures [56,58] (including the Lewis structure). The structure that he predicted to be most stable was not the Lewis structure. In fact, it was a structure which (similarly to Linnett's O 2 structure) does not show the common double bonds (i.e., C O, two up and two down FODs). Instead, it features two uncommon double bonds (i.e., three up FODs and one down FOD or vice versa). For both structures, the bond order is identical and charge neutrality is ensured, but for the Linnett structure, the quasi-electron positions are further apart from one another.
We found that Linnett's proposal is in agreement with FLO-SIC results (see Fig. 2). The Linnett-like FOD arrangement for CO 2 is energetically preferred by about 2mE h (1 E h = 27.2112 eV) over the Lewis structure within FLO-SIC LDA (see Supporting Information as well). This is not an isolated occurrence. We found that bonding situations as described by the electronic geometry in FLO-SIC generally agree with LDQ theory, especially in cases where the predictions differ from Lewis theory.

Fermi-Orbital Descriptor Generators
As shown above, FODs carry bonding information, and they are needed to carry out FLO-SIC calculations, thus it is important to have suitable methods to generate them. Fully optimizing the FODs is numerically demanding, as the evaluation of FOD forces, which are defined as the derivatives of the total energy with respect to FOD positions, is time-consuming. The numerical effort increases with the number of electrons, that is, the number of FODs. Therefore, it is desirable to have initial FODs that are close to the optimal positions to speed up FLO-SIC calculations.  Table 1. FLO-SIC bonding and FOD bond order for non-trivial cases in the G2-1 set using MP2 geometries compared with reference values. [68] The notation nÁc-mÁe refers to n-center-m-electron. Furthermore, we have chosen a shorthand notation for the formulae in which two-electron bonds and lone electron pairs are represented by solid lines. Moreover, one-electron bonds are represented by dotted lines and nonbonding unpaired electrons are represented by dots. Note: The complete FOD bond order analysis for the G2-1 set can be found in the Supporting Information. In the following, four different approaches to achieve this goal will be presented. The first one is based on a classical force field formulation (PyEFF), the second one uses the centers of mass of single-electron densities (PyCOM), the third one is a Monte-Carlo method minimizing the Coulomb repulsion (fodMC), and the final one works with fixed structural motifs derived from Lewis structures (PyLEWIS). To judge the quality of these methods, FODs generated using them are compared to fully optimized FODs for two benchmark test sets.
Finally, we analyze the performance of the four methods with respect to required user input (automation), range of systems the method can be applied to (scope), computational time (speed), additional utility beyond FOD generation (extensibility), and how much the generated FODs depend on the input (insensitivity).

Python-based electron force field
One possibility for a fast automatic generation of FODs would be to use force fields, treating the electrons as classical particles. Optimizing the positions of these classical particles is then analogous to a molecular geometry optimization. It could be interesting to try using those optimal positions as FOD starting guesses. Such force fields exist, for example in the form of the LEWIS [72,73] or the LEWIS • force fields. [74] Unfortunately, these force fields are limited to only a few chemical elements, and there is no free implementation available.
Another option is the so-called electron force field (eFF). [75][76][77][78] The eFF computes the total energy and the forces on the electrons and nuclei of the system. Here, the nuclei are represented as point charges and the electrons as spherical Gaussians [79] ψ that is, as a function of nuclear r and electron x i positions and electron sizes s i . The electron positions and sizes are varied as parametric functions of the nuclear positions to minimize the total energy of the system given by This variation is done using a gradient-based algorithm. The terms in eq. (6) describe the sum of kinetic energy E ke , the Hartree electrostatic energies E nuc, nuc , E nuc, elec , and E elec, elec as well as an antisymmetrization (Pauli) correction E Pauli . Further details are given in the Supporting Information.
In the eFF, each electron carries a spin (+1/2 or −1/2). For instance, one can calculate a C atom with S = 1 (four up, two down electrons) and S = 2 (five up, one down electrons) in the same way one would handle it within FLO-SIC.
The first formulation of the electron force field [75][76][77] (eFF1) may be called a quasi all-electron approximation, because all electrons (core + valence) are treated explicitly. The simple Gaussians used to represent electrons resemble s-orbitals. Therefore, it is not surprising that the force field becomes less accurate for atoms containing states with higher angular quantum numbers. In the case of eFF1, only elements with Z = 1-6 are described well. The eFF2 [77] formulation tries to overcome this limitation, and has been tested for systems containing elements with Z = 1-10.
The standard eFF1 formulation is available in the eff-code [75] and in LAMMPS, [80] but the eFF2 formulation is not available yet. The eFF1 formulation is already available in our PyEFF code. As one can see in the Supporting Information, our implementation of eFF1 within PyEFF is in excellent agreement with the reference implementation. PyEFF is written in python and uses standard scientific python packages (e.g., numpy, scipy, sympy, ase). It offers several input and output features to generate FOD input files for the NRLMOL code [81][82][83][84][85][86][87] or our PyFLOSIC implementation. [88] A typical input for PyEFF is the nuclear geometry, the spin state, and starting values for the electron sizes s i . Thus, we do not directly use bonding information of the system to generate FOD starting guesses.
Besides the possibility of calculating different spin states, this force field formulation delivers a total energy expression and can therefore be used to describe bond dissociation. Furthermore, PyEFF can be used to calculate ionization potentials and electron affinities. We calculated the H 2 dissociation curve with PyEFF and compared the results to FLO-SIC LDA (see Supporting Information). It is known that DFT is not able to reproduce the correct dissociation limit in contrast to FLO-SIC. The PyEFF bond dissociation curve shows a qualitatively correct description of bond breaking in the stretched-bond region compared to FLO-SIC LDA. However, the energy in the current formulation of the eFF is not sufficient to reach FLO-SIC LDA accuracy. It is interesting to consider how physical the results are, in view of the fact that the optimal classical electron positions suggested by PyEFF are used as guesses for FOD positions, which themselves can be loosely thought of as classical electronic positions.

Python-based center of mass
The main assumption for the python-based center of mass (PyCOM) method is that the center of mass (or center of gravity or centroid) of one-particle densities can give reasonable starting values for the FODs. The first step is a standard DFT calculation for a given nuclear geometry, yielding a set of KS orbitals. This requires choosing the exchange-correlation functional, basis set and numerical grid. In PyCOM, the user can choose between using Foster-Boys, Edmiston-Ruedenberg, or Pipek-Mezey localization procedures to transform the KS orbitals (each spin channel separately). The resulting initial FOD guess depends on the choice of the localization method. Lehtola and Jonsson [89] provide a detailed overview of the different procedures. For our implementation, we used the PySCF [90] code. The localization procedure in PySCF uses a fast and efficient algorithm based on a co-iterative augmented Hessian method. [91] After the localization, the orbital densities are calculated on a numerical grid. Finally, the center of mass is calculated and collected for each orbital density. The calculated positions are saved as an xyz file including both nuclei and descriptor positions. The resulting descriptors depend on the exchange-correlation functional, the basis set, the numerical grid, and the localization method. This method scales like a standard density functional theory calculation, adding the cost of applying a localization method to each spin channel.
This procedure is thought of as a stand-alone post-processing procedure. Having access to orbitals in standard cube-files enables the possibility to use orbitals from any other code. For future use in PyFLOSIC, the method can be sped-up by calculating the trace of the density times the first moment matrix directly using PySCF routines, instead of writing the orbitals on a numerical mesh.
Of the FOD generation methods discussed here, PyCOM is the one with the highest numerical effort (a comparison of timings between PyCOM and fodMC is given in the Supporting Information), but it is applicable to any system that can be studied within density functional theory. The only system-dependent parameters are the atomic coordinates and the spin state.

Fermi-orbital descriptor Monte Carlo
This method is based on the fact that FODs typically arrange in structural motifs for atoms and molecules, organized according to atomic subshells (e.g., FODs corresponding to 2 s and 2p

FULL PAPER
WWW.C-CHEM.ORG orbitals arrange in tetrahedra) and bonding configurations (e.g., triple bonds are described by three FODs of each spin arranged in a triangle around the bond axis, see Fig. 1). For atoms, the basic idea is to obtain these structural motifs by a homogeneous distribution of N points on spheres. To achieve this distribution, we minimized the Coulomb repulsion between all points assuming the points to be point charges. This is analogous to the well-known mathematical Thomson problem. [92] The quantity to minimize is where each distance is evaluated as with x,y,z being the coordinates of points i and j. To obtain f r , a Monte-Carlo (MC) approach (eqs. (9)-(13)) has been employed. The basic concept will be discussed for the simplest case of a single sphere.
Spherical coordinates (r, θ, ϕ) are used, and all N positions are initialized at (r, 0, 0). This was done to avoid an initialization near a local mininum. After the initialization, two random numbers α and β are generated in the interval of [0,1] for all N points. These serve in the construction of a new position of point i (i 2 N) at step n using a step size a step = 0.001.
x n i = rÁsin θ n i À Á Ácos ϕ n i À Á ð11Þ After all positions have been moved, the Coulomb repulsion is evaluated according to eqs. (7) and (8). If f n r < f n−1 r , the new positions are taken into the next MC step. If not, the old positions are used again. As a note, f 1 r is set to 10 6 . Proof-of-concept calculations for a single sphere were performed, distributing up to 1000 points. The results can be found in the Supporting Information. One can imagine such a sphere as a combined orbit for a given set of electrons (e.g., a shell in an atom). With that, this basic MC procedure is extended for the determination of initial FODs. While FODs for atoms are initialized as outlined above, finding FOD positions in molecules is done differently. To be specific, core FODs are optimized using the mentioned MC procedure. A modified ansatz is employed for bond and lone pair FODs, as explained below. The entire approach has been implemented in a FORTRAN routine, called fodMC. Figure 3 shows a simplified outline of the fodMC.
There is a differentiation between spin up (up) and spin down (down) FODs. This allows the treatment of spin-polarized systems and gives a more reasonable picture of the FODs in space. For atoms, the up and down FODs are rotated against each other (after being individually optimized) to formally decrease their Coulomb repulsion (rotations are again based on MC). This leads to inverted tetrahedra of up and down FODs in, for example, Ne (see Supporting Information).
To obtain a good set of starting FODs for molecules, recognizing the presence of bonds and lone pairs is essential. To form bonds, the center between atoms is taken as a starting point. The bond FODs are placed either at this center (single bond) or are initialized perpendicular to the bond axis (multiple bonds). For the latter, MC steps are carried out to equidistantly distribute the bond FODs perpendicular to the bond axis (via rotations). This is analogous to the distribution of the Fermi hole described by Luken. [30] If the numbers of up and down FODs in the bond are equal, they will be paired. By specifying the number of up and down FODs to be different, one can form, for example, two centerfive electron bonds. Furthermore, previous FLO-SIC LDA calculations have shown that FODs always lie closer to the H atom in X─H bonds. Accordingly, the FODs in such bonds are not placed at the midpoint between the atoms, but at 85% of the bond distance toward the H.
For unpaired (e.g., in radicals) and lone pair FODs, a similar ansatz is used. First, all vectors describing bonds to a given atom are summed up to obtain a resulting bond vector. For example, in Figure 1, the resulting bond vector for the first nitrogen atom points directly toward the second nitrogen atom. If there are multiple bonds like in H 2 O, the resulting bond vector for oxygen points in between the two H atoms. Lone FODs have to reside in the opposite direction of this vector (see Supporting Information as well). For multiple lone FODs, the same procedure as for multiple bond FODs is used. This leads to a simple and intuitive formation of bonds and lone pairs. For planar or linear molecules (other than diatomics) where atoms have lone FODs, such FODs are oriented perpendicular to the molecular plane or the molecular axis, respectively. Overall, this approach yields FOD geometries in correspondence to LDQ theory. [57,58] After all valence FODs are initialized, core FODs are introduced by symmetrically placing them on their respective shells. The radii and number of FODs for the subshells for all atoms are stored in a database file. Afterward, they are optimized with respect to the valence FODs, that is, f r is minimized globally via rigid rotations (see Fig. 3).
The fodMC is a computationally efficient method. It needs nuclear positions as well as a FOD bond order as input, which allows calculating initial FODs for any bond order. This is especially important for systems where the bonding situation is ambiguous. For instance, the fodMC allows an explicit generation of Lewis and Linnett FOD guesses, which can then be compared using FLO-SIC. Because not all bond orders that can be initialized are physically or chemically meaningful, this method should be used with care. It is not a black-box method.
As pointed out by Lehtola et al., [93] variational PZ-SIC can suffer from a multiple minima problem. This problem might be addressed, at a higher computational cost, by performing a stability analysis. In FLO-SIC, the electronic geometry built up by the FODs offers a computationally cheap alternative to analyze the resulting SIC solutions. With the fodMC, one can initialize any arrangement of FODs for a given system and judge which configuration corresponds to the most stable SIC solution.

Python-based Lewis
Prior application of the FLO-SIC method to atoms and molecules [42,45,94,95] showed that there are repeating structural motifs for the FODs describing orbitals for core and valence electrons. As discussed previously, in the case of valence electrons, these structural motifs carry bonding information and are transferable between similar chemical bonding situations.
PyLEWIS uses these known structural motifs to generate the FODs. The structural motifs are precalculated using fodMC or PyEFF and are stored as fixed data sets. The placement of the descriptors is then based on the data sets and fixed rules or geometrical constraints the FODs have to obey. There is no further optimization of the FOD positions after placing the FODs according to the fixed data sets and rules. Due to the geometrical constraints, the generated guesses may not be as good as the guesses generated with the other methods.
For example, FODs corresponding to 1s core states are located at the respective nuclear position and FODs for 2s2p orbitals are arranged as tetrahedra around the nucleus. FODs describing a single bond are located on the bond axis between the respective nuclei (one up and one down FOD), FODs corresponding to a double bond hover above and below the bond axis (one up and one down FOD for each position), and FODs representing a triple bond are placed on a triangle around the bond axis (one up and one down FOD at each vertex of the triangle). In addition, FODs representing single unpaired electrons are placed at an elongation of the bond axis at the side of Table 5. Errors in the RPDF for the initial FOD positions of the different FOD generators in comparison with the optimized ones. The comparison includes the mean error (ME, first value), mean absolute error (MAE, second value), and the root mean square deviation (RMSD, third value). The smallest errors of each type are highlighted in bold for each system. one nuclear position. The orientation of the different structural motifs with respect to each other is based on fixed rules gained from analyzing the results of the fodMC method. The input provided by the user is the bonding information of the molecule. The idea is to read a Lewis structural formula of the molecule and to produce a set of FODs corresponding to that input. Standard formats like the so-called SMILES string as well as the common protein database (pdb) file can be used. The bonding information is translated into a xyz file containing nuclei and descriptor positions. PyLEWIS is written in python and uses the rdkit [96] python package.

Benchmark Set
To compare the performance of all four methods introduced in the previous section, we choose a small benchmark set consisting of atoms and molecules with single bonds (BH 3 , CH 4 , H 2 O 2 , HF, F 2 , C 2 H 6 ), double bonds (C 3 H 5 , C 2 H 4 ), triple bonds (HCN, N 2 , C 2 H 2 , N 3 H), aromatic rings (C 6 H 6 ), spin-polarization (CH 3 , CN), and three-dimensional character (C 60 ). C 60 is only discussed qualitatively to reduce the computational effort.
To extend our benchmark set to a larger variety of bonded elements, we have analyzed the following molecules: CH 2 F, methanol (CH 3 OH), ethanol (C 2 H 5 OH), formaldehyde (CH 2 O), acetaldehyde (C 2 H 4 O), formamide (CH 3 NO), monofluoromethane (CH 3 F), difluoromethane (CH 2 F 2 ), trifluoromethane (CHF 3 ), tetrafluoromethane (CF 4 ), methanimine (CH 2 NH), methanethiol (CH 3 SH), and ethanethiol (C 2 H 5 SH). For this extended study, we only used PyCOM and fodMC, as they are applicable to any system. On the contrary, initial FOD positions for these structures can be difficult or impossible to obtain with the current versions of PyEFF or PyLEWIS.

Computational details
All FLO-SIC calculations and FOD optimizations were performed with our recently developed PyFLOSIC code. [88] This code offers a user-friendly python implementation of FLO-SIC within the well-maintained PySCF [90] electronic structure code.
We use a fully self-consistent approach with unified Hamiltonians. [39,97] Using the H OOOV unified Hamiltonian, we get identical total energies, densities and polarizabilities in comparison to the Pederson Hamiltonian in FG-NRLMOL. [98] All results presented in this work were obtained by employing the H OOOV . It is worth mentioning that for this study, both unified Hamiltonians (H OO and H OOOV ) deliver nearly identical results (see Supporting Information).
All calculations were done with a density functional optimized (DFO) [85] basis set, an energy threshold of E thres = 10 −6 E h and a PySCF grid level of 7. The forces on Fermi-orbital descriptors were optimized using the atomic simulation environment (ase) [99] geometry optimizer FIRE [100] until the maximum force component drops below a threshold of F OPT max < 5Á10 −4 E h =a 0 . Experimental geometries were taken from the CCCBDB database. [68] Each FOD optimization was started from fodMC guesses and performed for different rungs of density functionals, namely LDA, [52] PBE, [53] and SCAN. [54] This is done to illustrate that the same set of initial FODs is optimized differently using different exchangecorrelation functionals. The results are summarized in Table 2 for  Table 3 for FLO-SIC PBE, and in Table 4 for FLO-SIC SCAN. All fully optimized FOD geometries recover the structural motifs of LDQ theory. [56][57][58][59]64,71,101,102] The energy change during the FLO-SIC optimization is less than 25 mE h , thus the fodMC delivers excellent starting values for a FLO-SIC calculation.
It can be difficult to analyze the characteristics of the FOD positions for the different FOD generators. The radial pair distribution function (RPDF) g(r) can deliver information about the coordination sphere of specific FODs. The RPDF provides a tool to compare relative FOD positions, making it ideal for the present study. Pair-distribution functions for nuclear positions are often used to analyze molecular dynamics trajectories. [103,104] Technical details about the RPDF in general are given in the Supporting Information.

Results
All FOD generator methods were used to obtain starting FOD positions for the systems in the small test set, while for the extended test set only fodMC and PyCOM were used. These starting positions are compared to fully optimized FOD positions in FLO-SIC LDA (starting from fodMC guesses, see Table 5). Typical errors like mean error, mean absolute error, and the root mean squared deviation are used to quantify the RPDF results. [36] A small error between the RPDFs of a guess compared to the optimized FODs indicates that the guess is already close to the minimum configuration, while large errors indicate a larger deviation between the structures. Errors in the RPDF might be slightly biased due to the reference used, but this should not strongly affect the results.
As an example, we present the RPDF for the BH 3 molecule (see Fig. 4). The RPDFs for PyCOM and PyEFF describe the same trends as do the RPDFs for fodMC and PyLEWIS. The best agreement with the RPDF of the optimized FODs is given by fodMC and PyEFF.
As can be seen from Table 5, for small atoms (Be-C) and simple hydrocarbon molecules (CH 3 , CH 4 , C 2 H 2 , C 2 H 4 , and C 2 H 6 ), the fodMC and PyEFF methods generate better guesses than PyCOM. The intrinsic limitation of PyEFF can easily be observed in the large errors for N 3 H and C 3 H 5 . The errors for PyLEWIS guesses are not large, but the method needs further improvements before it can achieve an accuracy similar to that of PyCOM or fodMC. Considering the small benchmark set, the fodMC outperforms all other methods. Unlike PyEFF and PyLEWIS, fodMC and PyCOM are able to deliver guesses for any systems. Thus, we recommend using fodMC and PyCOM for systems with unknown or more complex bonding situations, such as the extended benchmark set in Table 6. The additional RPDF analysis for this benchmark set confirms that fodMC performs Table 6. Errors in the RPDF for the initial FOD positions of PyCOM and fodMC in comparison with the optimized ones for the extended benchmark set. The comparison includes the mean error (ME, first value), mean absolute error (MAE, second value), and the root mean square deviation (RMSD, third value).

System
PyCOM fodMC  better than PyCOM, but both methods deliver good starting points even for complex structures.

Efficiency analysis
All FOD generator methods were compared as stand-alone tools. In the following, these tools are compared with respect to specific properties (Automation, Speed, Extensibility, Insensitivity, and Scope).
Automation. The input level for the PyCOM method is the lowest of all methods. The performance of PyEFF critically depends on the effective electron radii. In case of fodMC and PyLEWIS, the complete bonding matrix is needed as input.
Scope. Both PyCOM as well as fodMC can treat any system, whereas PyEFF (until now) only works properly for systems containing atoms from hydrogen to nitrogen. PyLEWIS only works for systems up to Ar. Thus, both PyCOM and fodMC have an unlimited application scope.
Speed. PyCOM needs a standard density functional theory calculation as input and it requires the numerical effort of a localization procedure, making it the computationally most expensive method. The second highest numerical effort is required by the PyEFF method, because force field calculations need to be performed. The numerically cheapest methods are fodMC and PyLEWIS. Both methods have practically no numerical cost.
Extensibility. Besides the starting guess, PyEFF delivers an energetical description and accordingly has utility beyond FOD generation. PyLEWIS can start from various starting information (e.g., SMILES string or pdb files), and the method can be optimized using additional constraints. Knowing that FODs are not explicitly the centroids of localized orbitals, one can optimize PyCOM by using, for example, a least square fit between pre-localized orbitals (like Foster-Boys) and the target Fermi orbitals by varying the FOD positions until the difference is minimized. This should lead to improved initial FODs. The fodMC can only be improved with respect to its FOD generation. For example, there might be better starting values to initialize the bond/lone/core FODs. Furthermore, one could introduce a Metropolis algorithm for the movement of the FODs, allowing f r to go uphill.
Insensitivity. The fodMC delivers one set of FOD positions for a molecular geometry and a given bonding matrix. There is no further information needed from the user. If the user changes the bonding matrix, the resulting set of FODs will change too. Similarly, changing the effective electron radii in PyEFF will result in different FODs. In the case of PyLEWIS, the user has to provide information on the molecular structure, chemical bonding, and eventually additional constraints to be applied in the FOD placement. Providing different bonding information will give different sets of FODs. The method with the worst score in terms of insensitivity is PyCOM, because it requires the molecular structure, all details of the DFT method and the localization method used in generating single particle densities. The resulting set of FODs will depend on the exchange-correlation functional as well as on the numerical details. A visual efficiency analysis is given in Figure 5. This figure shows the relative performance of each method considering several aspects. All facts presented in this paragraph are related to essential properties (Automation, Scope, Speed, Extensibility, and Insensitivity) on a relative scale to analyze the strengths and weaknesses of each presented method.

Conclusions
The most significant conclusion from the work presented here is that the FODs in the FLO-SIC method carry bonding information based on their relationship to the exchange hole (and possibly the exchange-correlation hole). A FLO-SIC bond order based on FOD positions is introduced and benchmarked for structures of the G2-1 benchmark set. Notably, it was shown that fully optimized FOD geometries correspond remarkably well to LDQ theory, which is especially relevant for molecules where Lewis theory does not provide a well-defined description of the bonding.
Four different methods (PyEFF, PyCOM, fodMC, and PyLEWIS) to generate initial FOD positions for a FLO-SIC calculation are presented. These methods are compared with respect to the use of bonding information for the generation of initial FODs. We introduce new benchmark sets of molecules to test the utility of the initial FOD positions and give a detailed comparison of the FOD positions, detailing a comparison of the efficiency, the usage, the advantages, and disadvantages of the methods. Finally, we recommend checking any set of optimized FODs of a determined FLO-SIC minimum against LDQ or Lewis structures. Doing so allows the verification of the respective FLO-SIC solution because, as we have suggested here, the FODs should agree with at least one of these structures.

Outlook
As discussed above, optimized FOD positions resemble LDQ structures. This is not only true for atoms and small molecules, but also for aromatic systems like the well-known benzene molecule (see Fig. 6). The corresponding FLO-SIC calculations show that the LDQ structure is energetically preferred compared to the Lewis structure. This is true for the total as well as the SIC energy (see Supporting Information).
The methods explained in this article allow the investigation of large molecules, because initial FOD positions can be obtained easily. We present initial FOD positions for molecules of chemical and biological interest, such as C 60 (see Fig. 7) or vitamins like vitamin C and E (see Fig. 8). In addition, the analysis of the FOD positions opens the way to analyze chemical reactions electron-by-electron. This opens the possibility to study coupling and reaction paths in the future.
Further, we plan to re-parametrize the eFF with FLO-SIC reference data, given the promising results of PyEFF for small atoms and molecules. This may open a way for establishing embedding techniques, where only the active center is calculated with FLO-SIC and the inactive parts are calculated with a FLO-SIC optimized electronic force field. Such techniques would enable the treatment of larger macromolecules or biological systems (e.g., endohedral fullerenes, vitamins, or even DNA).
We discussed above the chemical relevance of the FODs to describe and analyze bonding and proposed several methods to find them. The proposed methods are a first step in understanding FLO-SIC and FODs better. In the future, other methods like machine learning procedures could replace these methods. Finally, all methods might be combined in one user-friendly umbrella FOD generator code as suggested by one referee of this article.
While this article addresses the chemical interpretation and generation of FODs, it also raises several new questions for the community. For example, can the FOD positions be an intrinsic descriptor for different levels of theory (wave function theories, density functional theory, and classical methods)? Can an efficient multilevel theory be formulated based on them? We hope that the contents of this article help to motivate the community to work on these important unanswered questions in order to efficiently treat physical, chemical, and biological problems computationally in the future.