Extended tight‐binding quantum chemistry methods

This review covers a family of atomistic, mostly quantum chemistry (QC) based semiempirical methods for the fast and reasonably accurate description of large molecules in gas and condensed phase. The theory is derived from a density functional (DFT) perturbation expansion of the electron density in fluctuation terms to various orders similar to the original density functional tight binding model. The term “eXtended” in their name (xTB) emphasizes the parameter availability for almost the entire periodic table of elements (Z ≤ 86) and improvements of the underlying theory regarding, for example, the atomic orbital basis set, the level of multipole approximation and the treatment of the important electrostatic and dispersion interactions. A common feature of most members is their consistent parameterization on accurate gas phase theoretical reference data for geometries, vibrational frequencies and noncovalent interactions, which are the primary properties of interest in typical applications to systems composed of up to a few thousand atoms. Further specialized versions were developed for the description of electronic spectra and corresponding response properties. Besides a provided common theoretical background with some important implementation details in the efficient and free xtb program, various benchmarks for structural and thermochemical properties including (transition‐)metal systems are discussed. The review is completed by recent extensions of the model to the force‐field (FF) level as well as its application to solids under periodic boundary conditions. The general applicability together with the excellent cost‐accuracy ratio and the high robustness make the xTB family of methods very attractive for various fields of computer‐aided chemical research.


| INTRODUCTION
Computational modeling at an atomistic scale is now an essential tool in natural science and has become a vital ingredient also in today's research in industry. While the field of quantum chemistry (QC) was for a long time dominated by a few experts who were using complicated software requiring super-computer resources to solve some special chemical problems, the situation has changed tremendously in the last 2-3 decades. Nowadays, routine density functional theory (DFT) calculations [1][2][3] can be conducted by experimentally working chemists on common desktop computers with userfriendly standard software for various properties and problems like structure determination, reaction mechanism exploration, or computing spectral signatures of molecules or even solids. The theories and computational tools, that is, the "machine under the hood" should (a) provide reasonable results, (b) in short time, (c) for various systems and physical-chemical properties. This at least for larger systems difficult to achieve compromise between accuracy/robustness and computational speed can often be obtained only by applying a hierarchy of different levels of sophistication (multilevel modeling) as sketched in Figure 1.
Here, the extraordinarily complex potential energy surface (PES) of a larger molecule is initially investigated (screened) at a relatively low theoretical level which is subsequently increased to the accurate DFT or wave function theory (WFT) level. Previously, the starting point in typical applications was classical force-field (FF), for example, for initial conformation search. The drawbacks of FF approaches are manifold and in particular, missing parameterization for many elements (e.g., metals) has hampered further development of the field. At this entry point, lowlevel QC methods come into play, that is, they are proposed to replace FFs in many cases (about to a systems size of 500-1,000 atoms) as the lowest theory level. QC approaches have many advantages but one has to keep the computational cost under control. Successful attempts to develop fast but yet reasonably efficient ab initio type procedures have been made. [4][5][6][7] We mention here our own HF-3c, 8 PBEh-3c, 9 and B97-3c 10 small atomic orbital (AO) basis set methods that are meanwhile applied routinely by many groups world-wide. However, they are still too slow for larger screening purposes (e.g., thousands of energy evaluations for 50-100 atom systems) or in applications for typical biochemical structures, for example, protein-ligand interactions, 11 where often long molecular dynamics (MD) runs have to be conducted.
This situation sparked renewed interest in semiempirical QC methods which have a long history dating back to the 1970s with the famous zero-differential overlap (ZDO) based approximations 12 or even to the Hückel Hamiltonians F I G U R E 1 This schematically depicts the proposed multilevel modeling scheme based on the GFN family of methods (first bubble), which are described in this review. These methods are generally used at the initial stages of the multilevel workflow, where a large number of calculations needs to be carried out. This typically includes screening over numerous potential candidate molecules (visualized by the diameter of the cone) or extensive structural sampling of different molecular conformations. In subsequent steps (second bubble), the theory level (accuracy) is increased (electronic structure as well as including thermostatistical [RRHO] and solvation [solv] free energies [G]) while the number of considered candidate structures is reduced. At the end of this workflow, only very few structures need to be treated by high-level theory levels (third bubble) to accurately determine the thermodynamic state (at equilibrium) and the respective property. The latter is often a spectroscopic property and is usually obtained as a thermostatistical Boltzmann ensemble average over all remaining candidates, based on free energies computed at the high-level of theory from the very early days of quantum mechanics. The renaissance of semiempirical methods in the last 20 years was mainly caused by the development of density functional tight binding (DFTB) [13][14][15][16][17] methods by Seifert, Elstner, Frauenheim, and coworkers. The DFTB methods combine the efficiency of the old ZDO type minimal basis set methods with the higher (compared to a Hartree-Fock, HF) accuracy of DFT as the underlying machinery. For more details on their derivation and properties in comparison with HF and even FFs see Section 2.
The major drawback of DFTB in its current state is (similar to FFs) the parameterization because the Hamiltonian matrix elements are formulated in an atom(element) pair-wise fashion leading to thousands of empirical values that have to be determined. This makes it already very difficult to cover consistently only a small (upper) part of the periodic table of elements and hence limits their applicability, in particular for the chemically important metals. Another issue is the design strategy of the method regarding the target properties. Although DFTB in principle approximates DFT, which in turn should be able to describe any chemical system with uniformly high accuracy, the true situation is more complicated. Mainly due to the use of a small minimal AO basis set to express the simplified Kohn-Sham (KS) equations analytically, not all desired properties can be described at a similar target accuracy. The computation of accurate chemical bond or reaction energies is particularly difficult at any semiempirical level. Hence, if the parameterization procedure is forced to describe chemical (interaction) energies well, other, also important features like the computation of molecular structures necessarily deteriorate. This problem of nongenerality with low-level methods is difficult to solve and has been in a certain sense just circumvented with the development of the eXtended TB (xTB) methods which are the topic of this review.
The GFNn-xTB methods (n = 0, 1, 2, see below) are designed from the very beginning as special purpose tools focusing on molecular properties which can relatively easily and in a physically sound manner be described at low-level, namely geometries (vibrational), frequencies, and noncovalent interactions (NCIs) leading to the acronym GFN. Chemical energies (stronger interactions) are not used as primary training data but merely define important cross-checks. Their application to solids under periodic boundary conditions (PBCs) was originally not intended but is now possible with good accuracy. The first version originally termed GFN-xTB (now for better distinguishability dubbed GFN1-xTB) employs basically the same (mostly second order with some terms up to third order) approximations for the Hamiltonian and electrostatic energy as DFTB3 16 but does not rely on an atom pair-wise parameterization. Instead, as in the old ZDO type methods, mainly element specific empirical fitting is used enabling a consistent (but still tedious) parameterization covering a large part of the periodic table up to Z = 86 (radon). In this respect, in 2017 GFN1-xTB filled a gap in the market of off-the-shelf atomistic models as it is fast, robust, reasonably accurate, and works for many metallic systems. It was quickly adapted by the community and implemented in various QC programs like AMS, 18 CP2K, 19 Cuby4, 20 DFTB+, 21 entos, 22 ORCA, 23,24 and TeraChem. 25,26 For a few prototypical applications of GFN1-xTB by us and other groups see  Probably the most serious deficiency of GFN1-xTB/DFTB3 also regarding one of the central properties (i.e., NCI) is the monopole-type, spherically symmetric description of the atom pair-wise electrostatic interactions. This led in 2019 to the development of the successor GFN2-xTB featuring a multipole electrostatic treatment up to quadrupole terms. GFN2-xTB is built and parameterized along the same lines as the GFN1 version but incorporates better physics, the latest D4 dispersion model 32 and is completely pair-parameter-free.
Further extensions of the GFNn-xTB family of methods are also described here briefly (see Figure 2 for a general overview).
In 2019, we tried to roughly keep the accuracy level of the successful GFN1/GFN2 versions but making them significantly faster. The key idea was to avoid the self-consistent-charge (SCC) iterations involving repeated matrix diagonalization which represents the computational bottleneck in almost all semiempirical quantum mechanical (SQM) methods. This could be achieved by formulation of a noniterative first order variant termed GFN0-xTB 33 where the electrostatics are treated classically and only change the electronic structure to first order. In this approach only a single extended Hückel-type Hamiltonian matrix eigenvalue problem is solved, while the electrostatic terms are treated semiclassically within a so-called electronegativity equilibration (EEQ) atomic charge model. [34][35][36][37] This proposed GFN0-xTB method as all other considered GFN approaches has been implemented in the freely available xtb program for testing and is also briefly described here. However, because we think that the GFN0 Hamiltonian can still be substantially improved, we consider it here as a preliminary, proof-of-principle version.
Another non-self-consistent xTB version was developed for the computation of excitation energies and optical spectra of huge systems. In 2013 one of us proposed a simplified Tamm-Dancoff density functional approach (sTDA) for such problems. 38 It employs a regular KS determinant as input, that is, it requires a self-consistent DFT calculation with a standard Gaussian AO basis (e.g., augmented DZ or TZ level). Although the sTDA approximations are reasonably accurate and speed-up UV-or CD-spectra computations enormously, for large systems with thousands of atoms the electronic ground state calculation of the orbitals and orbital eigenvalues still remains the bottleneck. In the so-called sTDA-xTB method, 39 this crucial DFT step is replaced by a single-shot TB calculation with an extended AO basis set including diffuse (low-exponent) functions. This approach was proposed as a very fast single-point electronic structure method already in 2016, that is, one year before the release of GFN1-xTB and in fact was our first published xTB scheme. Meanwhile, parameters are available for almost all elements 40 and extensions to other response type properties have been reported. [41][42][43][44][45] The success and computational speed of sTDA-xTB motivated the development of the general GFN versions which allow full exploration of molecular and solid state PES.
Very recently, the main concepts of the GFN approach have been transferred into a nonelectronic, force-field version termed GFN-FF. 46 The main point of this generic, just coordinate input-based semi-classical potential is its generality meaning that it is, as the other GFN family members, applicable with reasonable accuracy and robustly to almost any system with elements up to Z = 86. This very fast method is also briefly mentioned here and some very large cases, for which even GFN0-xTB would be computationally too demanding, are shown.
Besides example applications and benchmarking of molecular structures and thermochemistry, transition state (TS) localization, chemical space exploration, proteins, and molecular crystals, this review covers general TB theory as well as the methodical aspects of the periodic implementation, the geometry optimization of large systems and compares some computation times. In this review, we mostly restrict details regarding implementation to the standalone xtb program. 47 2 | THEORY 2.1 | Tight-binding theory and comparison of variants 2.1.1 | Origin and connection to other methods Just like the closely related DFTB methods, 15,16,48 the xTB methods are rooted in KS-DFT, and formally, represent a semiempirical approximation to the latter. In the following, the connection of the xTB methods to DFT, DFTB, and classical FFs is outlined, then we will discuss the energy expressions for the individual generations of xTB methods. F I G U R E 2 Overview of the GFN family of methods with main components and classification of the most important terms. Dark gray shaded areas denote a quantum mechanical description while light gray parts indicate a classical or semi-classical description. The parts surrounded by the arrows are treated in an iterative, self-consistent fashion. For a more detailed discussion including the definition of the acronyms see text It is common practice to start the derivation from a semilocal DFT energy expression 15,17 ; however, we have found it more useful to use DFT that includes nonlocal correlation (dispersion) as starting point 49 : Here, ψ i are molecular spatial orbitals with occupation n i ,T r ð Þ is the kinetic energy operator and V n r ð Þ is the Coulomb operator due to the interaction with the clamped nuclei. ε LDA XC ρ r ð Þ ½ is the semilocal exchange-correlation (XC) energy per particle. The inner integral over r 0 contains the interelectronic Coulomb and nonlocal (NL) correlation via the kernel Φ NL C r,r 0 ð Þ. By including the latter term, we find that dispersion interactions naturally occur and establish the connection between tight-binding and intermolecular FF methods (see below). Since we are working with a KS system of formally independent particles, the density is obtained as Next, we reformulate the total energy in terms of a reference density ρ 0 , which ideally is close to the final converged density ρ, and a density difference Δρ, with ρ = ρ 0 + Δρ. Typically, a superposition of spherical, neutral atomic reference densities ρ 0 = P A ρ A 0 is used. 15,17 This allows us to decompose the energy in form of a Hartree energy at the reference density, the Hartree energy difference arising from Δρ, as well as the nonseparable exchange-correlation (local and nonlocal) energies.
The reason that the XC and nonlocal correlation energy are treated separately is rooted in the nonpolynomial dependency on the electron density. If we had started from a functional that includes Fock-exchange, the Hartree energy terms would correspond to Hartree-Fock-like energy expressions (cf. Reference 50). This is just mentioned for the sake of completeness, but not discussed further at this point. The energy at the reference density E H 0 is given by while the Hartree energy difference due to Δρ is given as The reference potentialV 0 r ð Þ is given as Equation (3) is completely equivalent to Equation (1), just reformulated in terms of the difference density Δρ, and consequently, the energy can self-consistently be minimized by solving for the optimum Δρ (see Reference 50). If Δρ is not determined self-consistently, Equation (3) corresponds to a dispersion-corrected expression of the non-self-consistent Harris-Foulkes functional. 51,52 In density functional tight-binding methods, the total energy is Taylor expanded around Δρ = 0.
These fluctuations are typically restricted to the valence orbitals only. The most sophisticated variants truncate this expansion after the third-order term. 16,48 The same is true for the GFN1-xTB 53 and GFN2-xTB 49 approaches, while GFN0-xTB corresponds to truncation after the first order term (see below). We will shortly identify terms occurring at the different orders and then outline the various GFNn-xTB methods in a self-contained manner with the latter being presented in chronological order. An overview over the respective tight-binding orders contained in the various GFN schemes (i.e., in the GFNn-xTB and GFN-FF methods) is given in Table 1. We emphasize at this point that all xTB methods incorporate terms, which have a physical basis in the aforementioned Taylor expansion. Nevertheless, the semiempirical parameters are not precomputed by first principles methods as in DFTB, but optimized on a large fit set to provide the best working parameter combination for the desired GFN target properties. So in that sense, the xTB-type methods employ physically motivated energy terms, but follow a "top-down" parameterization procedure aiming for sophisticated accuracy.
Zeroth order energy A system of noninteracting (i.e., infinitely separated) atoms with spherical, neutral atomic reference densities is chosen as zeroth order reference. This way, the externally experienced electrostatic potentials due to the electrons and nuclei of an atom cancel exactly and the Coulomb terms in Equations (4) and (6) are reduced to on-site terms only. ΔE H is then equal to zero and the total energy can be decomposed into noninteracting atomic total energies (E A ) and orbital energies, which can be precomputed for ρ 0 , as well as interatomic repulsion and London dispersion interaction terms: Here, the dispersion contribution E disp results exclusively from the long-range-in a DFT sense "nonlocal"-correlation effects. The pairwise repulsion energy E rep originates from overlap of the atomic reference densities ρ A 0 , which leads to changes in the Coulomb, exchange, and short-range (local) correlation energy. At zeroth order, we have hence established the connection between tight-binding methods and well-known intermolecular FF potentials of Lennard-Jones 54 or Overview of the terms included in the different GFN-type methods  Buckingham 55 type. In all schemes, the total energy is given relative to the energy of the free, noninteracting atoms, which is why we formally subtracted it from the zeroth order energy in the second line of Equation (8). If we had not included a nonlocal correlation functional to capture dispersion from the very beginning (as, e.g., in References 15,17), the formal equivalence of zeroth order tight-binding and intermolecular FFs would be missing.

First order energy
At first order, the following energy changes are entering: Since first order density fluctuations are enabling changes in the energy, but not in the electrostatic potential, no interatomic Coulomb terms occur, that is, the now charged atoms still experience the zero field from the other atoms. This is different for the dispersion energy, where the nonzero potential due to long-range correlation with ρ 0 is experienced by the fluctuation δρ. Apart from this, only changes of the on-site energies due to reoccupation of the AO levels occur. In tight-binding methods, this is absorbed in an extended Hückel-type term, which essentially is responsible for covalent bonding. Particularly this term is different in the xTB from the DFTB methods in that it includes more empirical but chemically motivated features, which enable a sophisticated electronic structure method with only global and element-specific parameters.

Second order energy
At second order, the energy is changed by Due to the second-order density changes, the net electrostatic (ES) energy between two atoms becomes nonvanishing. Additionally, the (semi-)local XC energy changes in the short-range interatomic region. Both effects are typically described in form of a short-range damped Coulomb energy in tight-binding methods. The dispersion energy is corrected at second order as well.
With second or higher order terms (as in GFN1-and GFN2-xTB), the tight-binding energies require a self-consistent solution.
Third order energy At third and higher orders, no contributions from ΔE H occur, because this term is no more than quadratic in the difference density. Only the nonpolynomial XC (local and nonlocal) terms lead to energy changes at third order.
While the third order dispersion changes are never explicitly considered in tight-binding methods, the (semi-)local effects are included in DFTB3, as well as in GFN1-and GFN2-xTB Hamiltonians. They typically serve the purpose to stabilize relatively highly charged atoms, for example, electronegative elements like oxygen.

| Common ingredients in the GFNn-xTB methods
Before describing the individual xTB schemes separately, we will outline common aspects of the GFN1-, GFN2-, and GFN0-xTB methods, such as common energy terms and the employed wavefunction ansatz. This refers solely to the mathematical form and the explicit parameters are naturally different in these schemes.
The wavefunction choice in GFNn-xTB methods The GFN1-, GFN2-, and GFN0-xTB wavefunctions are all formulated in terms of a partially polarized, mostly minimal valence basis set, consisting of spherical Gaussian-type atomic orbital (GTO) basis functions. Each contracted GTO ϕ μ therein approximates a Slater-type orbital (STO) ϕ STO μ as in Reference 56 Here, the χ μ z are primitive GTOs that contribute to the contracted GTO ϕ μ and d zμ are the corresponding contraction coefficients.
Depending on the AO (and GFNn-xTB Hamiltonian), the number of primitives varies from three to six (see References 33, 49, 53 for details). The AO basis set (Slater exponents) has been optimized simultaneously during the xTB parameterization and hence, is tied to a specific GFNn-xTB version. The molecular orbitals ψ j are expanded as a linear combination in this basis of AOs.
By derivation of the respective energy expressions (see below) with respect to the orbital coefficients, the Roothaan-Hall-type 57,58 generalized eigenvalue equation is obtained.
Here, C is the matrix of orbital coefficients from Equation (13), ε is a diagonal matrix with orbital energies on the diagonal, F is the respective xTB Hamiltonian matrix, and S is the AO overlap matrix. In GFN1-xTB and GFN2-xTB, these equations are solved self-consistently. Just like in DFTB, the xTB methods employ a nonorthogonal basis, that is, different from typical Hartree-Fock-based semiempirical methods, no ZDO approximation is applied in xTB methods. The above wavefunction ansatz can be generalized to periodic systems where ψ j then corresponds to a crystal orbital. The Bloch function equivalent for the one-particle functions is then given by with the Born-von-Kármán cyclic boundary conditions ψ j r . Expanded in the aforementioned AOs, the crystal orbitals are then expressed as: Here the summation, in principle, goes over all N L cells (or identical images), which are related by the translation vector L ! . In the cyclic cluster model (CCM), which is employed in the xtb implementation, we only consider nearestneighbors (N L ! N CCM L ) with corresponding weights 59 for the expansion of the Bloch function. The CCM assumes that all interactions beyond the Wigner-Seitz cell vanish. This condition is usually not fulfilled for electrostatic and dispersion interactions which must be treated differently, for example, by lattice summation. Given this, we can re-write the Roothaan-Hall-type equations for the crystal orbitals as where F 0 ! L ! μν is the element of the Hamiltonian matrix (KS/Fock matrix in ab initio theories), ε j k ! is the energy of the jth crystal orbital and S 0 ! L ! μν is the overlap matrix element. The lattice translations L ! here denote the images in the Wigner-Seitz cell. The w μν are their pairwise weights of the AOs μ and ν to preserve the spacial symmetry of the crystal and to avoid double counting. This has been suggested by Bredow et al. 59 to describe crystal wavefunctions, while including only a minimum number of images compared to other approaches. In our implementation inside the xtb code, we only evaluate the electronic structure at the Γ-point.
To enable covalent bond dissociations, a finite electronic temperature treatment typically at low temperatures (T el = 300 K) is used. 60 This way, fractional orbital occupations are introduced and static electron correlation effects can be incorporated with formally a single-reference treatment. For a variational solution, the GFNn-xTB energy expressions given below are then augmented with an electronic entropy term: T el is the electronic temperature, k B the Boltzmann constant, and n iσ refers to the (fractional) occupation number of the spin-MO ψ iσ . These are given by the Fermi-distribution.
ε i is the orbital energy of the orbital ψ i and ε σ is the Fermi level within the respective spin orbital space (α or β). The occupation n i for the spatial molecular orbital ψ i is given as All GFNn-xTB methods are formulated in a spin-restricted way, that is, the spatial molecular orbitals are identical for α and β electrons. This treatment (also for open-shell systems) is rooted in the fact that no spin-dependent terms are present in the GFNn-xTB Hamiltonians. As a consequence, it is computationally efficient (only one Hamiltonian matrix diagonalization per SCF cycle) and also adds some robustness to the methods. This is beneficial in the context of extensive structural sampling, which these methods are often used for. It should, however, be clear that this treatment provides bad and even qualitatively incorrect energetic splittings for states of different multiplicity, that is, GFNn-xTB Hamiltonians will always favor low-spin configurations.
The classical repulsion energy The repulsion energy in all GFN-type methods is given as an atom-pairwise expression: Here, Z eff are element-specific constants that define the magnitude for the repulsion energy, and may coarsely correspond to effective nuclear charges (screened by the core reference density ρ A,core 0 ). Though these parameters are somewhat agreeing with the latter in GFN1-xTB, they are fitted parameters in all GFN methods. Furthermore, k f = 3 2 is a global parameter, while the α exponents are element-specific parameters. There exists one exception to the value of k f in GFN2-xTB, that is, repulsion energies between first row (H,He) elements are using k H,He f = 1.
The extended Hückel energy As mentioned above, typical tight-binding methods allow covalent bond formation through an extended Hückel (EHT)type energy given as: Here, P μν = P 0 μν + δP μν is the valence electron density matrix in the nonorthogonal AO basis. In a generalized form, the EHT matrix elements read: Here, K ll 0 AB is a shell-specific scaling constant. For a few element combinations in GFN1-xTB and in GFN0-xTB, an element pair-specific scaling parameter enters K ll 0 AB as well (hence the atom labels A and B). S μν is the overlap matrix element of the AOs ϕ μ and ϕ ν . H μμ /H νν are the diagonal elements, which themselves are dependent on the chemical environment in all xTB methods (see below). The last three terms are absent in standard EHT or Wolfsberg-Helmholz type expressions. These terms are system-specific and involve flexible scaling functions.
A common mathematical form in all GFNn-xTB schemes is found in the distance-dependent polynomial scaling function Π(R AB , l, l 0 ): Here, the R AB is the distance between the atoms on which the functions μ and ν are located. R cov,AB are the summed covalent radii and taken from Reference 61 without refitting. k poly A,l are element-and shell-specific parameters. This term allows a distance-dependent adjustment of the EHT energy in addition to the distance-dependence encoded in the overlap matrix elements S μν . The electronegativity dependent term X(EN A , EN B ) and the basis functiondependent scaling are either not present or of nonunique mathematical form in the different GFNn-xTB Hamiltonians. The additional terms (i.e., the last three ones) rely only on global and element-specific parameters. It is clear that the EHT term in Equation (23) is fairly complex, but this way, the required flexibility to describe different covalent bonds is provided without resorting to an element pair-specific parameterization procedure. It is therefore not surprising that the dominant part of the GFNn-xTB methods' parameters (>50%) is incorporated here.
The isotropic electrostatic and XC energy GFN1-xTB and GFN2-xTB share a formally equivalent isotropic electrostatic and XC energy, which originates from the second-order term in the tight-binding expansion.

| The GFN1-xTB Hamiltonian
The GFN1-xTB 53 method is the first xTB version presented with a focus on the GFN properties.
Here, the energy expression is given by: In the first line of Equation (27), the superscript indicates the origin of the respective term in the tight-binding expansion. The repulsion energy in GFN1-xTB takes the form of Equation (21), while the EHT energy corresponds to Equation (22). In GFN1-xTB, the term Y ζ A l , ζ B l 0 À Á = 1 for all cases, while the electronegativity dependent part is given by.
Here, ΔEN is the difference of the electronegativities (standard Pauling values) and k EN = −0.007 is a global parameter. The diagonal EHT matrix elements are atomic environment-dependent: h l A is a shell-and element-specific parameter, while k CN,l are global angular momentum-specific parameters. CN A is the geometric atomic fractional coordination number, which is taken from the D3 dispersion model. 65 As shown in Equation (27), GFN1-xTB includes the isotropic energy term E γ as given in Equation (25). In GFN1-xTB, the short-range damping term is given as: This is the harmonic mean of the effective shell hardness values, which in turn are products of the element-specific atomic hardness and a shell-dependent scaling parameter. The last term in Equation (27) is an on-site third order electrostatic/XC correction: Here, the C n refer to the standard D3 dipole-dipole (n = 6) and dipole-quadrupole (n = 8) dispersion coefficients. These As in the regular BJ-damped D3 model, the damping constant is given by the ratio R 0 AB = ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi C AB 8 =C AB 6 q and the global damping parameters, which in GFN1-xTB are a 1 = 0.63 and a 2 = 5.0.
Mainly due to the monopole approximation in GFN1-xTB, weak halogen bonds are not described well. Hence, a purely geometry dependent halogen-bond (XB) correction is added.
Here k XR = 1.3 and k X2 = 0.44 are global parameters, while k X is a halogen-dependent parameter. f damp,AXB is a damping function, which depends on the angle of the atoms involved in the halogen bond, that is, the halogen X, its covalently bonded neighboring atom B, and the halogen-bond acceptor atom A.
The typically stronger hydrogen bonds are also not well described by a minimal basis, monopole approximated tightbinding Hamiltonian, and geometry dependent hydrogen bond corrections have been used in some SQM methods. 67 In GFN1-xTB no extra term to describe hydrogen bonds is introduced. Instead, an additional s-AO function on hydrogen is used, which leads to extra stabilization of hydrogen bonds with a formally unmodified xTB Hamiltonian.

| The GFN2-xTB Hamiltonian
The GFN2-xTB 49 method is the second version in the GFN framework. It is the first off-the-shelf tight-binding method with multipole electrostatics, anisotropic XC contributions and charge-dependent (D4) dispersion interactions. The energy expression is given by: The repulsion energy takes the form in Equation (21) in GFN2-xTB. The EHT energy term (Equation (22)) in GFN2-xTB also includes the electronegativity dependent term as in GFN1-xTB (Equation (28)) with k EN = 0.02 and unmodified Pauling electronegativities are used for the difference ΔEN. The shell-exponent dependent term is Here, ζ A l are the STO exponents of the GFN2-xTB AO basis. 56 The exponent-dependency introduces effects similar to the kinetic energy integrals in ab initio theories. Also in GFN2-xTB, the diagonal EHT matrix elements are atomic environment-dependent: h l A is a shell-and element-specific parameter just like δh l CN 0 A . The latter is the proportionality constant for the dependency on the modified GFN2-type coordination number CN 0 A , which is more long-ranged than the D3 variant (see Reference 49 for details).
The isotropic second-order electrostatic energy E γ is given in Equation (25) with the following expression for the short-range damping: η A and η B are element-specific fit parameters, while κ l A and κ l 0 B are element-specific scaling factors for the individual shells (with κ l A = 0 for l = 0). The dispersion energy is described by a modified D4 dispersion model Here, f n ð Þ damp,BJ R AB ð Þ is the damping function from Equation (33). The term in the second line is the three-body Axilrod-Teller-Muto (ATM) term and the last line is the corresponding zero-damping function for this term. The two-body London dispersion energy is depending on the covalent coordination number CN A cov and the atomic charges q A . Different from the regular D4 model, 32 the atomic partial charges are taken from a Mulliken population in GFN2-xTB and are solved self-consistently. The three-body term is environment-dependent through the covalent coordination numbers, but does not depend on the partial atomic charges in the D4 model. The damping and scaling parameters in the dispersion model are a 1 = 0.52, a 2 = 5.0, s 6 = 1.0, s 8 = 2.7, and s 9 = 5.0. The third-order term in GFN2-xTB is also an on-site term, which is formulated in a shell-specific form: Here, q l is the partial shell charge and Γ A is an element-specific parameter. K Γ l is a global shell-specific parameter (see Reference 49).
GFN2-xTB includes anisotropic electrostatic and XC terms. These are given as: Here, μ A is the cumulative atomic dipole moment of atom A and Θ A is the corresponding traceless quadrupole moment. These cumulative atomic multipole moments (CAMM) 68 describe the local atomic multipole moment contribution in a Mulliken approximation scheme. The distance dependence including damping is given by The damping function is related to the one in the original D3 dispersion model. 65 a n are adjusted global parameters, whereas R AB 0 = 0:5 R A0 0 + R B0 0 À Á determines the damping of the anisotropic electrostatic (AES) interaction. R A0 0 is made dependent on the GFN2-type coordination number (see above) for many lighter elements. GFN2-xTB includes all multipole contributions up to second order for the electrostatic energy.
The second-order anisotropic XC energy in GFN2-xTB is given by Again, μ A and Θ A are the CAMM mentioned above. f μ A XC and f Θ A XC are fitted element-specific parameters. These terms capture changes due the anisotropic deformation of the electron density around an atom A. To some extent, shortcomings of the small AO basis set, that is, the lack of polarization functions, may be compensated by this term.
Due to the anisotropic electrostatic and XC terms, GFN2-xTB does not require any additional hydrogen or halogen bond corrections.

| The GFN0-xTB Hamiltonian
The most recent member in the family of xTB methods is GFN0-xTB. Since no terms from the tight-binding expansion beyond first order are included, no self-consistent field procedure is necessary, making the method about 5-20 times faster compared to GFN2-xTB. The tradeoff is less flexibility in the electronic structure, similar as in the Harris-Foulkes functional 51,52 mentioned above in comparison to KS-DFT. Consequently, the formal choice of the reference system or its corresponding parameterization becomes more important. GFN0-xTB includes two essential types of correction procedures to recover a transferability comparable to the self-consistent xTB variants: one is a classical pairwise potential to correct for covalent bonds of heteroatoms (srb, short-range bond correction). The other, even more important one is incorporating semi-classical atomic charges determined variationally from an EEQ model. The latter describes the atomwise isotropic electrostatic energy (in place of the E γ energy in GFN1-and GFN2-xTB) in the form of a purely classical energy. The charges furthermore serve the purpose of modifying some of the EHT parameters. This can be regarded as a system-specific ad hoc re-definition of the reference system parameters (i.e., of ρ 0 in Equation (7)). In other words, GFN0-xTB works with a system-specific reference system of spherical, but in general, partially charged atoms. The GFN0-xTB energy is then given as: The term ΔE (0) in the first line of Equation (45) formally contains all the changes in the Hamiltonian due to the effective change of the zeroth order reference. The repulsion energy E rep has the established form as in Equation (21). The dispersion energy is computed via the D4 dispersion model (E D4 disp ) Different from the default D4 dispersion model as well as its GFN2-xTB variant, the three-body term (cf. Equation (40)) is dropped, as it would noticeably increase the computational cost of the method. As in the default D4 model, the partial charges are taken from the aforementioned EEQ model. These charges result from self-consistently solving for the E EEQ energy in Equation (45) with the constraint that the total charge is preserved: Here, χ A is the electronegativity of atom A, which is made dependent on the atomic environment via a modified coordination number (mCN A ): κ A is an element-specific fitted parameter and EN A the Pauling electronegativity. The modified coordination number is defined as: R cov AB are the summed covalent radii from Reference 69. J AA is an element-specific parameter related to the atomic hardness, and γ AB is related to the inverse root mean square of the atomic radii of atoms A and B (see Reference 33 for details). This term provides a description of the electrostatic energy at zeroth order in the tight-binding model.
The EHT energy in GFN0-xTB is given by the expressions in Equations (22) and (23). In the EHT part, the shellexponent dependent term takes the same form as in GFN2-xTB (see Equation (37)), while the electronegativity dependent term is shell-dependent in GFN0-xTB: Compared to GFN1-and GFN2-xTB, a higher power term of the electronegativity is included. Modified Pauling values for the electronegativity are used, while k ll 0 EN is a shell-specific and b EN is a global parameter. The diagonals of the EHT Hamiltonian are made flexible with respect to the modified coordination number as well as the atomic partial charges: While the first part of this equation is formally identical to GFN2-xTB, the last two terms describe the effective modification of the reference atom due to the charged state, which is derived from the EEQ model. The parameters δh l q A and Γ l q A are formally related to the chemical hardness and its derivative with respect to the particle number, respectively, but are simply fitted shell-and element-specific parameters in GFN0-xTB. We note at this point, that first-order electrostatic effects due to neighboring atoms are currently not incorporated in the GFN0-xTB model, but might be added in a future revision.
The last term in Equation (45) is the short-range bond correction, similar to the "short-range basis" corrections in the HF-3c and B97-3c methods. 8,10 This correction is only applied for heteroatomic pairs with atomic number Z A ∈ [5,9]. Here, k srb , η srb , and g scal are global fit parameters. The summed covalent bond radii R srb AB are modified by the electronegativities as: where R 0 A are the damping radii from the D3 model, 65 which are modified by the coordination number and c 1 and c 2 are global parameters. E srb essentially corrects the energies and bond lengths for polar bonds of second period elements.
The GFN0-xTB variant obviously contains more empiricism in the Hamiltonian compared to GFN1-and GFN2-xTB, but still avoids pairwise parameterization. While the non-self-consistent treatment obviously makes the method less costly, it has also a practical advantage: due to the lack of Fock exchange, self-consistent tight-binding methods also suffer from self-interaction error related phenomena, which become particularly pronounced in polar systems like proteins, where the SCF calculations might not converge anymore. These problems can be remedied by including an implicit solvation model 53 ; however, GFN0-xTB does not suffer from these defects and typically yields larger HOMO-LUMO gaps, as a result from the non-self-consistent treatment.

| Treatment of lanthanide elements
In the GFNn-xTB Hamiltonians, the "f-in-core" approximation is used throughout for lanthanide elements. That is, they are treated as 4d transition metals with three valence electrons and no explicit consideration of the f-electrons. This treatment is motivated by ab initio calculations indicating that the f-electron shell lies below the valence shell and their implicit handling, for example, in form of a pseudopotential or by appropriate parameterization (GFNn-xTB) is reasonable. 70 This is different, if spectroscopic properties are of interest, and these elements have been neglected in the simplified time-dependent xTB model for excited states for this reason (see next subsection).

| Simplified time-dependent xTB for excited states
The computational bottleneck of calculating electronic spectra with the simplified versions of full 71 and Tamm-Dancoff approximated 38 (TDA) time-dependent (TD)-DFT (i.e., sTD-DFT or sTDA-DFT) is the calculation of the ground state orbital coefficients and energies. The very fast semiempirical ground state tight-binding method in sTDA-xTB aims at eliminating this bottleneck. 39 The general work-flow of this approach is shown in Figure 3.
The xTB ground state calculation consists of two parts: a valence tight-binding (VTB) part and an extended tightbinding (XTB) part. Note that the two parts are denoted in this chapter as XTB and VTB, respectively, to distinguish them from the other xTB methods. First, atomic charges are determined in a truncated SCC procedure by VTB. The second, single-diagonalization XTB step is using these VTB charges. The VTB part applies a minimal valence and partially polarized AO basis set as in GFN2-xTB, whereas the second XTB part uses an augmented minimal valence expansion including diffuse functions for the treatment of states with Rydberg excitation character. The basis function exponents F I G U R E 3 Computational workflow of the sTDA-xTB method. The VTB (valence tight-binding) part uses geometry-dependent electronegativity difference-based charges as input and generates Mulliken charge-based CM5 charges 72 in a trunctated SCC procedure. The XTB (extended tight-binding) step then uses these charges as input and generates orbitals in a minimal+diffuse basis set for a subsequent excited state calculation at the simplified Tamm-Dancoff approximated TD-DFT (sTDA) level are made coordination number-dependent in both VTB and XTB. The XTB Hamiltonian matrix elements have a GFN1like form and read Their diagonalization yields the orbital energies and coefficients used in the sTDA or sTD formalism. γ AC denotes the interelectronic repulsion function in the Mataga-Nishimoto formulation 62 that reads as: where R AC is the interatomic distance and η the chemical hardness. It should be noted that the VTB charges in Equation (54) are not plain Mulliken charges, but modified according to the CM5 model. 72 The applied basis functions in the different parts are summarized in Table 2. Virtual orbitals are shifted to resemble hybrid functional reference excitation energies. The black line denotes a one to one correspondence of the two data sets energy gaps. Apart from a local excitation correction (see Reference 39), the sTDA/sTD part is not modified in sTDA-xTB compared with sTDA-DFT. The method has been parameterized to reproduce accurate theoretical reference vertical excitation energies only, and hence, it is not suited for describing ground state energetics or PES regions far away from the Franck-Condon point. Opposed to the GFNn-xTB methods, it is a single-point method without a gradient implementation. Therefore, its application for photochemistry and related MD studies is expected to be rather limited. The comparison of the fitted properties-atomic charges (VTB) and vertical excitation energies (sTDA-xTB)-with the theoretical reference is shown in Figure 4. An excellent agreement between the xTB and reference data is observed.

| Nonelectronic variants (GFN-FF)
The evolution of GFN1-, GFN2-and GFN0-xTB inspired the development of a generic FF. We have already established the connection between atomistic, intermolecular Lennard-Jones-type FFs and zeroth order tight-binding methods in Equation (8). Furthermore, the experience with GFN0-xTB has shown that the approach of ad hoc adjustment of the reference system (following the picture of Equation (7)) by adding more flexibility and more reasonably chosen parameters has some prospects for success. The development of the so-called GFN force-field (GFN-FF) approach can be regarded as in line with these findings.
From the practical point of view, the main focus of GFN-FF is directed towards the description of very large biomacromolecular systems such as (metallo-)proteins, 76 supramolecular assemblies 77 and metal-organic frameworks. 78 Screening of very many structures or treating molecules with more than a few thousand atoms routinely is impractical at any electronic GFNn-xTB level. It is intended as a versatile tool for drug design in life sciences and structure screening in various fields of chemistry. [79][80][81] Therefore, GFN-FF introduces an approximation to the remaining quantum mechanics in GFN0-xTB by replacing the zeroth order TB terms with classical bond, bending angle, and torsion angle potentials. To remain accurate in the description of conjugated systems, GFN-FF retains an iterative Hückel scheme for a selected set of π-atoms. The resulting bond orders affect the force constants and other energy relevant parameters of the system. To yield accurate results, the FF parameters are fitted to reproduce B97-3c 10 equilibrium geometries and frequencies. Thereby, a strictly global and element specific parameter strategy is applied and no element pair specific parameters are employed. This approach is a unique feature of all GFN methods and differs strongly from the parameterization strategies of other FFs (e.g., see . Special attention is paid to the simple application of GFN-FF. As input only Cartesian coordinates and elemental composition are required from which fully automatically all potential energy terms are constructed.
The total GFN-FF energy expression is given by where E cov refers to the bonded FF energy and E NCI describes the NCIs. In the covalent part, interactions are described by asymptotically correct (dissociative) bonding, angular, and torsional terms. Repulsive terms are added for bonded and non-bonded interactions separately. Additionally, a three-body correction to the bonded part is added (abc), that extends beyond the sum of pair-wise interactions In the noncovalent part, electrostatic interactions are described by the EEQ model as in GFN0-xTB. Overall, two sets of EEQ charges are used. One depends as usual on the actual geometry, whereas another set of charges is exclusively covalent topology based, introducing further polarizability and leading to large simplifications in terms of gradient computations. Dispersion interactions are taken into account by a simplified version of the D4 scheme, 32 in which the dispersion coefficients are scaled by atomic charges. Without any detailed electronic information, the correct description of hydrogen and halogen bonds is challenging. Therefore, additional charge-scaled corrections (denoted HB and XB, respectively) are applied to the noncovalent energy: GFN-FF reaches quadratic scaling O N 2 ð Þ of the computation time for energy and forces, whereas all GFNn-xTB methods show cubic scaling with respect to the number of atoms. It is the computationally most efficient member of the GFN family. For illustration, a comparison between total CPU times for single point and gradient calculations for all GFN methods is given in Figure 5 for small to medium sized proteins (from 300 up to 6,000 atoms).
To achieve SCC convergence, the GBSA(H 2 O) solvation model (see next subsection) had to be employed for GFN1-and GFN2-xTB. With GFN0-xTB, a speed-up factor of 2-20 is achieved, while GFN-FF improves on this even further. It is about two orders and three orders of magnitude faster than GFN0-xTB and GFN1-/GFN2-xTB, respectively.

| Continuum solvation model (GBSA)
To create realistic computational models, solvent effects have to be accounted for, either by explicitly including solvent molecules (and dynamical sampling) or by a parameterized implicit solvent model. Due to its favorable computational cost, the latter approach is pursued in the framework of xTB methods and for GFN-FF.
Two kinds of polar implicit solvation models are suitable for xTB, either a polarizable continuum model or a generalized Born (GB) model, where the former has the disadvantage of introducing an integration grid, which can introduce a significant overhead for large scale calculations. Therefore, we will only discuss the implementation of the GB model present in the xtb program.

Energy and gradient CPU time (s)
F I G U R E 5 CPU times (given in seconds on a logarithmic scale) for single point energy plus gradient calculations of 14 proteins.
Computations were performed using a quad-core desktop machine with 4.20 GHz Intel i7-7700K CPUs. The PDB identifiers are given on the abscissa, the corresponding number of atoms is given on top In the GB model, a molecule is considered as a continuous region with a dielectric constant ϵ in surrounded by infinite solvent with a dielectric constant ϵ out . 86 The electrostatic interaction in the presence of a polarized solvent can then be expressed as the solvation energy where a A/B are the effective Born radii of the atoms A/B. The GB model is introduced in the xTB Hamiltonian as a second-order fluctuation in the charge density and described by the atomic potential V GB A given as The Born radii are evaluated by an Onufriev-Bashford-Case (OBC) corrected pairwise approximation to the molecular volume given as where R cov A is the covalent radius of atom A, a scale , and R offset are global parameters and b = 1.0, c = 0.8, and d = 4.85 are the parameters for the OBC correction, which correspond to the GB OBC II model. 87 The OBC correction increases the Born radii for atoms buried deep inside a molecular cavity, which would usually be underestimated. Ψ A is the pairwise approximation to the volume integral given by with Ω being the pairwise function used to approximate the volume integral, which is only dependent on the distance and the covalent radii. Note that the covalent radius of the second atom is scaled by the element-specific descreening value s B to compensate the systematic overestimation of the volume by this approach. In addition to this polar contribution to the solvation energy, a nonpolar surface area contribution depending on the solvent accessible surface area (SASA) is given by where γ A is the surface tension and σ A is the SASA of atom A. To evaluate the latter we resort to the smoothly differentiable numerical approach introduced from Reference 88 and integrate the surface area on an angular Lebedev grid. The SASA is also used in an empirical hydrogen-bond correction to the generalized Born energy as where g HB A is the strength of the hydrogen bonds between this atom and the solvent molecules and A A is the surface area of the free atom. This simplified form has been chosen since the hydrogen-bond correction should enter the Hamiltonian as a potential due to the charge dependency.
The total solvation free energy is given by where an additional shift is included depending on the chosen reference state of the solution. This solvation free energy is fitted with four global parameters, the Born radius offset, the Born radius scaling, the probe radius of the solvent molecule, and the value of ΔG shift as well as three element specific parameters, the descreening value, the surface tension, and the hydrogen bond strength to reproduce COSMO-RS16 solvation free energies. 89

| IMPLEMENTATION
The GFN methods are implemented in the open-source software xtb, which provides a framework to use them on their own or together with other algorithms. xtb provides a userfriendly interface for performing geometry optimizations, vibrational frequency calculations, and MD simulations. Usual workflows like geometry optimization, vibrational (harmonic) frequency calculation, and evaluation of thermodynamic functions are easily available as composite keywords. Thus, an additional input file, besides the input geometry, is not necessary to perform most calculations with xtb. The usual obstacle of learning a new input format and adjusting numerical thresholds is minimized as much as possible, making it fairly straightforward to start running calculations with xtb. Through its unique design, the xtb program has become an integral part in a number of algorithms and programs, developed in our group or other groups. One of the early users of the xtb program is the quantum chemistry electron ionization mass spectrometry (QCEIMS) method implemented in the program of the same name. 90 qceims is accessing the self-consistent xTB methods to provide electronic structure information, like ionization potentials (IPs) or to drive high-temperature MD simulation to allow for first principles predictions of electron ionization mass spectra, which provides fundamental insights into the fragmentation process not available with standard machine learning-based algorithms. While not tailor-made for the prediction of TS or reaction paths, the GFNn-xTB methods are suitable to provide a fast initial path when coupled with an appropriate algorithm. One such interface exists for the growing string method (GSM) program by the Zimmermann group implemented in the mGSM program. 91,92 To avoid introducing an additional interface in an established software package, the xtb program is wrapped in a flexible way to mimic output for an already existing and tested interface to mGSM. We find that this approach is most sustainable and particularly advantageous for users already familiar with the mGSM program, such that they can quickly integrate the GFNn-xTB methods and any future extension of the xtb program to their already existing workflows. In a recent publication of our group, we have highlighted this particular combination of mGSM and xtb. 93 The most prominent use of xtb is in the conformer-rotamer ensemble search tool crest, 94,95 which acts as a driver to perform and schedule calculations with xtb. A robust file-based input and output communication between the programs is established by using shared memory parallelization to allow for multiple xtb instances being driven by the scheduler provided with crest. Starting from an input structure, xtb will be used to generate the distorted structures by (biased) MD simulations. From the generated trajectories crest is selecting structures for relaxations performed in parallel with multiple xtb instances. Each instance regularly reports back its optimization progress, such that crest can re-rank, filter, and prioritize the calculations most efficiently. Finally, we want to highlight the usage of both xtb and crest within the computational chemistry framework for energetic sorting of conformerrotamer ensembles, named enso. The enso framework is designed to automate the re-ranking of conformer-rotamer ensembles at DFT level. 28 It relies implicitly on xtb-via-crest, but also directly for the calculation of thermostatistical corrections with the modified rigid-rotor-harmonic-oscillator (mRRHO) partition functions 96 or of solvation free energy contributions using the GBSA solvation models available for the GFNn-xTB methods. Those contributions to free energy are then used to refine conformer-rotamer ensembles produced by crest in an initial stage together with lowcost DFT methods like PBEh-3c 9 or B97-3c 10 before starting more expensive hybrid-functional calculations or property calculation like, for example, NMR shifts (see Figure 1). Besides this selection of applications and algorithms developed in the recent years by our group, the GFNn-xTB methods were quickly adopted in many existing QC program packages, like AMS, 18 CP2K, 19 DFTB+, 21 entos, 22 Orca, 23,24 and TeraChem. 25,26 Moreover, interfaces to the xtb program are already available in large frameworks like ASE 97 or Cuby4. 20 To facilitate the usability of xtb, an in-depth documentation is available covering all of the possible workflows. 98 This documentation is managed by established software documentation tools, namely sphinx 99 and asciidoctor, 100 which are used to automatically generate, among others, Linux manual pages, a printable PDF manual, and the static HTML code for the online-documentation. The latter is hosted by the read-the-docs project to be easily accessible to all users (see https://xtb-docs.readthedocs.io). Additionally, we dedicated the development of the xtb program to the open-source idea and, hence, published it under the Lesser General Public License (v3+) with the source code publicly available on GitHub (see Reference 47). One of the greatest burdens of QC programs is the proper installation, which can be especially tricky in high-performance computing (HPC) environments. To actively support system operators in their efforts to make xtb available to their users, we make xtb available with the conda crossplatform package manager via an official conda-forge channel and also together with the easy build toolchain allowing to install xtb in an existing module system in HPC environments. For developers, we decided to offer dual support of both the established CMake build system and the relatively new but promising meson build system. A solid unit testing framework is also available and readily coupled to continuous integration services of Travis-CI and GitHub-Actions.

| Geometry optimization of large systems
Due to the inherent computational efficiency of the underlying electronic structure method, a similarly fast and robust geometry optimization procedure had to be developed along with the GFNn-xTB methods. Otherwise, the geometry relaxation steps can amount to a significant fraction of the overall computation time, in particular at the GFN-FF level.
Special constraints on the optimization algorithm are the choice of a general and robust coordinate system, which is inexpensive in the generation, and a fast and reliable geometry displacement step with a better formal scaling and prefactor than the fastest available GFNn-xTB method. A model Hessian based on the work of Lindh et al. 101 is used for the generation of an approximate normal mode coordinate (ANC) system. With an efficient screening, the calculation of this Hessian can be performed with a quadratically scaling algorithm. However, its diagonalization to obtain the transformation matrices scales cubically with the number of atoms. To reduce this bottleneck the coordinate system is generated from the actual Cartesian coordinates only every 20-40 optimization steps. For the calculation of the coordinate displacement we established a combination of the rational function algorithm 102 together with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) Hessian update step. 103 F I G U R E 6 Schematic representation of the fragmented Hessian algorithm. A complex molecular structure with N c atoms is divided into several chemically reasonable fragments (of size N frag ) and the Hessian of each fragment H frag is diagonalized individually. The diagonalized Hessian H diag is constructed from the fragments, instead of diagonalizing the Hessian of the entire system H c , thus reducing the overall computational costs For systems with several thousands of atoms, even this setup can become too slow when combined with faster methods than GFN0-xTB, that is, GFN-FF. Even if both displacement and update are evaluated by a L-BFGS algorithm, 104 additionally a faster diagonalization strategy had to be designed based on a Hessian fragmentation scheme (see Figure 6).
With the covalent topology at hand, a fragmentation scheme identifies in a first step noncovalently bound fragments. For explicit solvent molecules, this number can be rather large. Therefore, the volume occupied by the system of interest is divided in cubic boxes and all NCI fragments of small size within such a box are collected together in a cubic cluster approach. On the other hand, large NCI fragments are further divided in smaller chemically reasonable fragments upon the application of the Dijkstra algorithm. 105 This algorithm based on graph theory finds the shortest paths between two nodes in a graph. For each resulting chemical fragment, a fragmented Hessian matrix is set up and diagonalized separately. The problem of diagonalizing one large matrix is thus broken down to the diagonalization of a few much smaller matrices (divide-and-conquer type algorithm). To achieve high efficiency, this process is performed in parallel. With the fragmented Hessian scheme, the xtb optimizer, termed ANCopt, is capable of performing geometry optimization of large and complex macromolecular structures.

| Molecular structures
The geometry optimization of molecular structures is a common application for SQM methods, specifically for large systems, where DFT and other first-principles methods become computationally infeasible. The GFN1-and GFN2-xTB methods already proved to be fast and robust semiempirical optimization tools for computing reasonably accurate molecular structures with elements up to radon. 49,53,106,107 To substantiate this statement, we summarize their performance for well-established and recently published benchmark sets including organic and main group molecules, NCIs, and transition-metal as well as organometallic and lanthanide complexes.
With respect to organic and main group molecules, we will first focus on the established ROT34 108,109 set of 12 small to medium-sized organic gas phase structures and only briefly mention the results for other benchmarks (for details, see References 49,53). The ROT34 set comprises 34 equilibrium rotational constants B e derived from accurate spectroscopically measured rotational constants B 0 which were back-corrected by calculated nuclear vibrational effects, such that they can be directly compared to (local) clamped-nuclei Born-Oppenheimer minimum structure values computed with the electronic structure methods, which are to be assessed. It is a sensitive test for the accuracy of calculated molecular structures since small changes in the geometry can already lead to significant deviations from the backcorrected experimental reference values. As long as conformational changes can be excluded, too large theoretical B e values indicate shortened covalent bonds (or overall shrinked molecule size) w.r.t. the experimental reference.
The performance in terms of relative deviations for the GFNn-xTB methods is compared to the dispersion-corrected DFTB3-D3(BJ) and PM6-D3H4X 110,111 SQM methods as well as to the low-cost composite DFT method B97-3c and accurate B3LYP 112 -D3(BJ)/def2-QZVPP 113 structures (see Figure 7a). Well-performing DFT methods such as the two latter show mean unsigned relative deviations (MURDs) and standard relative deviation (SRDs) below 0.5%, 10,108 while SQM methods expectedly yield larger deviations. The ZDO-based PM6-D3H4X method yields SRDs and MURDs of ≤2.5% and, generally, less reliable geometries are predicted compared to DFTB3-D3(BJ), GFN1-xTB, and GFN2-xTB (one molecule (isoamyl-acetate) was excluded from the benchmark due to problematic conformational changes). The latter ranks second among the tested methods and is only slightly outperformed by GFN1-xTB. This could be due to the fact that the relative weight of geometries during the fitting procedure has been larger than for GFN2-xTB. The GFN1-xTB method yields an SRD of 1.1% and a mean relative deviation (MRD) of only 0.5% thus providing only slightly too small molecules on average, which is an excellent result for a SQM method. A comparably small SRD is obtained with DFTB3-D3(BJ) but at the expense of a systematic shift towards too long bonds bonds, which is common for DFTB as well as for (semi)local density functionals. 108 GFN0-xTB and GFN-FF yield an SRD that is approximately halfway between PM6-D3H4X and DFTB3-D3(BJ). In contrast to GFN0-xTB, which shows a clear shift towards too small molecules, the MRD of GFN-FF is close to zero and, hence, the geometries calculated with this universally applicable FF do not show a systematic error on average, though the scatter is somewhat larger compared with GFN1-and GFN2-xTB.
In addition, two benchmark sets were assessed which contain more unusual and challenging structures: the HMGB11 9 covering heavy main group molecules and the LB12 9 testing particularly long bonds. Here, PM6-D3H4X with a mean absolute deviation (MAD) of 10.1 p.m. for the HMGB11 set is clearly outperformed by GFN1-xTB and GFN2-xTB with MADs ≤3 p.m. Particularly the latter predicts the long bond lengths of the LB12 fairly well with an MAD of 12.6 p.m. (excluding S 2 + 8 from the statistics), while PM6-D3H4X yields an MAD of 20.5 p.m. (excluding HAPPOD and KAMDOR from the statistics). The good performance of GFN1-and GFN2-xTB for these "unusual" cases may be attributed to their element-specific parameterization. Furthermore, molecular structures with dominating NCIs were assessed comprehensively employing center-of-mass (CMA) distance deviations for the fully optimized S22 114,115 complexes and relative deviations w.r.t. extrapolated CMA distances for the S22x5, 116 S66x8, 117 and X40x10 118 benchmark sets. Even though all tested methods delivered comparably good results for the S22 with an MAD of ≈14 p.m., GFN2-xTB represents a clear improvement especially for X40x10. The MRD for the latter approaches zero and the MURD is with 2.5% only about half as large as that of PM6-D3H4X. This also indicates that the (anisotropic) electrostatic terms are well balanced with repulsive and dispersion interactions and, therefore, particularly GFN2-xTB is well suited for the optimization of noncovalent complexes and supramolecular assemblies. In contrast to organic and main group molecules, the geometry optimization of the chemically important transition-metal and organometallic complexes with SQM methods is much less common. This is due to the fact that these structures often already pose a challenge for single reference QM methods and because there are hardly any SQM methods that are parameterized for the whole (Z ≤ 86) periodic table. With the development of the GFNn-xTB methods this situation has improved significantly, as a promising alternative to the ZDO-based PMx methods has been made available.
To substantiate and to quantify this statement we demonstrate the performance of GFN1-and GFN2-xTB for the well-established TMC32 119 3d transition-metal structures benchmark set comparing 50 bond lengths, as well as for two recently published comprehensive molecular structure benchmark sets, the TMG145 107 including 145 transition-metal complexes and a set of 80 challenging lanthanide complexes. 106 The complexes in both sets are treated as low-spin systems.
For the TMC32 set, GFN1-xTB is the most accurate of all investigated SQM methods (MAD = 5.0 p.m.), closely followed by its successor GFN2-xTB (MAD = 5.7 p.m.). On average, GFN1-xTB yields more systematic deviations and slightly shorter bonds compared with PM6-D3H4X. Still, all transition-metal complex structures were reproduced by all three methods without dissociation or significant chemical reorganization. The TMG145 benchmark set was compiled from 145 challenging "real-life" transition-metal complexes (25-200 atoms) up to Hg (Z = 80) with high-quality hybrid DFT (TPSSh 120 -D3(BJ)-ATM/def2-TZVPP 113 ) structures as reference. In total, 941 bond lengths and 2,846 bond angles are compared.
GFN1-and GFN2-xTB both predict distinctive bond angles around the metal atom as well as metal-ligand bond lengths with good accuracy (bond angles/bond lengths: MAD = 4.0 /7.5 p.m. and 3.9 /8.3 p.m. for GFN1-and GFN2-xTB, respectively; cf. Figure 7b) for the individual deviations for the subset with DFT reference structures) with high efficiency (typically below 1 minute computation time compared with days to weeks for the reference calculations). This is also reflected in the Cartesian heavy-atom (all elements except H) root-mean-square deviation (hRMSD), which is on average only 0.34 and 0.33 Å for GFN1-and GFN2-xTB, respectively. The tested ZDO-based methods PM6-D, 110 PM6-D3H4, 110,111 and PM7 121 perform clearly inferior with MADs of about 30 p.m. and 9 for bond lengths and angles, respectively, as well as twice as large hRMSD on average, mainly due to about 40% of the complexes for which these methods predicted qualitatively incorrect molecular structures. In contrast, only ≈11 and ≈ 7% of the complexes could not be optimized to the chemically correct structure with GFN1-and GFN2-xTB, respectively, thus underlining their robustness even for such challenging systems. Four representative structure overlays together with the respective hRMSD are shown in Figure 7d.
The applicability of the GFNn-xTB methods for all elements up to Z = 86 also allows the optimization of molecular structures with rather unusual elements important for special applications such as luminescent lanthanide-based metal-organic frameworks. 122 Thus, the correct computation of such lanthanide-containing structures is of practical relevance. For GFN1-xTB, this was assessed on a challenging and comprehensive test set of 80 lanthanide structures. Except for three promethium complexes, for which accurate PBE0-D3(BJ)/ZORA-def2-TZVP 123 (SARC2-ZORA-QZV 124 basis for Pr) reference structures were calculated, the optimized geometries are benchmarked w.r.t. high-quality X-ray structures. For more than half of the complexes, the structures optimized with GFN1-xTB yield a hRMSD <0.6 Å. Only for a few larger multi-nuclear lanthanide clusters bridged by anionic ligands, it was more difficult to reach the default convergence criteria.
However, this would be also the expected behavior of higher-level QM methods for such challenging systems. Considering the uncertainty due to the neglect of crystal field and crystal packing effects, this means that most reference structures could be reproduced qualitatively correct and 44 out of 80 structures even show a good quantitative agreement with the reference structures. Compared to the Sparkle/PM6 method, which is the only semiempirical competitor for lanthanide chemistry, the mean hRMSD of 0.65 Å obtained with GFN1-xTB for all complexes is significantly smaller, even slightly outperforming the low-cost composite QM method HF-3c (0.68 Å). This is a very good result keeping in mind that the lanthanide atoms are treated in an "f-in-core" approximation by the GFN1-xTB method (see Section 2.1.6). GFN1-xTB also clearly outperforms Sparkle/PM6 in terms of computational wall times mainly due to the fact that significantly fewer SCF and optimization cycles are required on average.
Together with the promising results for the TMG145 and TMC32 benchmark sets, GFN1-xTB and GFN2-xTB proved to describe organometallic bonding motifs significantly better compared with the ZDO-based PMx methods, which are currently the only other SQM methods applicable to such systems. This is a particularly remarkable result since no specific modifications for the treatment of organometallic complexes were introduced in the development of the GFNn-xTB methods and they are also clearly superior in terms of computational speed. Moreover, the finite electronic temperature (Fermi smearing) along with the Hamiltonian, which is devoid of exchange/spin density terms, also enables the robust optimization of molecules with small HOMO-LUMO gaps or open shells, which are challenging for many single reference QM and most other SQM methods. The excellent accuracy/computational cost ratio of the GFN1-and GFN2-xTB methods together with their robustness enables the fast and reliable geometry optimization of typical transition-metal complexes as well as the routine application, for example, for semi-automated conformational search procedures for larger organometallic complexes and the optimization of very large systems with several thousands of atoms such as extended metal-organic polyhedra at a SQM level of theory (see Figure 8 for three representative examples taken from Reference 107). GFN2-xTB fully optimizes the structures of medium-sized polyhedra (≈1,100-1,400 atoms) in about 1 hr (≈200 optimization cycles). The relatively small hRMSD (0.68 and 0.80 Å respectively) for such large systems show impressively how robust, efficient, and accurate such optimizations with GFN2-xTB in combination with a GBSA solvation model are. Even the Pd 30 L 60 (BF 4 ) 60 polyhedron with almost 2,500 atoms could be optimized in a few days yielding a qualitatively correct molecular structure.
In summary, it can be stated that the general applicability for almost all elements of the periodic table, the efficient and robust treatment even of very large and complicated electronic structures, a reliable description of the major part of the PES, especially for the NCIs, as well as the coupled implicit GBSA solvent model are strong points of the methods. Hence, GFN1-and GFN2-xTB methods are perfectly suited for optimizing molecular structures for various chemical applications. Since only comparatively low computational resources are required for this purpose, these methods provide an unprecedented quantum-mechanical tool which can be of great benefit for chemical research, for example, for the calculation of large organometallic structures. GFN0-xTB performs slightly worse, but may be of great interest in a revised form for optimizations under PBCs. First assessments of the new universal force field GFN-FF yielded promising results 46 for geometry optimizations of proteins. This is remarkable, since GFN-FF was not specifically adjusted to proteins. In Figure 9, the F I G U R E 8 Structure overlays of the GFN2-xTB optimized (transparent blue; the GBSA solvation model was applied) and X-ray reference structures (color code; the respective CSD code is given) for three metal-organic polyhedra (carbon bound hydrogen atoms are omitted for better visibility). The heavy-atom RMSDs (hRMSD) are given and the timings were obtained with normalopt settings on 14 CPU Intel ® Xeon ® E5-2660 v4 2.00 GHz CPU cores performance of GFN-FF along with the universal (UFF 125 ) and highly specialized FFs (OPLS2005, 126 AMBER* 127,128 ), respectively, is assessed for geometry optimizations of 70 protein structures. 129 For the dihedral angles ϕ, ψ, and χ, which act as soft descriptors for local displacements in the respective protein backbone, GFN-FF provides similar or even better accuracy than the specialized FFs OPLS2005 and AMBER*, respectively. Only for ω, significant deviations in the larger protein structures were observed with GFN-FF, indicating that the barrier for rotation around the respective C N bonds was underestimated. For both the hRMSD and the C α RMSD, GFN-FF provides comparably accurate results to the computationally much more demanding SQM method GFN2-xTB. UFF, on the other hand, performs clearly worse than GFN-FF for all statistical measures and is therefore, in contrast to the latter, not recommended for protein structure optimizations. Furthermore, the parameterization up to Z = 86 allows the optimization of metalloprotein structures with GFN-FF.

| Thermochemistry and kinetics
The GFNn-xTB family of methods does not reach the aforementioned accuracy level for the calculation of (covalent) thermochemical and kinetic properties. However, this is typical for SQM methods in general, 130,131 which are therefore not generally recommended for accurate routine calculations of, for example, covalent reaction energies and barrier heights. However, particularly for the latter, GFN2-xTB performs surprisingly well for a SQM method (vide infra, cf. Figure 10d). Nonetheless, the semiempirical approximations on which the GFNn-xTB methods are based have the clearest implications for covalent bonds and hence, they (and other SQM methods) should only be used for qualitative estimates of covalent thermochemistry. However, there are many applications where the GFNn-xTB methods, due to their high efficiency combined with the general parameterization and high robustness, prove to be very useful, for example, as a starting point for multilevel, large-scale screening applications (see Section 4.4) or the semi-automatic localization of TS presented in the next subsection. Moreover, the performance for NCIs is very good, as will be summarized in the following for eight intermolecular NCI energies benchmark sets from the comprehensive GMTKN55 132

MAD (°) MAD (Å)
F I G U R E 9 Performance of GFN-FF in comparison to the universal (UFF) and highly specialized FFs (OPLS2005 and AMBER*) for a set of 70 protein structures. 129 Average hRMSD, C α RSMD, and average deviations of four distinctive dihedral angles w.r.t. the corresponding crystal structure given in Å and degree, respectively database. They are composed of various small molecules including also heavier elements and more unusual interaction motifs. Specifically, the interaction energies in rare gas complexes (RG18), n-alkane (ADIM6), and various other noncovalently bound dimers (S22 and S66) as well as between heavy element hydrides (HEAVY28) were assessed. Furthermore, hydrogen-bonded complexes between H 2 O, NH 3 , or HCl and carbene analogues (CARBHB12), pnicogencontaining dimers (PNICO23), and halogenated dimers including also halogen bonds (HAL59) were tested (for original references to the used subsets of the GMTKN55 database, see Reference 132). The respective MADs obtained for the GFNn-xTB methods, GFN-FF, and PM7 with respect to the very accurate coupled cluster reference values are shown in Figure 10a together with average MADs over all tested sets. Values for PM6-D3H4X as well as low cost composite and large basis set DFT methods are also shown. Except for the ADIM set, for which GFN2-xTB is the worst of all tested methods and the CARBHB12, for which GFN1-xTB slightly outperforms GFN2-xTB, the latter is the most accurate method in all other considered benchmark sets. This is also reflected in its remarkably small average MAD of only 0.9 kcal/mol, which is slightly better than that of the composite methods HF-3c and B97-3c and even comparable to some large basis dispersion corrected GGAs. This is especially true for the rather special HEAVY28 and RG18 test sets, where GFN2-xTB achieves excellent results (MAD = 0.6 and 0.2 kcal/mol, respectively), most likely due to a more advanced treatment of dispersion interactions within the D4 scheme compared to D3(BJ). In contrast, PM7 shows much larger MADs for both sets (HEAVY28: 2.9 kcal/mol, RG18: 0.6 kcal/mol).
The benefit of the AES (see Section 2.1.4) is already visible for the commonly applied S22 and S66 test sets, which are also often used for fitting purposes. For example, the MAD for S66 obtained with GFN2-xTB (0.7 kcal/mol) is 0.6 kcal/mol lower than that of the monopole based GFN1-xTB method, which illustrates the importance of AES for such systems. Moreover, GFN2-xTB does not show any larger deviations for the hydrogen bonded systems, although in contrast to GFN1-xTB, no special terms for hydrogen bonds are included. PM7 performs comparably well like GFN2-xTB for the S22 and S66 sets and is with an MAD of only 0.2 kcal/mol clearly the most accurate SQM method for the ADIM6 test set. However, for CARBHB12 and especially for the PNICO23 and HAL59 sets, the errors for PM7 are much larger (up to five times higher MAD than GFN2-xTB) and significant outliers occurred. The good performance of the GFN2-xTB method for the PNICO23 and HAL59 sets can be attributed to the inclusion of higher multipole electrostatic terms, since in these molecules a good description of the anisotropic electron density of the bound pnicogen and halogen atoms is crucial for the accuracy of the respective interaction energies. The GFN1-xTB method, despite its monopole approximation, gives comparably good results for HAL59 and still reasonably accurate interaction energies for the PNICO23 systems which can be explained by its well balanced parameterization and special halogen bond correction.
GFN0-xTB and GFN-FF perform similar to GFN1-xTB (average MAD = 1.1 kcal/mol) with slightly larger average MADs of 1.4 and 1.1 kcal/mol, respectively. Only for the CARBHB12 test set, significantly larger deviations were observed for both methods. Considering its very low computational cost, the performance of GFN-FF for intermolecular NCIs is outstanding.
The excellent performance of GFN2-xTB, in comparison to other SQM methods, for NCIs of smaller molecules is also observed for larger and more difficult supramolecular complexes, as could be shown with the S30L 133 benchmark set. It tests 30 association energies of noncovalently bound neutral and charged complexes with up to 200 atoms for which accurate DLPNO-CCSD(T) 134,135 /CBS* 136 reference values are available. 10 With an MAD of only 4.0 kcal/mol, GFN2-xTB can compete with some large basis set dispersion-corrected DFT methods, which is an outstanding result for an SQM method for such a challenging benchmark. Especially the very small errors for the charged systems are striking, whereas PM6-D3H4X, among others, shows up to 20% deviation from the corresponding reference association energies. This again confirms that electrostatic and polarization interactions are described much more accurately in GFN2-xTB than with other SQM methods. Larger deviations for GFN2-xTB were only found for complexes 7-12 with conjugated π systems and dominant van der Waals interactions, for which the respective association energies were overestimated, probably due to non-additive dispersion interactions that are not yet described accurately enough with the ATM term (see Section 2.1.4). The universal force-field GFN-FF yields similarly accurate results (MAD = 4.1 kcal/mol) as GFN2-xTB and even deviates less from the reference values for van der Waals interaction-dominated conjugated π systems, although only a simplified version of the D4 scheme is applied, 32 thus, indicating a good and well balanced parameter fit. For the charged systems, however, the errors are, as expected, somewhat larger compared to GFN2-xTB. GFN1-xTB yields a MAD (6.1 kcal/mol) between that of PM6-D3H4X (5.1 kcal/mol) and DFTB3-D3(BJ) (6.9 kcal/mol) and hence does not reach the accuracy of GFN2-xTB and GFN-FF.
Overall it can be stated that the GFN methods, especially GFN2-xTB and GFN-FF, are well-suited to investigate noncovalently bound systems and, due to their very low computational effort compared to dispersion-corrected DFT methods with large basis sets, reliable calculation of such systems with several thousand of atoms becomes feasible.
Next, we discuss the performance of the GFNn-xTB and other SQM methods for eight conformational energy benchmark sets taken from the GMTKN55 database, specifically relative energies of alkane (ACONF), amino acid (Amino20x4), butane-1,4-diol (BUT14DIOL), inorganic (ICONF), melatonin (MCONF), tri-and tetrapeptide (PCONF21), and sugar conformers (SCONF) as well as energy differences between RNA-backbone conformers (UPU23 136 ) and one additional test set (MALT205 137 ) comprising 205 conformers of α-maltose (for original references to the used subsets of the GMTKN55 database, see Reference 132). This is a challenging test for SQM methods since the accurate description of these small relative energies requires a well balanced accuracy for both intramolecular noncovalent and covalent interactions. Figure 10b shows the MADs for the tested methods and the average MADs over all nine sets in comparison with low-cost composite QM methods and larger basis set DFT results. Among the SQM methods tested, GFN2-xTB is on average the most accurate, closely followed by GFN1-xTB and GFN-FF. Especially the significant improvement compared to GFN1-xTB for more polar and hydrogen bonded systems, with the exception of the UPU23 set, led to an excellent performance of GFN2-xTB for the Amino20x4 (MAD = 0.9 kcal/mol), PCONF21 (MAD = 1.8 kcal/mol), and SCONF (MAD = 1.6 kcal/mol) test sets. For the BUT14DIOL and MCONF test sets, similarly good results for all assessed SQM methods were observed, with GFN-FF performing best for the latter (MAD = 0.5 kcal/mol). For the MALT205 set, which is challenging due to the many involved intramolecular hydrogen bonded, GFN0-xTB and PM6-D3H4X (MAD = 5.2 and 5.1 kcal/mol, respectively) showed significantly larger deviations than GFN1-xTB, GFN2-xTB, and GFN-FF with the latter yielding the lowest MAD of 2.8 kcal/mol. Except for the PCONF and UPU23 benchmark sets, the mean deviations (MD) obtained with GFN1-and GFN2-xTB are negative for all tested conformer energy sets. In general, all tested SQM methods and also GFN-FF tend to underestimate the high-level coupled-cluster conformational energies that serve as a reference values, especially for the higher-energy conformers. Furthermore, the MADs of the SQM methods are on average three to four times larger than those of DFT methods.
Nevertheless, the investigated conformational energy benchmarks clearly show that GFN2-xTB provides more reliable conformational energies than PM6-D3H4X, PM7, and even the computationally more expensive HF-3c method. For polar and hydrogen bonded systems, GFN2-xTB also outperforms GFN0-xTB, GFN1-xTB, and GFN-FF, potentially due to the inclusion of the AES term (see Section 2.1.4). The universal force-field GFN-FF performs on average similar to GFN0-xTB and GFN1-xTB, which is particularly remarkable and offers a valuable low-cost alternative to GFN2-xTB for the conformer search of large molecules with several hundreds of atoms. The proper energy ranking of conformers is an important application field for SQM methods (see Section 4.4) and the results for the conformational energy benchmarks suggest that particularly GFN2-xTB and GFN-FF should be well-suited for this purpose.
Finally, we turn to five of the barrier height oriented benchmark sets of the GMTKN55 database (for original references to the used subsets, see Reference 132). The MADs of the assessed methods (GFN1-and GFN2-xTB, PM6-D3H4, and DFTB3-D3(BJ) for the five test sets are shown in Figure 10d). Among all methods considered, GFN2-xTB clearly performs best with the lowest MAD for the diverse reaction barriers set (BHDIV10: MAD = 8.1 kcal/mol), the barrier heights for rotations around single bonds (BHROT27: MAD = 1.2 kcal/mol), and inversions (INV24: MAD = 3.5 kcal/ mol) as well as for barriers in proton transfer reactions (PX13: MAD = 2.7 kcal/mol). Only for barriers of tautomerization reactions (WCPT18), GFN2-xTB (MAD = 3.8 kcal/mol) is slightly outperformed by PM6-D3H4 (MAD = 3.5 kcal/mol). This performance of GFN2-xTB is remarkable for an SQM method, since no barrier heights were included in the fitting procedure.
Especially for PX13, for which PM6-D3H4 gives quite large deviations (MAD = 16.0 kcal/mol), and for BHDIV10, for which DFTB3-D3(BJ) performs relatively poorly (MAD = 13.3 kcal/mol), GFN2-xTB even yields slightly smaller MADs than PBE0 with a large basis set, which is an outstanding result for SQM methods. GFN1-xTB, although slightly worse than GFN2-xTB in all considered benchmark sets, still predicts reasonably accurate barrier heights.
Overall, the GFNn-xTB methods, particularly GFN2-xTB, yield acceptable to good (for NCIs and barrier heights) accuracy for thermochemistry applications, even though this was not the main focus (except for noncovalent systems) in the development of the GFNn-xTB methods.

| Reaction mechanism exploration
Reliable SQM methods also offer new perspectives for the fast screening of reaction mechanisms and TS for transitionmetal and organometallic systems, if they are interfaced with state-of-the-art TS localization algorithm such as, for example, the double-ended growing string algorithm 91,92 (see Section 3). However, this necessitates that the corresponding SQM method yields not only reliable TS geometries but also a sufficiently accurate thermochemistry for such systems. In a recent study, 93 this was assessed employing two organometallic reaction energy benchmark sets, the MOR41 138 and the WCCR10. 139 Although the development of the GFNn-xTB methods was not specifically targeted to predict accurate reaction energies, GFN1-xTB (MOR41: MAD = 13.2 kcal/mol, WCCR10: MAD = 10.9/kcal mol) and GFN2-xTB (MOR41: MAD = 11.8 kcal/mol, WCCR10: MAD = 10.7 kcal/mol) achieve reasonably accurate relative energies for these challenging reactions (with GFN2-xTB performing slightly better), comparable to dispersion uncorrected DFT methods and not so much worse than, for example, M06-2X 140 /def2-QZVPP with an MAD = 7.3 kcal/ mol for the MOR41 reactions. The PMx methods are the only other SQM approach that can currently be employed for the calculation of such systems, but they yield substantially larger deviations (PM6-D3H4: MAD = 21.7 kcal/mol, PM7: MAD = 22.7 kcal/mol).
The performance of the GFNn-xTB methods for barrier heights of organometallic reactions was assessed with a similarly encouraging result for a slightly modified version 93 of the MOBH35 141,142 benchmark set comprising 29 backward and forward barriers. The GFN1-and GFN2-xTB methods in combination with mGSM localized the correct TS in 89.7 and 86.2% of all investigated reactions, respectively and predicted the barrier heights with reasonable accuracy (GFN1-xTB: MAD = 8.8 kcal/mol, GFN2-xTB: MAD = 8.2 kcal/mol). The PMx methods are significantly less robust (only 72.4 and 69.0% of the TS were correctly localized with PM6-D3H4 and PM7, respectively) and they are also clearly outperformed by the GFNn-xTB methods in terms of accuracy for the barrier heights (PM6-D3H4: MAD = 17.1 kcal/ mol, PM7: MAD = 19.6 kcal/mol), while their computational costs are more than twice as high.
In addition, the GFNn-xTB methods provide reasonably accurate TS geometries 107 (cf. also Section 4.1) thus allowing for an efficient reoptimization on a higher level of theory. The computationally involved initial reaction path generation could be significantly accelerated with reliable SQM methods until a certain residual gradient is reached. Further computational savings can be achieved by avoiding superfluous reaction path searches at the significantly more expensive DFT level, because the GFNn-xTB methods allow a reliable and efficient chemical plausibility check of the possible paths.
Averaged over all 29 reactions, the tested GFNn-xTB methods need about 5 min to obtain a converged reaction path compared to several hours at low-cost DFT level (TPSS 143 -D3(BJ)/SVP 144 ), which clearly demonstrates the great benefit of this workflow for investigating organometallic reactions. An example reaction is shown in Figure 11. Although the GFN2-xTB reaction barrier of the rate determining TS is significantly underestimated compared to TPSS-D3(BJ) and the coupled-cluster reference values, the predicted reaction energy is clearly more accurate than that calculated with the significantly more computationally expensive DFT method. of the MOBH35 141,142 test set. The mGSM reaction path along the reaction coordinate (RC) is depicted in blue whereas the relevant TS, whose structure is shown as inset, is marked with a red circle. The respective reaction energies ΔE R and reaction barriers ΔE ‡ predicted by GFN2-xTB (blue) and TPSS-D3(BJ)/SVP (green) as well as the corresponding coupled-cluster reference values (gray) are also shown Furthermore, if a rough estimate of an upper limit for the respective barrier height is sufficient, the RMSD-push-pull path optimizer, which is described in Reference 145 and implemented in the xtb code, provides an even faster alternative for this purpose. In summary, this workflow opens up new perspectives for the fast and sufficiently accurate theoretical investigation of challenging organometallic reaction mechanisms. Only minimal user input is required which offers a wide range of possible applications, for example, in hybrid multilevel schemes aiming at the fully automated exploration of the reaction space.

| Chemical space exploration
The exploration of the chemical space is an important task in computational chemistry. Macroscopic properties of physical observables (e.g., pKa values, NMR, CD, or IR spectra) can be well approximated as a thermostatistical average of the respective properties of low-energy chemical species under thermal equilibrium conditions. This implies that the underlying compound space (i.e., the three-dimensional structures of the molecules) is known, leading to a sampling problem in a space of huge dimensionality. A good balance between computational cost and accuracy is thus required.
The prerequisite for sampling is a continuous, well-behaved PES which has to be explored by the underlying computational method involving typically several thousand to millions of energy and gradient evaluations. Therefore, carrying out this initial exploration step with QM methods is only possible for very small molecules at a DFT or WFT level and other methods have to be chosen for larger molecules. As shown in the previous sections, the methods of the GFNn-xTB family provide an excellent cost-to-accuracy ratio and should enable extensive sampling procedures. In fact, the exploration of the low-energy chemical space is one of the main application fields for the GFNn-xTB level of theory.
In the following, we discuss two practically important examples: finding molecular conformations and preferred protonation sites. These and several other screening algorithms have been implemented in a standalone program called crest, 94 that acts as a driver for the xtb program.

| Conformations
Molecular conformations are the primary example for the low-energy chemical space. Conformations are defined by minima on the PES with the same covalent bonding topology, obtained by rotation around bonds or inversion-type processes. Because of the huge number of possible conformations already for medium-sized molecules, finding those minima is a challenge for any exploration algorithm. Furthermore, the description of the PES, that is, basically the conformational energies, has to be qualitatively correct. From recent benchmark studies it is known 49,53 that this is difficult to achieve and that one has to apply a relatively large error margin (energy window) for the SQM preselected structures such that important conformations are not sorted out by mistake. This further increases the space of the considered structures. In recent publications, 94,145 GFNn-xTB was successfully combined with Cartesian RMSD-based meta-dynamics (MTD) simulations into an efficient, automated workflow for the task of finding molecular conformations. Note that the generation of conformers by rotation around all dihedral angles becomes impractical already for small systems and furthermore requires the a priori definition of conformational coordinates, which is avoided entirely by the general MTD sampling procedure.
As an example the well known pain-relieving medication drug 2-(4-isobutylphenyl)propanoic acid, also known as ibuprofen, is shown in Figure 12.
Here, all relevant conformations are generated with the above mentioned MTD approach, but without the need to define and rotate any of the four dihedral angles shown in Figure 12b explicitly. From chirped-pulse Fourier transform microwave spectroscopy, 146 four ibuprofen conformations are observable in the gas-phase. These four conformers coincide with the four lowest conformers found at the GFN2-xTB, as well as at the DFT (PBE0-D3(BJ)/def2-TZVPP) level (cf. Figure 12a). Furthermore, for the low-lying (i.e., populated) conformers of ibuprofen, the GFN2-xTB and DFT PES seem to be almost perfectly parallel as there is only a minor change ((1 kcal/mol) between the relative energies of the structures at the two levels. Only for higher lying conformations (Figure 12c), the difference in relative energy becomes more pronounced. It is a typical observation that conformational energies for organic molecules at the GFNn-xTB level are underestimated (see Section 4.2), that is, the PES appears to be too flat.
As a larger much more complicated example, the protonated polypeptide Ac-Ala 19 -Lys-H + is taken from Reference 94. The gas-phase conformations of this peptide were previously studied in a combined theoretical and experimental effort. 147 In Reference 94 it was shown that the conformational screening at GFN2-xTB level is able to correctly predict important conformational features, such as the change from an α-helical structure into a folded 3 10 -helical structure depicted in Figure 13a by protonation.
Furthermore, by re-ranking the GFN2-xTB ensemble at the PBE0-D4/def2-TZVPD//PBEh-3c level of theory even slightly better conformations than the previously known ones could be found. By visualization of the conformational energies at the GFN2-xTB and DFT level (see Figure 13b), the aforementioned issue of too flat SQM method PES becomes clear. Noticeably, the narrowly spaced conformational energy levels of GFN2-xTB are spread out over a much larger energy window at the DFT level. However, low-energy conformers on the GFN2-xTB PES are often also among the low-energy conformers at higher theoretical levels and vice versa. This relation is essential for a practically useful multilevel approach.
A great advantage of the GFNn-xTB methods in combination with automated screening procedures is their computational robustness and parameterization for almost all elements of the periodic table. This makes it possible to investigate also systems that would normally pose problems for either the applied theoretical method or the screening algorithm. As an example, Figure 14 shows a macrocyclic molecule containing a Pd 2+ -ion taken from Reference 148.
Finding conformations for this metallomacrocyclic molecule is highly challenging for three reasons: (a) the chemical composition (i.e., the metal ion), (b) interdependent dihedral angles in the macrocyclic part of the molecule, and (c) the rigid NNN pincer-backbone of the molecule. While (a) will most likely prevent the usage of, for example, standard FFs for the conformational screening, the combination of (b) and (c) will make it nearly impossible to obtain conformations from simple conformer generators based on dihedral angles rotation. In contrast, the automated MTD based screening at GFN2-xTB[GBSA(acetonitrile)] level is able to provide a global minimum conformer closely resembling the crystal structure (cf. Figure 14b), as well as a rather diverse conformational ensemble (cf. Figure 14c) for the complex.
In summary, automatized conformational screening at the GFNn-xTB level can be applied over a broad range of chemical systems. The main advantage lies in the robustness and overall well performance of the methods, but also in the computational speed. The computation of several thousand of energies is impractical at DFT or WFT with the current technological limitations, but becomes feasible at the SQM level. The implementation in the xtb program also allows for some special applications, such as constrained conformational sampling, which extends the capabilities of this screening procedure even further, for example, to adsorption problems or TS. For a more detailed discussion and more examples see Reference 94.

| Protonation sites
From recently published studies it is known that TB methods reasonably well describe relative proton affinities (PA), 17,27,53 for example, for the PArel subset of the GMTKN55 data base. The description of relative PA is important, that is, to rank different protonation sites of a molecule (also referred to as protomers). The first automated protomer screening procedure at the GFN1-xTB level was discussed in Reference 27. Here, the idea was to generate all protomers of a molecule automatically and rank them based on their energy, that is, the relative PA, to obtain estimated preferred protonation sites. For an automation this requires the generation of initial protonated candidate structures. Common protonation sites are typically πand lone pair centers, which can be obtained easily from an SQM calculation. In Reference 27, it was demonstrated that this procedure works reasonably well for almost arbitrary chemical systems. As an electronically rather complicated example, the (PCP)Ir(N 2 ) complex and its protomers generated at the GFN2-xTB [GBSA(THF)] level are shown in Figure 15.
This complex is known to be able to activate N 2 upon protonation, which was studied by combined NMR and cyclic voltammetry experiments. 149 It was concluded that the protonation selectively occurs at the metal center of the complex and no protonation at the dinitrogen group is observed. In a completely automatized fashion this can be validated at the GFN2-xTB level, where the protonation occurs either in a three-center-two-electron (3c-2e) bond between the metal F I G U R E 1 4 Pd-pincer metallomacrocycle: (a) Lewis structure of the metallomacrocycle, (b) overlay between the crystal structure (transparent blue) and the lowest conformer generated at the GFN2-xTB[GBSA(acetonitrile)] level, (c) overlay of the 10 lowest energy conformers and the pincer, or directly at the metal (cf. Figure 15b,c). All other possible protomers of the complex, including protonation at the dinitrogen ligand are significantly higher in energy (see Figure 15d,e).
The automatized protonation site identification at the GFNn-xTB level was extended in Reference 94 to other (mononuclear) ions. The resulting capability, for example, to screen for alkalization sites is a further proof of the robustness of GFNn-xTB methods. Note, that the automated generation of protomers (or similar ion adducts) is only possible due to the QM nature of the approach: starting structures are generated from localized molecular orbital centers without other prior information and new bonds can be formed freely during a structure optimization, which is both impossible at FF or chemoinformatical level. From the examples above and the previous publications 27,94,145,150 it is concluded that the automation of GFNn-xTB-based screening procedures provide reliable workflows for the generation and ranking of various low-energy structures. Typically, they require further treatment at higher DFT or WFT levels of theory as discussed in the introduction (cf. Figure 1). Furthermore, the protonation feature described here has been extended in Reference 94 to an automatized procedure for generating low-energy tautomers as well.

| Electron impact mass spectra simulations
One of the important special features of TB methods at finite electronic temperature is that covalent bonds can be dissociated properly into atoms (for some example potential energy curves see Reference 49). This was exploited already in 2013 (originally with DFTB3-D3(BJ)) for the first principles computation of electron impact mass spectra (EI-MS) of molecules 151 by combination of QC with stochastically initiated MD simulations. Since in this approach, a large chemical space resulting from fragmentations is automatically explored, we consider it as an example here. Actually, the long term idea to be able to simulate EI-MS was one of the reasons that led to the development of the GFNn-xTB methods and very recently, GFN1-and GFN2-xTB was employed on a large scale for this purpose. 90,152 Note, that the whole procedure termed QCEIMS would hardly be technically and computationally feasible without the herein described robust SQM methods. The principle and main features of QCEIMS are briefly described in the following paragraph and depicted in Figure 16.
In the first step, the (neutral) input molecular structure is equilibrated in an MD run from which a predefined number (typically a few hundred) snapshots are randomly selected and saved as starting coordinates. For complicated cases, a preceding detailed conformational analysis could be conducted and the entire procedure is started separately for the found conformers.
For each snapshot, the molecular orbital spectrum is calculated and a Mulliken population analysis is performed. With this information, the internal excess energy (IEE) and internal conversion time are estimated for proper initial (randomized) conditions. Then, the snap-shots are instantaneously (valence) ionized and independently propagated in time on the cationic GFNn-xTB PES until a reaction occurs in the simulation. The IP of the (neutral) fragments is calculated and used to determine their statistical charge. The fragment with the highest charge is selected for propagation in a cascade fashion. It can undergo further fragmentation until either no internal energy is left, or the fragment is getting too small. All charged fragments are counted, stored, and by gathering data from all production runs, the mass spectrum is obtained as shown on the right part of the Figure 16 in comparison with the respective experimental data (inverted for better visibility). At a typical electron impact energy of 70 eV, the IEE of the target molecule is several eV so that very high reaction rates in the few ps regime are initiated. Hence, although one has to sample over hundreds to thousand of trajectories, each of them can be computed rather fast such that the overall computational effort is manageable at an SQM (but not at DFT) level.
The calculations carried out in this way are basically following first principles and fully theoretical, that is, not based upon any experimental results and represent a viable, "black-box" type alternative to rule-based, chemoinformatical approaches. Typically, a very reasonable (semi-quantitative) agreement between theory and experiment is observed. 153 This is somewhat surprising since barriers and reaction thermodynamics (for which the GFN methods are not parameterized) strongly influence the sampled relative reaction rates. Overall, GFN1-and GFN2-xTB (and also DFTB3-D3 (BJ) for organic molecules) perform rather similar for a broad set of molecules. In addition to the spectrum, the reactive MD trajectories yield detailed chemical information on the reaction mechanisms leading to specific fragments (spectral peaks). Because overall millions of energy/force evaluations are required in this approach, this is only possible with fast, robustly converging, and dissociative QC methods like GFNn-xTB.

| Thermostatistical corrections
The computation of harmonic vibrational frequencies by QC methods is a common application mostly conducted to obtain thermostatistical corrections from energy to enthalpy (H) or free energy (G). This requires knowledge of the equilibrium molecular structure (and atomic masses) as well as (at least) the second derivatives of the energy with respect to all nuclear displacements (Hessian matrix), both of which are provided rather accurately by the GFN methods.
Here, GFN2-xTB-computed zero-point vibrational energies (ZPVE, which contribute dominantly to H) and total molecular free energies at 298 K (G 298 ) are compared to corresponding values at the low-cost B97-3c DFT theoretical level. Because such standard DFT calculations are roughly two orders of magnitude slower, a very substantial reduction of the needed computational resources may be achieved if they could routinely be replaced by SQM. The B97-3c vibrational frequencies are on average close to experimental fundamental ones and normally require no scaling to be comparable to experimental data. 154 They are here taken as a reasonable reference to benchmark GFN2-xTB (the GFN1-xTB variant performs similar but slightly worse and is therefore not discussed here). Figure 17 shows a comparison of ZPVE values at B97-3c and GFN2-xTB values for a set of 39 medium sized organic molecules taken from a benchmark study of Li et al., 155 which was originally employed to establish computational procedures for computing molecular entropies. The corresponding free energy values are also depicted in Figure 17 which F I G U R E 1 6 Automatic workflow for GFNn-xTB based EI-MS calculations (modified from Reference 152). The hexafluorobenzene molecule is taken as an example (compared to the ZPVE) emphasize more structural aspects as well as the low frequency part of the vibrational spectrum. The G 298 values refer to the mRRHO treatment from Reference 96 with a rotor-cutoff of 20 cm −1 .
As can be clearly seen from the graph, there is a very good reproduction of the DFT reference thermostatistical properties by GFN2-xTB. The MAD for the ZPVE data is only 0.34 kcal/mol (MD = 0.29 kcal/mol) with a maximum error of 1.6 kcal/mol. In actual applications where normally differences of the values for reactants and products are taken, the effective error may be even smaller because of cancellation. The performance for the free energies is similar with an MAD of 0.5 kcal/mol and a maximum error of 2.0 kcal/mol. These deviations are small compared to other sources of error in typical thermochemical studies for larger systems like the intrinsic error of the DFT/WFT electronic energies 132,156 or an inaccurate treatment of solvation effects. 157,158 Table 3 shows that this also applies to larger or electronically more complex cases.
Here, the absolute values (and hence also the absolute deviations) are larger than for the previous benchmark set but nevertheless, still small and rather systematic deviations of less than 5% (<2% for the two anti-viral drugs) are obtained with GFN2-xTB. Typically, the predicted values are too small which could be remedied by a simple scaling procedure. Note that for the largest system with 94 atoms (lopinavir) the frequency calculation took about 7.5 and 0.03 hr, respectively, at the DFT and SQM levels showing the tremendous speed-up at very little loss of accuracy. Furthermore, the severe sensitivity of DFT-computed thermostatistical data (vibrational partition function) on the numerical integration grid for low-frequency modes has been pointed out recently. 159 Eliminating this issue would require the use of very large, computationally costly integration grids in the calculation of the nuclear Hessians with DFT methods. The GFNn-xTB energy expressions are fully analytic, and hence, represent a robust electronic structure scheme for the (kcal/mol) Reference (kcal/mol) F I G U R E 1 7 Comparison of GFN2-xTB thermostatistical data with corresponding B97-3c DFT reference values for 39 organic molecules ranging from ethane (smallest) to n-octane (largest). The solid line shows the one-to-one correspondence and the dashed ones indicate a common error range for chemical accuracy, that is, ±1 kcal/mol calculation of harmonic frequencies to be used in free energy calculations. For these reasons, the application of GFN2-xTB is highly recommended instead of costly and numerically sensitive DFT methods in the computation of vibrational frequencies in large scale computational projects or early stages of extensive mechanistic studies.

| Protein examples
The accurate simulation of large biomolecular systems like proteins remains one of the "holy grails" in computational chemistry. 2 Efficient theoretical models can be applied in tandem with the experiment, for example, to elucidate molecular mechanisms in vivo, 160 to compute target-drug interactions 11,161 or to simulate secondary structure rearrangements. [162][163][164] The GFNn-xTB family of methods is a promising candidate for such tasks and has already been successfully applied multiple times in the context of biomolecular systems. 29,129,165 One example is the comprehensive study of Schmitz et al. 129 that evaluates the performance of GFN1/2-xTB variants on structure optimizations of 90 protein structures (of which 20 contain metal atoms). The experimentally derived X-ray structures serve as reference in this study. GFN2-xTB performs for various standard geometrical descriptors very similar to special-purpose FFs (OPLS2005 and AMBER*) and outperforms even the wave function-based HF-3c method. For the subset of metalloproteins-for which standard FFs and HF-3c could not be applied-GFN2-xTB shows remarkable performance in reproducing the secondary structure motifs and the coordination sphere around the metal centers. Figure 18a depicts the structural overlay of the calcium-containing hydrolase 1NX2 shown as a first example. The secondary structure and metal sites are preserved during optimization. The C α RMSD of 0.89 Å is reasonably small and the structures differ mainly in the unstructured loop regions. Figure 18c depicts the Cu-containing lyase protein (5FTZ). The overlay of the X-ray and GFN2-xTB optimized structure shows a remarkable agreement in the secondary structure. The RMSD of only 0.48 Å is one of the smallest within the whole metalloprotein subset. GFN2-xTB reproduces-also for this protein-all hydrogen bond-stabilized structural motifs like helices and sheets very well. The reason for the structural differences in the unstructured regions remains elusive. Figure 18b depicts Comparison of GFN2-xTB thermostatistical data (in kcal/mol) with corresponding B97-3c DFT reference values for a set of six molecules (two antiviral drugs, one large conjugated π-system with a coordinated metal atom, and two transition-metal complexes) 1NX2 and 1QJJ were computed as closed-shell systems with charges equal to +1 and −11, respectively. 5FTZ was treated as a doublet with a net charge of +6 (see Reference 129 for details) the coordination sphere of the zinc metal center of the hydrolase 1QJJ. The mean absolute error of all metal-ligand bond lengths is 0.07 Å. GFN2-xTB perfectly reproduces the coordination sphere with respect to the experimental Xray structure. The combination of GFNn-xTB variants for structural problems with the sTDA-xTB method for excited states opens interesting possibilities for investigation of large biomolecular systems. Some of us have presented the applicability and performance of the sTDA-xTB method for electronic excitation spectra like ECD and UV-vis for proteins and DNA fragments. 29 Figure 19 shows the ECD spectrum of cytochrome c computed with sTDA-xTB on a GFN2-xTB optimized structure as an example.
The characteristic ECD bands of the α-helical secondary protein structure are reproduced very well. This is remarkable, considering that no molecular fragmentation procedure is applied, that is, the entire protein is computed fully quantum mechanically. The computation takes 5 hr for the optimization and 33 hr for the spectrum calculation (involving 30,405 excited electronic states) on a standard compute node with four Intel ® Xeon ® CPU E3-1270 v5 @ 3.6GHz cores.
Previous studies have shown that for some optical properties, MD sampling is essential. 29,40,44,[167][168][169] Here, an MD simulation is performed, and snapshots are taken equidistantly from the resulting trajectory. The desired property is then computed for each snapshot and averaged. The MD simulation at the tight-binding level is not feasible for systems with more than 1,000 atoms and for the necessary simulation lengths of several nanoseconds. Figure 5 shows the timings for single-point plus gradient calculations for the different GFN methods for 14 proteins. The GFN-FF method, with a three orders of magnitude lower computational cost compared to GFN2-xTB, enables MD simulations of biomolecular systems with up to 5,000-10,000 atoms. Even the inclusion of explicit solvent molecules for structure optimization and MD simulation is now possible. Figure 20 shows the structure of hemocyanin (PDB: 1JS8) with an explicit water shell of 6 Å distance to the protein surface (10,074 atoms). The structure of this system is optimized on loose thresholds within 1,058 cycles and in 9 hr and 33 min at the GFN-FF level of theory (31 s per F I G U R E 1 9 Calculated ECD spectrum of cytochrome c (blue solid line). The individual transition strengths are broadened by Gaussian functions with a full width at 1/e maximum of 0.5 eV and the spectrum is red-shifted by 0.5 eV. The experimental spectrum in water (gray solid line) is taken from Reference 166 optimization cycle). The heavy atom and C α -RMSD are 1.0 and 0.9 Å, respectively, compared to the experimental crystal structure.

| Periodic systems
Recently, the GFN0-and GFN1-xTB methods were extended to PBCs in two program packages 170,171 enabling the routine computation of solids and surfaces. As examples for application to periodic systems, we describe the computation of lattice energies for various molecular crystals in the common X23 benchmark set, 172,173 as well as structural data of 222 zeolite frameworks taken from a standard database. 174,175 Molecular crystals are an important research area for material science as well as in pharmaceutical chemistry. 176,177 Their perhaps most important property is the lattice energy, E lat , which reflects how much energy is released per molecule upon sublimation. It is defined as where E crystal is the total energy of the crystal including overall n molecules within the primitive cell and E gas is the energy of the isolated molecule in its lowest energy conformation. A commonly used benchmark set for lattice energies, which comprises a diverse set of experimentally well-determined organic molecular crystals, is the X23 dataset. 173 Reference values are experimental sublimation enthalpies, which have been back-corrected for vibrational contributions in the harmonic approximation. It includes various intermolecular binding motifs as, for example, hydrogen bonding, electrostatic interactions, as well as London-dispersion dominated unsaturated hydrocarbons and is therefore ideally suited to assess how well the underlying theoretical method is able to describe NCIs in a dense environment. The error of the applied Γ-point only approximation applied in the periodic GFN1-xTB method can become quite large for small cells. Therefore, we constructed different supercells (2 × 2 × 2 and 3 × 3 × 3) for every molecular crystal F I G U R E 2 0 Cartoon representation of the GFN-FF optimized (blue) structure of hemocyanin (PDB: 1JS8), including an explicit water solvent shell of 6 Å (red) to minimize this error and noted that the best results are obtained within the largest supercells which are reported here. Furthermore, we employ the low-cost DFT composite HSE-3c method 178 for comparison. The average experimental value of E lat over the whole test set is 20.3 kcal/mol.
As can be seen in Table 4, the overall best performance is obtained by HSE-3c with a mean absolute deviation of 1.3 kcal/ mol. GFN1-xTB doubles the MAD of HSE-3c but outperforms its direct competitor DFTB3-(3ob)-D3(BJ) by more than 0.5 kcal/mol. Since the GFN1-xTB method is orders of magnitudes faster than HSE-3c, this result is especially promising for future screening studies, for example, in crystal structure prediction. Further enhancements have to be developed, especially the proper inclusion of a k-point sampling is necessary for smaller cell sizes. Nevertheless, the applicability of the GFN1-xTB method is superior to the DFTB methods enabling the calculation of systems with almost every element combination.
With this in mind, we discuss how cell volumes of small-to large-sized zeolite frameworks are reproduced in comparison to higher-level theoretical reference data. Overall, 222 zeolite frameworks were considered, 174,175 for which the number of atoms inside the primitive cells range from 15 atoms (IZA code: EDI) to the biggest with 4,320 atoms (IZA code: MWF).
All structures have been optimized by Baerlocher and co-workers using the distance least squares (DLS-76) level of theory, 180 which we use for comparison. Due to the large size of some systems, we apply the cost-effective periodic GFN0-xTB method as implemented under PBC in the xtb program. Note that the DLS-76 method has been fitted to reproduce experimental data thereby incorporating thermal effects implicitly. Figure 21 depicts a correlation plot between GFN0-xTB-computed and DLS-76 reference volumes, which shows that GFN0-xTB is able to accurately reproduce most of the DLS-76 structures rather accurately. Overall, the F I G U R E 2 1 Comparison between GFN0-xTB and DLS-76 volumes for 222 experimentally known zeolite framework structures. The inset shows a structure overlay of the only significant outlier (see text) GFN0-xTB method slightly underestimates the cell volumes, which can be explained by missing zero-point vibrational and thermal contributions. For molecular crystals this contribution amounts to approximately 2%. 181 The reference cell angles are mostly conserved in the optimizations. The MAD values of cell lengths, angles, and volumes for GFN0-xTB with respect to the DLS-76 structures are given in Figure 21, where one significant outlier (PAR) is shown as inset (cell volume decreased by approximately 14%).

| CONCLUSIONS
In this review, the theory, development, and implementation as well as prototypical applications of the GFN family of atomistic, mostly QC-based semiempirical methods is described. Their main purpose is the fast, robust, and reasonably accurate calculation of large molecules in gas and condensed phase with an emphasis on a good description of ground state structures, NCIs, low-energy chemical transformations like molecular conformations and vibrational frequencies.
A common feature and strong point consistent among the methods (including the recently developed GFN-FF universal force field) is that practically any chemically interesting species can be treated, since parameters exist for a very large part of the periodic table (up to radon). Opposed to the only existing competing family of semiempirical methods with broad parameterization (PMx), the theory is derived from a DFT background, which makes the methods applicable also in electronically more complicated cases like transition metal compounds or other systems involving stronger electron correlation effects. Although we currently have less experience with the methods applied under PBCs, it is indicated that due to their consistent design strategy and physically very reasonable energy terms, they should be also applicable to solids, surfaces or dense materials in general. The theoretically weakest point of the xTB methods in their current formulation (but also an essential reason for their high efficiency) is the use of an (almost) minimal AO basis set for the expansion of the KS-type electronic eigenvalue equations. Overall, the xTB methods, especially the most sophisticated GFN2-xTB version, provide an exceptionally good accuracy-performance/computational cost ratio for the target properties. On standard workstations or even laptop computers, full geometry optimizations, frequency calculations, or conformational searches can be conducted in minutes to a few hours for systems composed of hundreds to about a 1,000 atoms. Although the xTB methods can be applied in principle also to much larger systems composed of about 5,000-10,000 atoms, in this regime force-fields like GFN-FF are more appropriate. Alternatively, they allow fast and robust screening of compound libraries in about the same computation time, for example, thousands of candidate species with 50-100 atoms. This applies even to other interesting chemical (off-target) properties not discussed here like, for example, polarity (dipole moments), gaps (electrochemistry), vibrational anharmonicity or bonding (electron density) information. Calculations can be carried out in a user-friendly black-box style as implemented in the efficient xtb program, which is freely available 47 and accompanied by a continuously updated online documentation. 98 Not part of the GFN family is another xTB-based scheme, which is discussed in this review: in the sTDA-xTB method, a non-self-consistent xTB approach is combined with a subsequent simplified Tamm-Dancoff-approximated time-dependent linear response treatment. This is a single-point method, which is suited for the computation of excitation energies and optical spectra of huge systems, such as full proteins without resorting to a QM/MM or fragment approach.
We are confident that the described methods will form the basis for many successful studies in various fields of computational science. Application in automated workflows may open up new perspectives for molecular property design or chemical reaction space exploration. Combination with modern machine learning techniques, for example, as input or data generator, also seems to be a promising research direction and first attempts based on the xTB methods have already been made. 182