## Introduction

*Ab initio* and density functional theory (DFT) calculations typically require two main approximations: the “method” (the expansion of the many-electron wavefunction or the choice of exchange-correlation functional) and the “basis set” (the expansion of the one-electron orbitals). These choices, made in solving the electronic Schrödinger equation, determine both the overall accuracy of the results and the computational cost required to obtain them. Although knowledge of basis set development is not a prerequisite for performing quantum chemical calculations, especially as many basis sets are included in widely available program packages or basis set archives,1–3 the aim of this review is to aid the theoretical/computational chemist in making informed basis set choices for molecular calculations. It focuses on insights into the design, development, and optimization of the more recent basis sets in the field. It is the belief of the author that this will allow a feeling for the basis set incompleteness error in a given calculation to be developed, so that “the right results for the right reason” can be obtained.

Detailed descriptions of the functions found within basis sets and their classifications can be found in a number of texts (e.g., Ref. 4), hence the following description is kept purposefully brief to serve as a reminder of terminology. This review is concerned only with Gaussian basis sets,5, 6 where the basis set is composed of basis functions taking the form:

$$
\chi_{\zeta,n,l,m}(r,\theta,\varphi) = N \, Y_{l,m}(\theta,\varphi) \, r^{2n-2-l} \, e^{-\zeta r^{2}} \tag{1}
$$

or an analogous representation using Cartesian, rather than polar, coordinates. For the Gaussian-type orbital (GTO) shown in Eq. (1), *N* is a normalization constant, ζ is known as the exponent, and *Y*_{l,m} are spherical harmonic functions. The indices *n*, *l*, and *m* determine the type of orbital (*s*, *p*, *d*, etc.). Basis functions of this explicit form are also known as Gaussian primitives (the Gaussian prefix shall be assumed herein) and, for reasons of balance and efficiency, several primitives are often combined linearly to produce contracted functions:

$$
\chi^{\mathrm{contracted}} = \sum_{i} n_{i} \, \chi_{i}^{\mathrm{primitive}} \tag{2}
$$

where *n*_{i} is known as a contraction coefficient.7 The methods and philosophies of optimizing the exponents and contraction coefficients, and their subsequent collection into a single defined whole (the basis “set”), will be the focus of this article.
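As a minimal sketch of how a contracted function is assembled from primitives, the following Python fragment evaluates normalized *s*-type primitives (the *l* = 0, *n* = 1 case of the GTO form) and their linear combination. The exponents and raw coefficients are arbitrary illustrative values, not taken from any published basis set, and all function names are hypothetical.

```python
import math


def primitive_norm(zeta):
    """Normalization constant N of an s-type primitive exp(-zeta * r**2)."""
    return (2.0 * zeta / math.pi) ** 0.75


def primitive_s(zeta, r):
    """Value of a normalized s-type primitive at distance r from its center."""
    return primitive_norm(zeta) * math.exp(-zeta * r * r)


def primitive_overlap(zeta_a, zeta_b):
    """Overlap of two normalized s primitives that share a center."""
    return (2.0 * math.sqrt(zeta_a * zeta_b) / (zeta_a + zeta_b)) ** 1.5


def contraction_self_overlap(exponents, coeffs):
    """Self-overlap of the contracted function sum_i c_i * g_i."""
    return sum(
        ci * cj * primitive_overlap(zi, zj)
        for zi, ci in zip(exponents, coeffs)
        for zj, cj in zip(exponents, coeffs)
    )


def normalize_contraction(exponents, coeffs):
    """Rescale the contraction coefficients so the contracted function is normalized."""
    scale = 1.0 / math.sqrt(contraction_self_overlap(exponents, coeffs))
    return [scale * c for c in coeffs]


def contracted_s(exponents, coeffs, r):
    """Value of the contracted s function at distance r."""
    return sum(c * primitive_s(z, r) for z, c in zip(exponents, coeffs))


# Hypothetical three-primitive contraction (values chosen only for illustration).
exponents = [5.0, 1.0, 0.2]
coeffs = normalize_contraction(exponents, [0.2, 0.5, 0.4])
value_at_origin = contracted_s(exponents, coeffs, 0.0)
```

The contraction coefficients are fixed once the basis set is defined, so the three primitives here would enter a calculation as a single basis function.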

How the exponents of a basis set are optimized depends strongly on its intended usage. For example, if an investigation is focused solely on the electric properties of molecules, then using a basis set that has been optimized specifically for that purpose is likely to produce very good results.8 However, attempting to address all basis sets specialized for every different type of physicochemical property would represent a huge undertaking, hence this review will be mainly concerned with atom-centered, “energy-optimized” basis sets, where the exponents are optimized to minimize an electronic energy. Such basis sets are considered to be more general-purpose, yet are often augmented with additional functions when considering specific problems (see later). Although all exponents may be optimized using common minimization algorithms such as Broyden-Fletcher-Goldfarb-Shanno (BFGS) and simplex,9 an alternative option is to optimize a single exponent, then produce additional exponents using mathematical relationships such as the “even-tempered” and “well-tempered” schemes or expansions in Legendre polynomials. Such methods are particularly useful for large basis sets where optimization of every individual primitive is difficult, and have been discussed by Petersson et al.10 Even-tempered schemes also find use in extending basis sets with additional diffuse functions for the description of, for example, molecular anions. Such extensions often take the form:

$$
\zeta_{3} = \frac{\zeta_{2}}{\zeta_{1}} \, \zeta_{2} \tag{3}
$$

where ζ_{3} is the new exponent, ζ_{2} is the most diffuse exponent already present in the basis set, and ζ_{1} is the second most diffuse.
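The two ideas in this paragraph, energy optimization of an exponent and the generation of further exponents by a geometric rule, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the energy expression is the textbook variational energy of a hydrogen atom described by one normalized *s* primitive, a golden-section search stands in for BFGS or simplex, the ratio β = 3.0 is an arbitrary choice, and all function names are hypothetical.

```python
import math


def h_atom_energy(zeta):
    """Energy (hartree) of H described by one normalized s primitive exp(-zeta r^2)."""
    kinetic = 1.5 * zeta
    nuclear_attraction = -2.0 * math.sqrt(2.0 * zeta / math.pi)
    return kinetic + nuclear_attraction


def golden_section_minimize(f, lo, hi, tol=1e-9):
    """Minimize a unimodal function on [lo, hi] (stand-in for BFGS or simplex)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return 0.5 * (a + b)


def even_tempered(alpha, beta, count):
    """Exponents alpha * beta**k for k = 0 .. count-1 (a geometric progression)."""
    return [alpha * beta ** k for k in range(count)]


def add_diffuse(exponents):
    """Append one diffuse exponent: zeta_new = (zeta_2 / zeta_1) * zeta_2."""
    ordered = sorted(exponents, reverse=True)
    zeta_1, zeta_2 = ordered[-2], ordered[-1]  # second most diffuse, most diffuse
    return ordered + [zeta_2 * zeta_2 / zeta_1]


zeta_opt = golden_section_minimize(h_atom_energy, 0.01, 2.0)
exponents = even_tempered(zeta_opt, 3.0, 3)  # beta = 3.0 is an arbitrary choice
extended = add_diffuse(exponents)
```

For this one-parameter model the minimum is known analytically, ζ = 8/(9π) ≈ 0.283 with an energy of −4/(3π) ≈ −0.424 hartree, which gives a convenient check on the search. For an even-tempered set, the extension rule simply continues the geometric progression by one more step toward smaller exponents.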

### Quality and contraction pattern

Working within the assumed framework of atom-centered, energy-optimized GTOs, basis sets can be further classified based on the number and type of functions included. The simplest basis sets are termed “minimal basis sets,” as they possess only the functions necessary to accommodate the electrons of the neutral atom. For example, a minimal basis set for a neutral Ne atom would contain two *s* functions and a single three-component *p* function. In practice, minimal basis sets are rarely used because they allow for little or no electron correlation.

One method of improving on a minimal basis set is to simply double the number of functions, leading to four *s* functions and two (three-component) *p* functions in our Ne example. In a similar fashion, one could triple, quadruple, and so forth, the number of functions; the resulting basis sets are known as double zeta (DZ), triple zeta (TZ), and quadruple zeta (QZ), respectively. Most commonly used basis sets use a slightly modified arrangement where only the basis functions corresponding to the valence electrons of a given element are increased in zeta, leading to “split-valence” basis sets. Such basis sets may still be referred to as, for example, DZ, but the abbreviations VDZ and DZV (valence double zeta and double zeta valence, respectively) are also in use and provide additional clarity. Beyond increasing the zeta level of a given basis set, its quality may also be improved by adding higher angular momentum functions beyond those required to accommodate the electrons of the neutral atom, for example, adding *d*-type functions (or higher) to a basis set for Ne. Functions of this type are termed “polarization functions” or “correlation functions,” and interested readers are referred to Ref. 4 for a more detailed description of the physical interpretation of the effects of adding such functions to a basis set. Suffice it to say that they are extremely important for calculations that include electron correlation.
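The counting in the Ne example can be made explicit with a short sketch. The helper names below are hypothetical, and the function count assumes spherical-harmonic components, that is, 2*l* + 1 functions per shell.

```python
# Occupied shells of neutral Ne: (label, angular momentum l, is_valence).
NE_SHELLS = [("1s", 0, False), ("2s", 0, True), ("2p", 1, True)]


def shell_counts(zeta, split_valence=False, polarization=()):
    """Return {l: number of shells} for a zeta-tuple basis built on NE_SHELLS.

    With split_valence=True only the valence shells are multiplied by zeta.
    Each entry in `polarization` adds one shell of that angular momentum.
    """
    counts = {}
    for _label, l, is_valence in NE_SHELLS:
        multiplier = zeta if (is_valence or not split_valence) else 1
        counts[l] = counts.get(l, 0) + multiplier
    for l in polarization:
        counts[l] = counts.get(l, 0) + 1
    return counts


def n_functions(counts):
    """Total number of basis functions, with 2l + 1 components per shell."""
    return sum(n * (2 * l + 1) for l, n in counts.items())


minimal = shell_counts(1)                         # two s, one p
double_zeta = shell_counts(2)                     # four s, two p
valence_dz = shell_counts(2, split_valence=True)  # three s, two p
polarized = shell_counts(2, split_valence=True, polarization=(2,))  # adds one d
```

Under these assumptions the minimal, double zeta, split-valence double zeta, and polarized split-valence sets contain 5, 10, 9, and 14 functions, respectively.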

Although the contraction of basis functions according to Eq. (2) is a simple linear combination, the design decisions taken when developing new basis sets have led to the emergence of two distinct methods of selecting which GTOs are contracted and the pattern for doing so. The first of these is the “segmented contraction,” where each primitive contributes to only one contraction. A primitive that is needed in more than one contracted function must then be duplicated, so the same exponent can appear several times in the basis set. The second method is known as “general contraction,”11 and in this case all primitives of the same angular momentum are present in every contracted function of the same angular momentum. A given primitive will then have a different contraction coefficient in each contracted function. For either type of contraction, it is also common to include a number of primitives in their uncontracted form (essentially a single primitive with a contraction coefficient of 1). When listing a basis set composition, both types of contraction are expressed in the style (12*s*6*p*3*d*2*f*1*g*) → [5*s*4*p*3*d*2*f*1*g*], where the parentheses denote the primitives and the square brackets the contracted functions. It is important to note that this provides almost no information about the contraction pattern, and to uncover the details one must usually refer to a list of exponents and contraction coefficients or to the original publication. Although most modern quantum chemistry codes support both segmented and generally contracted basis sets, the algorithms used are usually significantly more efficient for one contraction scheme than the other. Users should take this into account when deciding which basis set or software package to use for a given investigation.
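To illustrate why the composition label says so little about the contraction pattern, the sketch below parses such a label, counts the functions it implies (again assuming 2*l* + 1 components per shell), and then compares two contraction-coefficient matrices with the same (4*s*) → [2*s*] shape. The matrices contain made-up coefficients and the helper names are hypothetical.

```python
import re

L_OF = {"s": 0, "p": 1, "d": 2, "f": 3, "g": 4, "h": 5, "i": 6}


def parse_composition(text):
    """Turn a label such as '12s6p3d2f1g' into {l: number of shells}."""
    pairs = re.findall(r"(\d+)([spdfghi])", text)
    return {L_OF[letter]: int(count) for count, letter in pairs}


def n_functions(shells):
    """Total number of functions, with 2l + 1 components per shell."""
    return sum(n * (2 * l + 1) for l, n in shells.items())


def is_segmented(matrix, tol=1e-12):
    """True if every primitive (row) contributes to at most one contraction (column)."""
    return all(sum(abs(c) > tol for c in row) <= 1 for row in matrix)


primitives = parse_composition("12s6p3d2f1g")
contracted = parse_composition("5s4p3d2f1g")

# Rows are primitives, columns are contracted functions; values are illustrative only.
segmented_4s_to_2s = [
    [0.3, 0.0],
    [0.7, 0.0],
    [0.4, 0.0],
    [0.0, 1.0],  # uncontracted primitive with a coefficient of 1
]
general_4s_to_2s = [
    [0.3, -0.1],
    [0.7, -0.2],
    [0.4, 0.5],
    [0.1, 0.8],
]
```

Both matrices would be reported as (4*s*) → [2*s*], yet only the first satisfies `is_segmented`. For the composition quoted in the text, the label corresponds to 68 primitive and 55 contracted functions.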

With the terminology of basis sets in place, the development of several families of modern basis sets can now be discussed. For earlier developments and further history of Gaussian basis sets, a number of well-written reviews are available, such as Ref. 12.