## Introduction

In analogy to the vastness and sparseness of outer space, we can loosely refer to the space of chemical systems as chemical compound space (CCS), a continuous observable space populated by all experimentally and theoretically possible chemicals with natural nuclear charges and real interatomic distances for which chemical interactions occur [1]. Stated more precisely, CCS refers to the combinatorial set of all compounds that can be isolated and constructed from possible combinations and configurations of $N_I$ atoms and $N_e$ electrons in real space. In the absence of external fields, and for given $N_e$, atom types, and spatial configurations $\{Z_I, \mathbf{R}_I\}$, not only covalent, ionic, and metallic bonding result, but also the much weaker hydrogen and van der Waals (vdW) bonding responsible for the physics and chemistry of molecular crystals, liquids, and other supramolecular aggregates. Most research efforts in this first principles context are concerned with the approximations and methods necessary for making property predictions for given compounds. By contrast, the focus of this tutorial is a first principles view on the compounds *per se*.

Notwithstanding chemical bonding or conformations, and merely considering the number of possible stoichiometries, it is obvious that the size of CCS is unfathomably large for all but the smallest systems. Because of all the possible ways of assembling many and various atoms, its size scales exponentially with compound size, as $Z_{\max}^{N_I}$. Here $Z_{\max}$ is the number of possible atom types, that is, the maximal permissible nuclear charge in Mendeleev's table, and $N_I$, the number of atoms, depends on the employed definition of “isolated system” but can certainly reach the scale of Avogadro's number for living organisms, chunks of unordered matter, or planets. Although many such speculative compounds are likely to be unstable, the state of affairs worsens dramatically when accounting for the additional degrees of freedom that arise from distinguishable geometries due to differences in atom bonding or conformations. This combinatorial explosion with system size is the main motivation for advocating an *ab initio*, or first principles, view on CCS, namely a view that restricts us to use solely $\{Z_I, \mathbf{R}_I\}$ and $N_e$ as input variables\* and, while maybe not free of empirical parameters, will not change in its parameterization as $\{Z_I, \mathbf{R}_I\}$ and $N_e$ are freely varied [2].

A major part of modern electronic structure theory and interatomic potential work is concerned with the development of improved methods and approximations for solving Schrödinger's equation (SE) within the Born–Oppenheimer approximation for systems relevant to materials, biological, or chemical research, and with deriving properties thereof [3]. *Ab initio* statistical mechanics efforts are dedicated to sampling the corresponding degrees of freedom from first principles [4]. In the context of CCS, the electronic Hamiltonian $H$ for solving SE, $H\Psi = E\Psi$, of any compound with a given charge, $Q = N_p - N_e$, is uniquely determined by its (unperturbed) external potential, $v(\mathbf{r})$, that is, by its set $\{Z_I, \mathbf{R}_I\}$. Here, $N_p = \sum_I Z_I$ is the total number of protons in the system, the sum over all nuclear charges.
Due to the Hohenberg–Kohn theorem, we also know that the electron density $n(\mathbf{r})$, and all electronic properties derived thereof, are determined by the external potential $v(\mathbf{r})$, up to a constant [5]. Consequently, we can work directly with .
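
To make the inputs of this first principles view concrete, the following minimal Python sketch describes a compound solely by its nuclear charges, nuclear positions, and electron count. From these it evaluates the total proton number, the net charge, and the nuclear external potential at a point, and it also evaluates the exponential size estimate for CCS. All function names, the value `Z_MAX = 100`, the approximate water geometry, and the sign convention (Hartree atomic units, potential energy felt by an electron) are illustrative assumptions and are not taken from the text.

```python
import math

# Assumed number of atom types (maximal nuclear charge); illustrative only.
Z_MAX = 100


def total_protons(charges):
    """Total proton number: the sum over all nuclear charges."""
    return sum(charges)


def net_charge(charges, n_electrons):
    """Net charge of the compound: protons minus electrons."""
    return total_protons(charges) - n_electrons


def external_potential(r, charges, positions):
    """Nuclear external potential at point r.

    Assumed convention: Hartree atomic units, potential energy of an
    electron, i.e. minus the sum over nuclei of Z_I / |r - R_I|.
    """
    return -sum(z / math.dist(r, pos) for z, pos in zip(charges, positions))


def ccs_size(n_atoms, z_max=Z_MAX):
    """Exponential size estimate: z_max ** n_atoms (exact integer)."""
    return z_max ** n_atoms


def ccs_size_log10(n_atoms, z_max=Z_MAX):
    """Order of magnitude of the estimate; safe for very large n_atoms."""
    return n_atoms * math.log10(z_max)


if __name__ == "__main__":
    # Water with an approximate, illustrative geometry in bohr.
    charges = [8, 1, 1]
    positions = [(0.0, 0.0, 0.0), (0.0, 1.43, 1.11), (0.0, -1.43, 1.11)]
    n_electrons = 10

    print("protons:", total_protons(charges))
    print("net charge:", net_charge(charges, n_electrons))
    print("v at (2, 0, 0):", external_potential((2.0, 0.0, 0.0), charges, positions))
    print("log10 of size estimate for 20 atoms:", ccs_size_log10(20))
```

Changing only the charges, the positions, or the electron count moves the sketch to a different point in CCS without touching any other parameter, which is the sense in which these quantities act as the sole input variables.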

In this tutorial, CCS is first briefly illustrated in terms of a rough energy scale in section “Energy Hierarchy”. In section “Molecular Grand-Canonical Ensemble”, we review the notion of a molecular grand-canonical ensemble density functional theory (DFT) that accounts for fractional electrons and nuclear charges. Section “Compound Pairs” deals with pairs of chemical compounds and with efforts to exploit the arbitrariness of interpolating functions. It also details the challenge associated with a prize award of one ounce of gold. Finally, in section “Statistical Methods”, we discuss recent efforts to use intelligent data analysis methods [machine learning (ML)] to systematically infer analytical structure-property relationships from previously calculated electronic structure data sets.