Quantum mechanical ab initio simulation is rapidly gaining a more important role in many scientific communities due to decreasing computational cost, as well as the availability of computer programs of increasing capability and ease of use. In particular, the number of computer codes devoted to periodic systems has been rapidly growing. Crystal88 was the first such code to be distributed publicly 25 years ago through the Quantum Chemistry Program Exchange.[1, 2] Since then six other releases have followed in 1992, 1995, 1998, 2003, 2006, 2009, and now there is the current one, Crystal14. This computer program can be used to study the properties of many types of compounds characterized by periodicity in one dimension (quasilinear and helical polymers, nanotubes), two dimensions (monolayers, slabs), or three dimensions (crystals, solid solutions, substitutionally disordered systems). As a limiting case, molecules can also be studied.
Despite the many improvements and generalizations that have been introduced since its first release, the basic aspects of Crystal have remained the same. Thus, this program computes the electronic structure of periodic systems within the Hartree–Fock (originally) and density functional theory (DFT) single particle models using Bloch functions (BFs). A special feature of the code is that the crystal orbitals are expanded as linear combinations of atom-centered Gaussian-type functions. Powerful screening techniques are used to exploit real space locality, which is another distinguishing characteristic of Crystal. Restricted (closed shell) and unrestricted (spin-polarized) calculations can be performed with all-electron or with valence-only basis sets using effective core pseudopotentials.
Another unique feature is the extensive exploitation of symmetry to achieve computational efficiency. Besides the 230 space groups, 80 two-sided plane groups, 99 rod groups, and 32 crystallographic point groups, there is provision for molecular point group symmetry (e.g., icosahedral) as well as helical symmetry. Automatic tools allow users to obtain lower dimensionality systems from three-dimensional (3D) structures by specification of a few geometrical parameters. Slabs (two-dimensional (2D) periodic), nanorods (one-dimensional (1D) periodic), and nanocrystals [zero-dimensional (0D)] are easily generated from 3D crystalline structures; nanotubes (1D) and fullerenes (0D) can be constructed from 2D sheets or multilayered slabs (Low-Dimensionality Systems Section). Full use of symmetry involves all steps of a Crystal calculation, leading to drastically reduced computation time and required memory, as well as improved task farming in parallel calculations (Use of Symmetry in Crystal Section).[3-6] Symmetry is also used to select the independent elements of tensor properties for computation.
Several algorithms of Crystal14 now rely on the numerical accuracy of the geometry optimizer: search for equilibrium structures and transition states, volume- or pressure-constrained minimizations for the determination of the equation of state of bulk crystals, nuclear relaxation of strained lattices for the computation of elastic, piezoelectric, photoelastic tensors and so forth. Recent developments of the algorithms of the geometry optimizer are illustrated in Geometry Optimization of Periodic Systems Section.
A wide variety of crystal properties can now be computed automatically. They include third and fourth rank, as well as first and second rank tensors. Among the former are the fourth rank elastic tensor along with the related seismic wave velocities;[7-10] the third rank direct and converse piezoelectric tensors,[11] and the fourth rank photoelastic Pockels' tensor (Elastic, Piezoelectric, and Photoelastic Properties of Crystals Section).[12] In addition to the dielectric (or polarizability) tensor (static and frequency-dependent), the second- and third-order electric susceptibilities (or first and second hyperpolarizabilities) are computed analytically via the coupled perturbed Hartree–Fock/Kohn–Sham (CPHF/KS) method (Static Nonlinear Polarizabilities and Frequency-Dependent Electronic Dielectric Constant Section).[13-18]
Previous versions of the program, starting with Crystal03,[19, 20] have included efficient computation of vibration frequencies (Hessian matrix) and related properties. Now, we have added analytical calculation of infrared (IR) and Raman intensities through the CPHF/KS scheme[21-24] and automated computation of IR and Raman spectra.[25-27] These vibrational properties are listed in Table 1, together with those discussed in the previous paragraph, as the main tensor properties available in Crystal14. In addition, the program now contains improved algorithms for calculating phonon dispersion and anisotropic displacement parameters (ADPs)[28, 29] (see Vibrations in Solids: Analytical IR and Raman Intensities, Vibrational Spectra, and Phonon Dispersion Section).
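Table 1. Physical properties that can be computed with Crystal14.
Property | Tensor rank | Formula | Definitions | Method
Atomic gradient | 1 | $g_{t}^{a} = \partial E / \partial r_{a,t}$ | Energy E and position vector r_{a} of atom a | A
Cell gradient | 2 | $G_{tu} = \partial E / \partial A_{tu}$ | Energy E and lattice metric matrix A = (a_{1}, a_{2}, a_{3}) | A
Polarizability | 2 | $\alpha_{tu} = \partial^{2} E / \partial\varepsilon_{t}\,\partial\varepsilon_{u}$ | Energy E and electric field ε | A
Born charge | 2 | $Z_{tu}^{*a} = \partial^{2} E / \partial\varepsilon_{t}\,\partial r_{a,u}$ | Energy E, electric field ε, and atomic position vector r_{a} | A
Electric field gradient | 2 | $q_{tu} = \partial\varepsilon_{t} / \partial r_{u}$ | Electric field ε and position vector r | A
Hessian | 2 | $H_{tu}^{ab} = \partial^{2} E / \partial r_{a,t}\,\partial r_{b,u}$ | Energy E and atomic position vector r_{a} | A/N
Direct piezoelectricity | 3 | $e_{tuv} = \partial P_{t} / \partial\eta_{uv}$ | Polarization P and rank-2 strain tensor η | A/N
Converse piezoelectricity | 3 | $d_{tuv} = \partial\eta_{uv} / \partial\varepsilon_{t}$ | Electric field ε and rank-2 strain tensor η | A/N
First hyperpolarizability | 3 | $\beta_{tuv} = \partial^{3} E / \partial\varepsilon_{t}\,\partial\varepsilon_{u}\,\partial\varepsilon_{v}$ | Energy E and electric field ε | A
Raman intensity | 3 | $I_{tuv}^{a} = \partial^{3} E / \partial\varepsilon_{t}\,\partial\varepsilon_{u}\,\partial r_{a,v}$ | Energy E, electric field ε, and atomic position vector r_{a} | A
Elasticity | 4 | $C_{tuvw} = \partial^{2} E / \partial\eta_{tu}\,\partial\eta_{vw}$ | Energy E and rank-2 strain tensor η | A/N
Photoelasticity | 4 | $p_{tuvw} = \partial\Delta\varepsilon_{tu}^{-1} / \partial\eta_{vw}$ | Rank-2 dielectric tensor ε and rank-2 strain tensor η | A/N
Second hyperpolarizability | 4 | $\gamma_{tuvw} = \partial^{4} E / \partial\varepsilon_{t}\,\partial\varepsilon_{u}\,\partial\varepsilon_{v}\,\partial\varepsilon_{w}$ | Energy E and electric field ε | A
For each property, its formula and tensor rank are given along with the general method of computation, which may be either analytical (A) or involve a mixed analytical/numerical (A/N) scheme. Here, t,u,v,w = x,y,z represent Cartesian indices.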
New algorithms have recently been developed for the study of solid solutions and, more generally, disordered systems.[30, 31] As far as solid-state solutions are concerned, for any substitution fraction x within a given series, the program finds the total number of atomic configurations and determines the symmetry-irreducible configurations among them. Symmetry irreducible configurations can, then, be explored (i.e., optimized) automatically. Tools for Studying Solid Solutions and Disordered Systems Section is devoted to the presentation of these new tools.
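The idea of reducing all atomic configurations to a symmetry-irreducible set can be illustrated with a toy model. The Python sketch below is not the algorithm used in Crystal; it simply enumerates all substitution patterns and groups them into orbits of a symmetry group supplied as site permutations. The four-site ring and its rotation group are hypothetical choices made only for illustration.

```python
from itertools import combinations
from math import comb


def irreducible_configurations(n_sites, n_subst, operators):
    """Group substitution patterns into symmetry-equivalent classes.

    operators: list of site permutations forming a group (identity included);
    op[i] is the site onto which site i is mapped.
    Returns {representative configuration: multiplicity}.
    """
    seen = set()
    classes = {}
    for config in combinations(range(n_sites), n_subst):
        if config in seen:
            continue
        # Orbit of this configuration under all symmetry operators.
        orbit = {tuple(sorted(op[i] for i in config)) for op in operators}
        seen |= orbit
        classes[config] = len(orbit)
    return classes


# Hypothetical toy model: 4 equivalent sites on a ring, symmetry = 4 rotations.
n_sites = 4
rotations = [tuple((i + s) % n_sites for i in range(n_sites)) for s in range(n_sites)]

total = comb(n_sites, 2)                      # all ways to substitute 2 of 4 sites
classes = irreducible_configurations(n_sites, 2, rotations)
print(total, classes)
```

In this toy case the six configurations collapse into two classes (neighboring and opposite substituted sites), and the multiplicities sum to the total number of configurations.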
We have also developed new tools for the analysis of electron charge and momentum densities. Electron Density Analysis Section deals with (i) topological analysis of the electron charge density, performed according to Bader's prescriptions using Gatti's Topond package,[32] that is now integrated with the Crystal14 program; (ii) computation of ADPs and Debye–Waller (DW) thermal damping for X-ray structure factors (XSF);[28] and (iii) new algorithms for analyzing the electron momentum density (EMD).[33, 34]
For DFT calculations, the local-density and generalized-gradient approximations (LDA and GGA) already available in previous versions of the program have been augmented with exchange-correlation functionals corresponding to the third, fourth and fifth rung of “Jacob's ladder”. The semilocal and hybrid meta-GGA functionals, range-separated hybrids, and double hybrids (DHs) that have been implemented are discussed in Climbing the Jacob's Ladder for Solids Section.
Crystal may be run either sequentially on a single processor or in parallel. Parallel processing, in turn, can be done either through a replicated data procedure (Pcrystal), wherein all the most relevant quantities are copied on each node, or according to an MPP (massively parallel) strategy, in which the large matrices are partitioned and distributed among the cores. The MPPcrystal version of the program, first released in 2010, is advantageous for systems with a large unit cell and low symmetry. Since then, we have improved performance so that calculations now scale linearly up to thousands of cores. These recent advances are described in Crystal14 Performance Section.
The article is organized as follows: each section corresponds to a particular capability of the Crystal14 program; the newly developed features are illustrated from a general point of view and a few examples are given of their application to systems of interest. Technical details about how to run specific calculations and extract the corresponding information (input/output structure) can be found in the CRYSTAL14 User's Manual[35] and in the many tutorials available at the Crystal website.[36]
Use of Symmetry in Crystal
Symmetry plays a crucial role in the study of crystalline compounds and its use in simulation can be enormously advantageous because: (i) it greatly simplifies the set of data to be given as input; (ii) it improves performance dramatically; (iii) the amount of memory storage is reduced significantly; and (iv) it aids in comparing with experiment.
The data needed to define a crystalline structure is greatly reduced if its space group is known because the unit cell can be generated automatically from knowledge of the asymmetric unit only. The same rule applies to slabs, rods, nanotubes, and molecules (including fullerenes). Two significant examples are nanotubes and fullerenes, as obtained by geometrical construction from graphene (see Low-dimensionality systems Section for further details and examples). The same kind of simplification applies to many structure manipulations that modify either translational or point symmetry, or both. For instance, a 2D graphene monolayer can be automatically cut out of a 3D graphite crystal simply by using the keyword SLABCUT (see later).
Low-dimensionality systems
Twenty or thirty years ago, theoretical studies could be classified into three main categories: standard molecular calculations, fully periodic (bulk) calculations, and cluster calculations. The first and last kind of calculations were performed with “molecular” codes, the second kind with “solid-state” codes, the two using completely different tools and technologies: plane waves versus atomic orbitals (AO), DFT versus Hartree–Fock (HF), all-electron versus pseudopotential basis set. Crystal, on the contrary, maintains full compatibility along the 3D → 2D → 1D → 0D periodicity series. At the same time, various tools have been implemented for an automated, consistent, easy construction of low-dimensional systems, such as:
Slabs (3D → 2D)
This option cuts a 2D periodic slab out of a 3D bulk structure, for example a graphene monolayer out of the 3D graphite crystal, simply by specifying: (i) the crystallographic Miller indices of the cutting plane (0 0 1 in this case); (ii) the label of the layer corresponding to the bottom cut (any layer in graphite); (iii) the number of layers forming the slab (1 in the case of graphene). The origin of the 2D unit cell is set automatically so as to maximize symmetry. In principle, this option allows for any termination of the surface, any stoichiometry, and any slab polarity. The last point matters because intrinsically “polar” surfaces exist, where the electrical dipole perpendicular to the surface can only be removed by altering the surface structure, for instance, by breaking stoichiometry.
Nanotubes (2D → 1D)
Nanotubes (see Fig. 1a) of any size and order of symmetry can be built automatically by rolling up a 2D slab.[37] That is done by specifying a pair of integers to define a roll-up vector in terms of the slab unit vector components. The rolling vector is perpendicular to the nanotube axis and its modulus is the nanotube circumference. To build the (120,0) carbon nanotube (CNT) from a graphene sheet, for instance, those two integers are 120 and 0.
Fullerenes (2D → 0D)
A fullerene cage can be obtained starting from any hexagonal sheet; available shapes are: icosahedron, octahedron, and tetrahedron (all consisting of equilateral triangular faces). The required information includes a pair of integers representing the components of 2D unit cell vectors defining the side of the triangular face, the fullerene point group, and polyhedron type. For example, a (10,10) icosahedral fullerene (see Fig. 1b) is obtained from a graphene sheet by specifying the following data: 10, 10, IH, ICOSA (standing for icosahedron).
Nanorods (3D → 1D)
This option builds “crystals in 1D” starting from the corresponding 3D bulk structure. Nanorods can be used as models for crystal edges (relevant to catalysis) and real nanoobjects. The required input steps to build a nanorod are similar to those used to create slabs: information about the crystal plane, termination, and thickness of the two crystal planes defining it. At this stage, a nanorod can exhibit sharp and unphysical edges that can be smoothed out by further cutting the rod along additional crystal planes parallel to the periodic direction. The origin of the nanorod cell is shifted automatically to maximize symmetry of the rod group. See Figure 2a for an example.
Nanocrystals (3D → 0D)
Nonperiodic nanosystems are defined as nanocrystals, provided they preserve a crystalline structure. Very small nanosystems losing the parent crystal structure and stoichiometry are commonly referred to as nanoclusters. A new feature in Crystal14 allows users to create nanocrystals from bulk in a controlled way by generating a supercell with faces parallel (2 by 2) to a set of three crystal planes (see Fig. 2b for an example). The same set of information used to create slabs (and nanorods) needs to be replicated three times in this case. As for nanorods, a nanocrystal can be smoothed by cutting edges along other crystal planes, thereby breaking the original chemical stoichiometry. For this reason, chemical composition is calculated and printed at each step of the procedure. The origin of the nanocrystal is shifted automatically to maximize point symmetry.
Wulff polyhedron Construction (2D → 0D)
After a systematic study of a selected set of crystal surfaces, where a set of well converged surface formation energies is obtained, the thermodynamic equilibrium crystal shape can be calculated using the Gibbs–Wulff theory to build a Wulff polyhedron.
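The geometrical meaning of the two integers that define a nanotube roll-up vector can be made concrete with a short calculation. The Python sketch below is an illustration only, not part of Crystal: it assumes a graphene lattice parameter of 2.46 Å and the convention, common in the nanotube literature, of hexagonal unit vectors forming a 60° angle. The function names are hypothetical.

```python
import math

A_GRAPHENE = 2.46  # assumed graphene lattice parameter, in angstrom


def rollup_circumference(n1, n2, a=A_GRAPHENE):
    """Modulus of the roll-up vector R = n1*a1 + n2*a2.

    Assumes |a1| = |a2| = a and a 60 degree angle between a1 and a2, so that
    |R|^2 = a^2 * (n1^2 + n1*n2 + n2^2).
    """
    return a * math.sqrt(n1 * n1 + n1 * n2 + n2 * n2)


def nanotube_diameter(n1, n2, a=A_GRAPHENE):
    """Diameter of the ideal (unrelaxed) tube: circumference divided by pi."""
    return rollup_circumference(n1, n2, a) / math.pi


# The (120,0) tube quoted in the text: the circumference is simply 120 * a.
print(rollup_circumference(120, 0), nanotube_diameter(120, 0))
```

For (n,0) tubes the result does not depend on the angle convention, because only one unit vector contributes to the roll-up vector.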
Symmetry and efficiency
Use of translation invariance is mandatory to transform a cluster calculation for a piece of a bulk system into a periodic bulk calculation. BFs are used, in that regard, to transform an infinite Hamiltonian matrix into block-diagonal form. Each block corresponds to a k point in the first Brillouin Zone (BZ) (see the first two panels in Fig. 3). All periodic codes are based on the use of BFs.
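The block-diagonalizing effect of BFs can be checked on a minimal model. The following Python sketch is an assumption-laden illustration rather than anything taken from Crystal: a ring of n identical sites with one orbital per site and nearest-neighbor hopping stands in for the periodic Hamiltonian. With one orbital per cell, each k block is a single number.

```python
import cmath
import math


def ring_hamiltonian(n, onsite=0.0, hopping=-1.0):
    """Real-space Hamiltonian of a ring of n >= 3 sites (periodic boundary)."""
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        h[i][i] = onsite
        h[i][(i + 1) % n] = hopping
        h[i][(i - 1) % n] = hopping
    return h


def bloch_function(n, m):
    """Normalized Bloch sum with wave vector k = 2*pi*m/n (unit lattice spacing)."""
    k = 2.0 * math.pi * m / n
    return [cmath.exp(1j * k * j) / math.sqrt(n) for j in range(n)]


def matvec(h, v):
    return [sum(h_ij * v_j for h_ij, v_j in zip(row, v)) for row in h]


def bloch_matrix_element(h, m1, m2):
    """<BF(m1)| H |BF(m2)>: vanishes unless both BFs share the same k."""
    n = len(h)
    bra = bloch_function(n, m1)
    ket = matvec(h, bloch_function(n, m2))
    return sum(b.conjugate() * x for b, x in zip(bra, ket))


h = ring_hamiltonian(8)
diagonal = bloch_matrix_element(h, 1, 1)      # equals onsite + 2*hopping*cos(k)
off_diagonal = bloch_matrix_element(h, 1, 3)  # different k points do not couple
print(diagonal, off_diagonal)
```

Matrix elements between BFs of different k vanish to numerical precision, which is the block-diagonal structure referred to above.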
Conversely, use of point symmetry is not essential to make a periodic calculation affordable and, indeed, most periodic codes neglect it. However, point symmetry can be very advantageous in periodic calculations (much more than in molecular codes). It is exhibited by most crystalline systems and can be invoked at several steps of an ab initio calculation, thereby drastically reducing computation time and resources. The key steps where savings occur are given in the following for the case where the BFs are constructed from a local basis:
Diagonalization of the Fock matrix is restricted to the subset of k points in the irreducible BZ. The eigenvalues in a star of k points (a set of points that are symmetry related) are the same and the eigenvectors can be generated by symmetry operations. This kind of symmetry has been used in Crystal since its first release (Crystal88).
Time required for the calculation of one- and two-electron integrals is reduced by a factor of up to the number of point symmetry operators in the group. Again, such use of symmetry dates back to Crystal88.[3]
Diagonalization of the Fock matrix can be sped up dramatically in the basis of the Symmetry Adapted Crystalline Orbitals (SACO)/Symmetry Adapted Molecular Orbitals (SAMO), shown in the last panel of Fig. 3. SACO/SAMO are generated automatically in Crystal from the selected basis set of AO in the unit cell, with no need for additional information about irreducible representations (irrep) or characters. This part of the code was implemented about 15 years ago.[4, 5] The saving factor in computation time is roughly proportional to the third power of the ratio between the number of AOs in the basis set (N_{AO}) and the size of the largest block in the Fock matrix, when represented in the SACO/SAMO basis.
Construction of the density matrix (DM) scales with the third power of the basis set size (N_{AO}^{3}), as each of the N_{AO}^{2} matrix elements is obtained by summing over all occupied crystalline orbitals (very roughly N_{AO}). The advantage is obtained by building the matrix in the SACO/SAMO basis first.
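The cubic scaling of the DM construction mentioned in the last item can be read directly off a naive implementation. The Python sketch below is a simplified illustration, not Crystal code: it assumes a closed-shell, molecular-like case with real orbital coefficients (a single k point) and ignores screening and symmetry altogether. The two-orbital example at the end is hypothetical.

```python
import math


def density_matrix(coeffs, n_occ):
    """Closed-shell density matrix P[m][n] = 2 * sum_i C[m][i] * C[n][i].

    coeffs[m][i] is the coefficient of AO m in orbital i; the sum runs over
    the n_occ occupied orbitals. There are N_AO^2 elements, each a sum of
    n_occ terms, hence the roughly N_AO^3 cost quoted in the text.
    """
    n_ao = len(coeffs)
    return [
        [2.0 * sum(coeffs[m][i] * coeffs[n][i] for i in range(n_occ))
         for n in range(n_ao)]
        for m in range(n_ao)
    ]


def multiply_add_count(n_ao, n_occ):
    """Number of multiply-add operations in the naive construction."""
    return n_ao * n_ao * n_occ


# Hypothetical two-AO example with orthonormal AOs: bonding orbital occupied.
s = 1.0 / math.sqrt(2.0)
coeffs = [[s, s],
          [s, -s]]
p = density_matrix(coeffs, n_occ=1)
print(p, multiply_add_count(n_ao=2, n_occ=1))
```

With orthonormal AOs the trace of the matrix returns the number of electrons (two in this example), a quick sanity check on the construction.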
Reduction of computing time thanks to the use of symmetry is not the only issue when handling very large unit cell cases. Memory requirements can also become a bottleneck, if not properly managed at every step of a calculation. Storage of the Fock, overlap, and density matrices as full square matrices in the AO-BF basis represents the main bottleneck and needs to be avoided. As both one- and two-electron integrals are evaluated in the AO basis, a set of back and forth transformations (SACO to AO and AO to SACO) is required. However, it is possible to switch directly from the AO to the SACO/SAMO basis and use the latter throughout. This has been implemented in Crystal14 with a drastic reduction of both running time and memory allocation.[6] The larger the point group size, the bigger the reduction of computational resources (see Crystal14 Performance Section for details).
In the following, we document the savings from full use of symmetry in the case of CNTs, in particular for the (n,0) family. The number of symmetry operators in CNTs increases with the tube size, that is, with n. Thus, in principle, there is no limit to the number of nonpurely translational elements. Neglecting mirror planes parallel and perpendicular to the tube axis, (n,0) nanotubes possess 2n roto-translation symmetry operators. The largest nanotube considered here, with n = 320, has 640 such operators and 1280 atoms in its unit cell. This is a huge number compared to a maximum of 48 point operators for standard cubic crystalline systems or 120 for fullerenes, in the molecular context. Thus, large nanotubes are expected to show the maximum symmetry saving factor in terms of computation time and memory allocation. They also represent a severe test for the flexibility of our code and its algorithms, as will be discussed later. A more detailed description of the effect of symmetry can be found in Ref. [6]. Large nanotubes are, clearly, an extreme case. Evidence of the efficiency of the code for systems with lower point symmetry (from 1 to 48 operators) is given in one of the next sections.
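The figures quoted above for the (n,0) family follow from simple counting, which the Python sketch below reproduces. It assumes 4n atoms per unit cell for an (n,0) tube, 2n roto-translation operators (mirror planes neglected, as in the text), and the basis set of 14 AOs per atom used for the timings; the function name is hypothetical.

```python
def zigzag_nanotube_counts(n, aos_per_atom=14):
    """Sizes characterizing an (n,0) carbon nanotube unit cell."""
    n_atoms = 4 * n                      # atoms in the unit cell
    return {
        "operators": 2 * n,              # roto-translations only
        "atoms": n_atoms,
        "aos": n_atoms * aos_per_atom,   # basis functions per unit cell
    }


for n in (10, 20, 40, 80, 160, 320):
    print(n, zigzag_nanotube_counts(n))
```

For n = 320 this gives 640 operators, 1280 atoms, and 17,920 AOs per unit cell.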
Tables 2 and 3 show the effect of full use of symmetry on peak memory usage and on running time. The various steps of the calculation can be divided into three units:
Preliminary calculations for symmetry analysis (including construction of the character table and SACOs), index mapping, and screening of integrals (init). This unit includes also the generation of a set of orthogonal functions through Cholesky decomposition.
All steps involved in the SCF cycle, which are iterated as many times as necessary to achieve convergence (15 iterations in the present case). In the “direct SCF” strategy this unit includes: calculation of one- and two-electron integrals, as well as integrals approximated by multipolar expansion of the electron charge density (pole); transformation of the Fock matrix from the AO to the SACO basis and its diagonalization (Fock) + (diag); construction of the DM in the AO basis (dens); and integration of the exchange-correlation density functional (dft).
Calculation of the total energy gradients with respect to the nuclear coordinates (grad) at the end of the SCF process.
Table 2. Peak memory request (in MByte) when one core is used in the SCF+Gradient calculation for a set of (n,0) carbon nanotubes.
Columns: (10,0), (20,0), (40,0), (80,0), (160,0), (320,0)
The B3LYP hybrid functional is used with a basis set of 14 AOs per atom. Only roto-translational operators are included in the point group (mirror planes are neglected). For each nanotube, the number of atoms N_{at} and the number of symmetry operators N_{op} are given. The various steps are as follows: initialization of the simulation, including construction of the symmetry group and transformation matrices (init); construction of the basis orthogonalization matrix (Cholesky); calculation of one-electron integrals, two-electron integrals, and integrals approximated by multipole expansion of the charge density (pole); transformation of the Fock matrix from the AO to the SACO basis (Fock) and its diagonalization (diag); construction and back transformation of the density matrix (dens); grid integration of the exchange-correlation density functional (dft); and calculation of total energy gradients with respect to nuclear coordinates (grad).
^{a} Formulation of the density matrix directly in terms of SACOs will be available in a forthcoming release, to be distributed in mid-2014. Data in parentheses refer to the release distributed since December 2013.
Table 3. CPU time (in seconds) for the various steps in the SCF+Grad calculation for (n,0) carbon nanotubes.
Columns: (10,0), (20,0), (40,0), (80,0), (160,0), (320,0)
See Table 2 for description of individual steps. Timing for a single SCF cycle (TOT_{cyc}) and for the entire SCF procedure (TOT_{SCF}, 15 cycles in this case) are also reported.
^{a} Formulation of the density matrix directly in terms of SACOs will be available in a forthcoming release, to be distributed in mid-2014. Data in parentheses refer to the release distributed since December 2013.
Let us begin with the memory allocation requirements reported in Table 2. Clearly, all steps except init require an extremely small amount of memory. As an example, the 239 MBytes for the grad step in the case of the largest nanotube is about 1/10 of the memory commonly available in a single-core standard machine. The reason for such low memory requirements is that the number of atoms in the asymmetric unit is just 2 regardless of n. In Crystal14 only the irreducible part of the Fock, overlap, and density matrices is stored in memory (we recall that, in addition, screening reduces the total number of matrix elements dramatically). As a consequence, about 6000 elements are stored for each matrix, irrespective of the size of the nanotube.
In the init step, the generation of the coefficients for SACOs has not yet been optimized from the viewpoint of memory allocation, as this is not an issue in all normal cases (any space group or icosahedral symmetry). However, because memory requirements increase quadratically with the number of operators, SACO generation becomes dominant at about 150–200 operators. In the case with the largest number of symmetry operators (640) considered here, more than 2 GBytes are required. That is why only roto-translation operators were taken into account for Table 2. The addition of mirror planes would quadruple the number of symmetry operators, and thereby increase memory requirements, with little gain in efficiency.
Let us now turn to the running time. Table 3 shows that the full self-consistent-field (SCF) procedure plus gradient calculation for the largest CNT (1280 atoms, 17,920 AOs per unit cell) is accomplished within less than 5 h on a single processor for the B3LYP hybrid functional. No calculation for such a large CNT has been reported in the literature, even with smaller basis sets or simpler functionals such as LDA or GGA. The cost of the various steps is reported in Table 3 and represented pictorially in Figure 4; the (expectedly) most demanding steps along the series are two-electron integrals and calculation of the gradients (grad). Nevertheless, for the same reasons invoked for low memory requirement, computing time is remarkably little and nearly independent of the nanotube size: quite interestingly, about 40% of computing time for the largest tube is spent in the init, Cholesky, and Fock steps, where higher efficiency could, in principle, be recovered.
Symmetry in anisotropic properties
A number of physical properties can be given a tensorial representation. For example, linear elasticity is described by a fourth-order tensor. Such a tensor consists of 21 independent components for a triclinic crystal, whereas they reduce to three in the case of a cubic crystal. Thus, symmetry is key to the study of tensorial properties: which components have to be computed? Which elements are null by symmetry? Which are symmetry-related? Moreover, experimental data are generally reported with some standard orientation of the cell parameters with respect to the Cartesian frame. For comparison between calculated and experimental data to be consistent, the orientation issue needs to be clearly stated. Crystal performs such a symmetry analysis of the tensor using the TENSOR keyword. That is particularly useful in low-symmetry cases and in the case of three-fold rotation axes. Such analysis is automatically performed prior to the calculation of all tensorial properties available. A single keyword (ELASTCON for elastic constants, PIEZOCON for piezoelectric constants, PHOTOELA for photoelastic constants, see Elastic, Piezoelectric, and Photoelastic Properties of Crystals Section for details) is sufficient for generating the full tensor of interest.
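The counts of 21 and 3 independent elastic constants quoted above can be reproduced with a few lines of code. The Python sketch below is only an illustration and does not reflect how the TENSOR analysis is implemented in Crystal. It uses the standard Voigt contraction of index pairs and, for the cubic case, assumes that the cubic axes coincide with the Cartesian frame; the numerical values of the constants are hypothetical.

```python
# Voigt contraction of a symmetric index pair (t, u) into a single index 0..5.
VOIGT = {(0, 0): 0, (1, 1): 1, (2, 2): 2,
         (1, 2): 3, (2, 1): 3, (0, 2): 4, (2, 0): 4, (0, 1): 5, (1, 0): 5}


def count_independent_elastic_constants():
    """Count distinct C_tuvw given the symmetries tu<->ut, vw<->wv, (tu)<->(vw)."""
    classes = set()
    for t in range(3):
        for u in range(3):
            for v in range(3):
                for w in range(3):
                    i, j = VOIGT[(t, u)], VOIGT[(v, w)]
                    classes.add((min(i, j), max(i, j)))
    return len(classes)


def cubic_voigt_matrix(c11, c12, c44):
    """6x6 elastic matrix of a cubic crystal in its standard orientation."""
    c = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            c[i][j] = c11 if i == j else c12
        c[i + 3][i + 3] = c44
    return c


n_triclinic = count_independent_elastic_constants()
cubic = cubic_voigt_matrix(c11=300.0, c12=100.0, c44=150.0)  # hypothetical values
n_cubic = len({x for row in cubic for x in row if x != 0.0})
print(n_triclinic, n_cubic)
```

The first count uses only the intrinsic index symmetries of the tensor, which is all that survives for a triclinic crystal; point symmetry then reduces the number further, down to three in the cubic case.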
Geometry Optimization of Periodic Systems
Geometry optimization is mostly used in quantum chemical calculations to obtain a nuclear configuration that is either
a stable structure of a given chemical species, or
an estimate of the transition configuration along a reaction path, from which the rate of the corresponding process can be determined.
The former use is based on the Born–Oppenheimer approximation and thermostatistics, and is realized by computing the corresponding minimum of the potential energy hypersurface (PEH). As concerns the latter, an optimization toward an appropriate saddle point of the PEH is done and, then, thermostatistical considerations yield activation energies and reaction rates. These are the main reasons why geometry optimization is one of the most used options of any quantum chemical code. In what follows, the optimization features recently implemented in Crystal14, mostly dealing with minimization, will be briefly discussed.
The efficiency of minimization methods depends primarily on the extent to which the energy function or PEH can be represented by a purely quadratic power series expansion in terms of the geometric coordinates used. Most gradient techniques guarantee that, for an ideal purely quadratic PEH, the minimum will be attained in at most M steps, where M is the number of independent coordinates.[40] Normally, such ideal behavior is not found and various techniques are used to ensure similar convergence behavior.
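The finite-step property on a quadratic surface is easy to verify numerically. The Python sketch below uses the linear conjugate gradient method as one representative gradient technique; it is not the optimizer implemented in Crystal, and the 3 × 3 model Hessian is a hypothetical positive definite matrix chosen only for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def conjugate_gradient(hessian, b, x0, tol=1e-10):
    """Minimize E(x) = 1/2 x^T H x - b^T x for a positive definite H.

    Returns the minimizer and the number of steps taken.
    """
    x = list(x0)
    r = [bi - hi for bi, hi in zip(b, matvec(hessian, x))]  # negative gradient
    p = list(r)
    rr = dot(r, r)
    steps = 0
    while rr ** 0.5 > tol and steps < 10 * len(b):
        hp = matvec(hessian, p)
        alpha = rr / dot(p, hp)                      # exact line minimization
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, hp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        steps += 1
    return x, steps


# Hypothetical quadratic model with M = 3 independent coordinates.
H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x_min, n_steps = conjugate_gradient(H, b, [0.0, 0.0, 0.0])
print(x_min, n_steps)
```

On a real PEH the quadratic model holds only approximately, which is why the choice of coordinates discussed in this section matters for the number of steps.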
In the Crystal code, two different strategies are utilized to deal with a nonquadratic periodic PEH: the trust radius technique and the choice of suitable coordinates. The former has already been well documented,[41] and as no special features are used in the code implementation, it will not be considered further. Conversely, the coordinate system choices available in Crystal for periodic calculations are worthy of discussion.
Although the critical points of a PEH are invariant under coordinate system transformations, the quadratic character of the energy function may be drastically improved with a proper geometrical parameter choice. Crystal allows for the use of two coordinate systems that are particularly suitable for periodic calculations, namely symmetrized crystallographic (SC) and redundant valence (RV) coordinates. The former is the basic set for any crystalline system and consists of symmetry adapted fractional directions (SADIR) and elastic deformations (SAED). In Crystal14, both sets of geometrical parameters are normalized by default. This ensures equivalent weights in the optimization procedure, which is particularly relevant for the construction and use of the approximate Hessian matrix used in pseudo-Newton techniques.[40]
With regard to SADIRs, the translational degrees of freedom are first excluded. Then, the remaining directional coordinates are symmetry adapted and, subsequently orthonormalized using a Schmidt procedure. For SAEDs, the set is first symmetry adapted and Schmidt orthogonalized. Then, each deformation d is normalized so as to fulfill the condition
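\[ \sum_{A=1}^{N} \sum_{t=1}^{3} \left[ \sum_{u=1}^{3} \varepsilon_{tu}^{(d)} X_{u}^{A} \right]^{2} = 1, \qquad (1) \]
where N is the number of atoms, $X_{u}^{A}$ are the Cartesian coordinates of atom A, and $\varepsilon_{tu}^{(d)}$ is the dth SAED.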
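The normalization condition of Eq. (1) amounts to rescaling each deformation so that the atomic displacements it generates have unit total norm. The Python sketch below illustrates this under stated assumptions: the strain matrix and the atomic coordinates are hypothetical inputs, and the function names are not Crystal routines.

```python
import math


def deformation_norm(strain, coords):
    """Square root of sum_A sum_t [sum_u strain[t][u] * X[A][u]]^2, cf. Eq. (1)."""
    total = 0.0
    for x in coords:                      # loop over atoms A
        for t in range(3):
            displacement = sum(strain[t][u] * x[u] for u in range(3))
            total += displacement * displacement
    return math.sqrt(total)


def normalize_deformation(strain, coords):
    """Rescale the deformation so that Eq. (1) is fulfilled."""
    norm = deformation_norm(strain, coords)
    if norm == 0.0:
        raise ValueError("deformation does not displace any atom")
    return [[e / norm for e in row] for row in strain]


# Hypothetical example: isotropic dilation acting on three atoms.
coords = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
dilation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
normalized = normalize_deformation(dilation, coords)
print(deformation_norm(dilation, coords), deformation_norm(normalized, coords))
```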
Although the SC coordinates are easily understood in terms of common crystallographic concepts, their main drawback is that the resulting energy function is frequently far from being pure quadratic. There is also, typically, a strong coupling between the atomic directional and elastic deformation degrees of freedom. This is particularly noticeable in covalent systems that exhibit strongly directed bonds. As previously proposed,[42, 43] for this kind of system the use of RV coordinates often improves the optimization process.
The generation of the RV sets and their implementation for geometry optimization of molecules[44, 45] and crystals[46, 47] has been well-described in the previous literature. Accordingly, we omit the details here and highlight just the key differences in Crystal from other implementations.
The atomic connectivity required to define the RV coordinates is set up according to Refs. [44, 45]. An additional feature in the present implementation is that symmetry equivalences within the RV set are recognized and only one representative of each class is kept in memory together with the corresponding multiplicity per unit cell, $\mu_{i}$. A small displacement in the reference coordinate system, $\delta x$, can be transformed into the displacement $\delta q$ in the RV basis set using
\[ \delta q = B\,\delta x, \qquad (2) \]
where B is the Wilson B-matrix whose elements are given by
\[ B_{ij} = \mu_{i}\,\partial q_{i} / \partial x_{j}. \]
In Crystal, the reference coordinate system, x, consists of SADIRs and SAEDs and the B-matrix is computed by numerical differentiation based on the central point approximation. The force in RV coordinates is determined from the force in the reference system according to
\[ f_{q} = B^{-} f_{x}, \qquad (3) \]
where $f_{q_{i}} = -\partial E / \partial q_{i}$ and $B^{-} = G^{-} B^{T}$; the superscript “–” indicates the generalized inverse and $G = B^{T} B$.
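Numerical construction of a B-matrix can be sketched as follows. This Python example makes several simplifying assumptions that differ from the Crystal implementation: the reference coordinates are plain Cartesian coordinates of a hypothetical triatomic rather than SADIRs and SAEDs, the internal coordinates are just the three interatomic distances, and the "central point approximation" is read here as a central finite difference. The multiplicities $\mu_{i}$ enter as in the expression for $B_{ij}$ given above and default to one.

```python
import math


def bond_lengths(x):
    """Internal coordinates of a triatomic: the three interatomic distances.

    x is a flat list of 9 Cartesian coordinates.
    """
    atoms = [x[0:3], x[3:6], x[6:9]]
    pairs = [(0, 1), (1, 2), (0, 2)]
    return [math.dist(atoms[a], atoms[b]) for a, b in pairs]


def b_matrix(internal, x, multiplicities=None, step=1e-4):
    """B[i][j] = mu_i * dq_i/dx_j by central finite differences."""
    q0 = internal(x)
    mu = multiplicities or [1.0] * len(q0)
    b = [[0.0] * len(x) for _ in q0]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += step
        x_minus[j] -= step
        q_plus, q_minus = internal(x_plus), internal(x_minus)
        for i in range(len(q0)):
            b[i][j] = mu[i] * (q_plus[i] - q_minus[i]) / (2.0 * step)
    return b


# Hypothetical bent triatomic with atoms at (0,0,0), (1,0,0), and (1,1,0).
geometry = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
B = b_matrix(bond_lengths, geometry)
print(B[0])
```

The first row, for the bond along x, contains −1 and +1 in the x columns of the two bonded atoms, as expected for a bond stretch.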
Although optimization is carried out in the redundant space, in some steps of the procedure vectors in spaces tangent to the PEH must be computed. This is done by using the projector $P = G^{-} G$.[45] If constraints are required to keep some RV parameters fixed, then a new projector, $P'$, that eliminates redundancies as well is constructed as[44]
\[ P' = P - P C (C P C)^{-} C P, \qquad (4) \]
where C is the projector onto the constraints subspace, which is given in RV coordinates by
\[ C_{ij} = \begin{cases} 1 & \text{if } i = j \text{ and } i \text{ is constrained}, \\ 0 & \text{otherwise}. \end{cases} \qquad (5) \]
Both gradient and Hessian have to be projected. With regard to the latter, the projected matrix $\tilde{H} = P' H P'$ is diagonalized and its generalized inverse computed according to
\[ [\tilde{H}_{ij}]^{-} = \sum_{k} T_{ik}\,[h_{k}]^{-1}\,T_{jk}, \qquad (6) \]
where $T_{ik}$ is an element of the eigenvector matrix, $h_{k}$ is the corresponding eigenvalue, and k runs over all eigenvalues of $\tilde{H}$ except those that correspond to the redundancies (i.e., for n redundancies the n eigenvalues of smallest magnitude). This procedure also corrects for small errors that derive from numerical calculation of the B-matrix.
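Equation (6) can be illustrated with a small example in which the eigen-decomposition is known analytically. The Python sketch below is not taken from Crystal: it assumes that the eigenvalues and eigenvectors of the projected Hessian are already available and builds the generalized inverse by skipping the eigenvalues of smallest magnitude. The 2 × 2 matrix and the tiny eigenvalue standing in for numerical noise are hypothetical.

```python
def generalized_inverse(eigenvalues, eigenvectors, n_redundant):
    """Generalized inverse from an eigen-decomposition, cf. Eq. (6).

    eigenvectors[k][i] is component i of the k-th eigenvector (T_ik).
    The n_redundant eigenvalues of smallest magnitude are left out of the sum.
    """
    order = sorted(range(len(eigenvalues)), key=lambda k: abs(eigenvalues[k]))
    kept = order[n_redundant:]
    n = len(eigenvectors[0])
    inverse = [[0.0] * n for _ in range(n)]
    for k in kept:
        for i in range(n):
            for j in range(n):
                inverse[i][j] += eigenvectors[k][i] * eigenvectors[k][j] / eigenvalues[k]
    return inverse


# Hypothetical projected Hessian [[1, 1], [1, 1]] with one redundancy.
# Its exact eigenvalues are 0 and 2; a tiny value mimics numerical noise.
s = 2.0 ** -0.5
eigenvalues = [1e-14, 2.0]
eigenvectors = [[s, -s], [s, s]]
h_inv = generalized_inverse(eigenvalues, eigenvectors, n_redundant=1)
print(h_inv)
```

Inverting the noisy near-zero eigenvalue directly would produce entries of order 10^{14}; discarding it yields a matrix whose elements are all 0.25.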
Once the optimized displacements in the RV coordinate system have been found, they must be back-transformed to the reference system to obtain the resulting geometry. In Crystal14, this is performed in two different ways:
1. In an iterative manner similar to that proposed for molecular codes,[44] and
2. By a strategy that minimizes a function which measures the proximity of an RV set of coordinates to a real crystal geometry.
Option 1 has proved to be efficient for most molecular structures. However, it is not as safe when points in the RV space are very far from the subspace that matches the true geometrical parameter space. This usually occurs when the connectivity that defines the RV coordinates exhibits a large number of nested loops, which is the case in most periodic systems, and the required change in geometry is large. In such cases, the iterative procedure of option 1 diverges and a very approximate solution of the back-transformation relation must be chosen to continue the optimization.[45] When such an approximate back-transformation is performed more than once during the optimization, the inaccuracy of the displacements usually makes the whole procedure inefficient and most of the advantages of adopting RV coordinates are lost.
Option 2 is now implemented as an alternative in Crystal14. It allows a more accurate back-transformation, thereby providing a substantial reduction in the number of optimization steps. The overall concept is as follows. Let us consider a set of RV coordinates Q={q_{i}} and a reference set, X={x_{i}}, whose redundancy is, in general, less than that of the former. There is a correspondence such that every vector expressed in terms of X has an image in terms of Q. The converse is not, in general, true. The aim of the technique discussed here is to unambiguously assign a coordinate position in X given a point in Q. Thus, given a displacement in terms of RV coordinates, one would like to find the displacement in the reference set that is closest to it. The idea of “closest” can be quantified by an “error” function:
$$\Phi(x) = \sum_j a_j\,(q_j - \bar{q}_j)^2, \tag{7}$$
where a_{j} are weight factors, q̄_{j} is the target in RV coordinates and q_{j}=q_{j}(x_{1},x_{2},…,x_{N}). The optimum point in X space is the one that minimizes Φ.
The gradient of Φ may be written as
$$\frac{\partial \Phi}{\partial x_k} = 2\sum_j a_j\,(q_j - \bar{q}_j)\,\frac{\partial q_j}{\partial x_k}, \tag{8}$$
with ∂q_{j}/∂x_{k}=B_{jk}. If we define the vector d_{j}=2a_{j}(q_{j}−q̄_{j}), then Eq. (8) becomes:
$$g = B^{T} d, \tag{9}$$
where g_{k}=∂Φ/∂x_{k}. The optimum point in X, corresponding to zero gradient, is obtained by the conjugate gradient method which, in the present implementation, utilizes the Polak–Ribière search direction.[40, 48]
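A minimal sketch of this back-transformation, assuming a toy planar triatomic whose RV set is its three interatomic distances, is given below. The gradient follows Eq. (8), with B obtained by central differences, and the search direction uses the Polak–Ribière formula. The backtracking (Armijo) line search, the restart safeguard, and all names (`q_of_x`, `b_of_x`, `back_transform`) are choices made for this example rather than details of the Crystal14 implementation.

```python
import numpy as np

PAIRS = [(0, 1), (1, 2), (0, 2)]

def q_of_x(x):
    """RV coordinates of the toy system: three interatomic distances."""
    p = x.reshape(-1, 2)
    return np.array([np.linalg.norm(p[i] - p[j]) for i, j in PAIRS])

def b_of_x(x, h=1e-6):
    """B_jk = dq_j/dx_k by central differences."""
    B = np.empty((len(PAIRS), x.size))
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = h
        B[:, k] = (q_of_x(x + step) - q_of_x(x - step)) / (2.0 * h)
    return B

def phi_and_grad(x, q_target, a):
    """Error function Phi of Eq. (7) and its gradient from Eq. (8)."""
    r = q_of_x(x) - q_target
    return float(np.sum(a * r**2)), b_of_x(x).T @ (2.0 * a * r)

def back_transform(x0, q_target, a, tol=1e-9, max_iter=1000):
    x = np.array(x0, dtype=float)
    f, g = phi_and_grad(x, q_target, a)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0.0:                 # not a descent direction: restart
            d = -g
        t = 1.0
        f_new, g_new = phi_and_grad(x + t * d, q_target, a)
        while f_new > f + 1e-4 * t * (g @ d) and t > 1e-12:
            t *= 0.5                     # backtracking line search
            f_new, g_new = phi_and_grad(x + t * d, q_target, a)
        x = x + t * d
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))   # Polak-Ribiere (PR+)
        d = -g_new + beta * d
        f, g = f_new, g_new
    return x

x_start = np.array([0.0, 0.0, 1.0, 0.0, 0.3, 0.9])
q_target = np.array([1.1, 1.0, 0.95])    # target distances (a valid triangle)
x_opt = back_transform(x_start, q_target, a=np.ones(3))
```

Because Φ is invariant under rigid translations and rotations, the minimizer is not unique in Cartesian space; the sketch simply returns the one reached from the starting geometry.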
Elastic, Piezoelectric, and Photoelastic Properties of Crystals
Several strain-related tensor properties of crystalline materials can be computed with the Crystal14 program simply by inserting a single keyword in the input deck. These properties include elasticity, piezoelectricity, and photoelasticity. A general and robust algorithm has been devised which automatically handles any space group and fully exploits point symmetry. Thus, along with other general improvements of the code described in Crystal14 Performance Section, it is now possible to carry out a complete ab initio calculation of elastic properties for large unit cell systems, such as garnets with 80 atoms per cell (see Elasticity Section below for details).[8, 9]
The elements of the fourth-rank elastic tensor C are second energy density derivatives with respect to pairs of cell deformations (see Table 1). A fully automated algorithm for the calculation of C, using analytical energy gradients,[7] was already implemented in Crystal09 for 3D systems and has now been extended in a number of ways,[8-10] including: (i) generalization to 1D and 2D systems, (ii) calculation of directional seismic wave-velocities, (iii) calculation of polycrystalline isotropic aggregate elastic properties, and (iv) calculation of elastic properties under pressure. Some of these points are discussed in Elasticity Section.
The direct piezoelectric tensor e, whose elements are first derivatives of the polarization with respect to crystal strain, is third rank. In our fully automated implementation,[11] the polarization is computed via the Berry phase approach.[49] The converse piezoelectric tensor d, determined as the strain induced by an external electric field, is evaluated from e and C using d=eC^{−1}. These calculations can be done for 1D, 2D, and 3D systems. A more detailed discussion of piezoelectricity will be given in Piezoelectricity Section along with a brief review of recent applications.
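The relation d=eC^{−1} is easiest to see in Voigt notation, where e is a 3×6 matrix and C a 6×6 matrix. The sketch below is an illustration only: the function name is hypothetical, and the elastic and piezoelectric constants are made-up numbers for a cubic crystal with a single independent piezoelectric constant e_{14}, not results from any calculation.

```python
import numpy as np

def converse_piezoelectric(e, C):
    """d = e C^{-1} for a 3x6 direct tensor e and a 6x6 elastic tensor C (Voigt)."""
    # Solve d C = e, i.e. C^T d^T = e^T, instead of forming the inverse explicitly.
    return np.linalg.solve(C.T, e.T).T

# Hypothetical cubic elastic constants in Pa.
c11, c12, c44 = 120e9, 50e9, 60e9
C = np.zeros((6, 6))
C[:3, :3] = c12
C[np.arange(3), np.arange(3)] = c11
C[np.arange(3, 6), np.arange(3, 6)] = c44

# Hypothetical direct piezoelectric constant e14 = e25 = e36 in C/m^2.
e14 = 0.15
e = np.zeros((3, 6))
e[0, 3] = e[1, 4] = e[2, 5] = e14

d = converse_piezoelectric(e, C)   # in C/N, equivalently m/V
```

For this symmetry the shear block of C is diagonal, so the only nonzero converse constants are d_{14}=d_{25}=d_{36}=e_{14}/C_{44}.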
The elements of Pockels' photoelastic or elasto-optic, fourth-rank tensor P, are defined as first derivatives of the inverse of the dielectric tensor with respect to crystal strain. An automated scheme has been implemented,[12] which adopts the CPHF/KS method (see Static Nonlinear Polarizabilities and Frequency-Dependent Electronic Dielectric Constant Section) for computing the dielectric tensor of equilibrium and strained configurations. The explicit dependence of elasto-optic constants on the electric field frequency can be evaluated as well, thus allowing a direct comparison with experimental data. Some examples will be given in Photoelasticity Section.
Elasticity
Nanotubes and Monolayers
As noted above, several improvements have recently been made to the algorithms for computing elastic properties of crystals. One of them is the generalization to 1D and 2D systems.[10] In this work, we have investigated elastic and vibrational properties for several families of single-walled nanotubes and discussed how they are expected to converge to those of the corresponding monolayers. Zigzag (n,0) boron nitride (BN),[10] zigzag (n,0) beryllium oxide (BeO),[50] armchair (n,n) zinc oxide (ZnO),[51] and zigzag (n,0) magnesium oxide (MgO)[52] nanotubes have been studied, with tube radii up to 24, 27.1, 45.3, and 43.3 Å, respectively.
A technologically interesting and widely discussed feature of nanotubes is their response to uniaxial strain along the nanotube axis.[53, 54] In this regard reference is made to the Young's (elastic) modulus, which essentially coincides with the C_{11} elastic constant. A comparison with the C_{11} elastic constant of the corresponding monolayer, as the tube radius increases, is not straightforward. As discussed in Ref. [50], when a nanotube is stretched (compressed) in the axial direction, its radius decreases (increases) in order to minimize the total energy. For the monolayer, there is a corresponding deformation orthogonal to the applied strain, that is, Poisson's effect, which must be taken into account. The Poisson-corrected monolayer value, for comparison, turns out to be C_{11}×(1−σ^{2}), where σ is Poisson's ratio given by σ=−C_{12}/C_{11} in terms of monolayer elastic constants. The damping factor (1−σ^{2}) is close to unity, so that the correction is almost negligible, for graphene and h-BN (0.970 and 0.955), whereas it is 0.867 for h-BeO and 0.577 for h-ZnO. Clearly, this effect increases with the ionicity of the chemical bonds in the system.
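The correction is simple enough to check by hand. The short sketch below, with hypothetical function names, evaluates the Poisson-corrected constant from C_{11} and C_{12} and, conversely, recovers the magnitude of Poisson's ratio implied by each damping factor quoted above.

```python
import math

def poisson_corrected_c11(c11, c12):
    """C11 * (1 - sigma^2) with sigma = -C12/C11, as defined in the text."""
    sigma = -c12 / c11
    return c11 * (1.0 - sigma**2)

def poisson_ratio_magnitude(damping_factor):
    """|sigma| implied by a given damping factor (1 - sigma^2)."""
    return math.sqrt(1.0 - damping_factor)

# Damping factors quoted in the text for the four monolayers.
factors = {"graphene": 0.970, "h-BN": 0.955, "h-BeO": 0.867, "h-ZnO": 0.577}
sigmas = {name: poisson_ratio_magnitude(f) for name, f in factors.items()}
```

The implied |σ| grows from about 0.17 for graphene to about 0.65 for h-ZnO, which is the trend with ionicity noted above.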
The IR-active vibrations of the above nanotubes can be subdivided into two distinct groups, with frequencies ν that tend either to an optical frequency of the monolayer or to zero with increasing tube size n. Three modes decrease linearly to zero as 1/n. These are collective modes without a direct correspondence in the vibrational spectrum of the monolayer. One of them, for instance, is the ring breathing. We have demonstrated that, in the large radius limit, these modes can be related to elastic deformations of the monolayer.[10] By imposing equality between the elastic strain energy of the monolayer and the corresponding vibrational energy of the nanotube, the slope of frequency versus 1/n for the latter can be related to the elastic constants of the monolayer. For the ring breathing mode, this gives:
$$\nu = \sqrt{\frac{C_{22}}{M_A + M_B}}\;\frac{1}{n\,|a|}, \tag{10}$$
where M_{A} and M_{B} are the atomic masses of the two atoms of the system (boron and nitrogen, zinc and oxygen etc.) and a is the lattice parameter of the monolayer. By fitting the BN nanotube vibration frequency versus 1/n we obtain 2637 cm^{−1} for the lhs of Eq. (10), whereas the elastic constants of the monolayer give a value of 2662 cm^{−1}. For BeO, the same comparison yields 1913 cm^{−1} versus 1915 cm^{−1}, whereas for ZnO the two values are 672 and 668 cm^{−1}. Bearing in mind that the properties involved in the comparison (vibration frequencies of the nanotube and elastic constants of the monolayer) are computed quite differently, the agreement is remarkable and confirms the high accuracy of all the algorithms involved.
Seismic Wave Velocities and Elastic Anisotropy
The acoustic wave velocities of a crystal are related to the elastic constants by Christoffel's equation.[55, 56] In Crystal14, an automated procedure has been implemented for computing these wave velocities along any crystallographic direction. The three acoustic wave velocities, also referred to as seismic velocities, can be labeled as quasilongitudinal v_{p}, slow quasitransverse v_{s1}, and fast quasitransverse v_{s2}, depending on their polarization with respect to the propagation direction.[57]
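For readers unfamiliar with Christoffel's equation, the sketch below shows one way to obtain the three velocities from a 6×6 Voigt elastic matrix: the Christoffel matrix Γ_{ik}=C_{ijkl}n_{j}n_{l} is built for a unit propagation direction n, and its eigenvalues equal ρv^{2}. The function names, the cubic elastic constants, and the density are hypothetical illustration values, and the modes are labeled simply by the size of the eigenvalue.

```python
import numpy as np

# Map a pair of Cartesian indices to the corresponding Voigt index.
VOIGT = {(0, 0): 0, (1, 1): 1, (2, 2): 2, (1, 2): 3, (2, 1): 3,
         (0, 2): 4, (2, 0): 4, (0, 1): 5, (1, 0): 5}

def full_tensor(C6):
    """Expand a 6x6 Voigt matrix into the fourth-rank tensor C_ijkl."""
    C = np.empty((3, 3, 3, 3))
    for i in range(3):
        for j in range(3):
            for k in range(3):
                for l in range(3):
                    C[i, j, k, l] = C6[VOIGT[i, j], VOIGT[k, l]]
    return C

def seismic_velocities(C6, rho, direction):
    """Return (v_p, v_s1, v_s2) along a direction; C6 in Pa, rho in kg/m^3."""
    n = np.asarray(direction, dtype=float)
    n = n / np.linalg.norm(n)
    gamma = np.einsum("ijkl,j,l->ik", full_tensor(C6), n, n)
    rho_v2 = np.linalg.eigvalsh(gamma)          # ascending eigenvalues
    v_s1, v_s2, v_p = np.sqrt(rho_v2 / rho)     # slow, fast, quasilongitudinal
    return v_p, v_s1, v_s2

def cubic_voigt(c11, c12, c44):
    C = np.zeros((6, 6))
    C[:3, :3] = c12
    C[np.arange(3), np.arange(3)] = c11
    C[np.arange(3, 6), np.arange(3, 6)] = c44
    return C

C6 = cubic_voigt(300e9, 100e9, 90e9)            # hypothetical constants
rho = 3800.0                                    # hypothetical density
v100 = seismic_velocities(C6, rho, [1, 0, 0])
v110 = seismic_velocities(C6, rho, [1, 1, 0])
```

For a cubic crystal the [100] direction gives the textbook limits v_{p}=(C_{11}/ρ)^{1/2} and v_{s1}=v_{s2}=(C_{44}/ρ)^{1/2}, which is a convenient sanity check.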
Silicate garnets are among the most important rock-forming minerals; they represent the main constituents of the Earth's lower crust, upper mantle, and transition zone. These garnets are characterized by a cubic lattice with space group Ia3̄d and formula X_{3}Y_{2}(SiO_{4})_{3}, where the X site hosts divalent cations such as Ca^{2+}, Mg^{2+}, Fe^{2+}, and Mn^{2+} and the Y site is occupied by trivalent cations such as Al^{3+}, Fe^{3+}, and Cr^{3+}. The primitive cell contains four formula units (80 atoms) and the structure consists of alternating SiO_{4} tetrahedra and YO_{6} octahedra sharing corners to form a 3D network. In a recent work, the B3LYP elastic properties, including seismic wave velocities, were obtained for the six most abundant end-members of this family (pyrope, almandine, spessartine, grossular, uvarovite, and andradite).[8, 9]
In Figure 5, we compare our ab initio directional seismic wave velocities for andradite Ca_{3}Fe_{2}(SiO_{4})_{3}, in particular, with experimental values.[58] Seismic wave velocities are reported along an angle θ such that θ=0° corresponds to the crystallographic direction [110], θ=45° to the [111] direction, θ=90° to the [001] direction and so forth. The agreement is quite impressive: both the angular dependence and the oscillation amplitudes are very accurately reproduced.
A further elastic property of great interest is the so-called elastic wave anisotropy, as measured by the dimensionless parameter A that vanishes for an isotropic material. For cubic crystals, A is given by the following simple expression in terms of elastic constants: A=(2C_{44}+C_{12})/C_{11}−1. The elastic anisotropy of garnets is generally rather small, 0.6%, when compared to that of other rock-forming minerals such as olivine, 25%, spinel, 12%, muscovite, 58%, orthopyroxene, 16%, and so forth.[59] From the calculations of A, we can sort the six silicate garnet end-members according to their increasing elastic anisotropy: spessartine < pyrope < grossular < almandine < andradite ≪ uvarovite. Spessartine and pyrope show very low anisotropies (A = −0.025 and A = −0.031, respectively) whereas uvarovite is, by far, the most anisotropic with A = −0.159.
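The cubic expression for A can be checked directly against the isotropy condition C_{44}=(C_{11}−C_{12})/2, for which it must vanish. The helper below is a hypothetical illustration, and the elastic constants are made-up values.

```python
def cubic_anisotropy(c11, c12, c44):
    """A = (2*C44 + C12)/C11 - 1 for a cubic crystal; zero when isotropic."""
    return (2.0 * c44 + c12) / c11 - 1.0

# Isotropic limit: C44 = (C11 - C12)/2 gives A = 0.
a_iso = cubic_anisotropy(300.0, 100.0, 100.0)

# A slightly softer shear constant gives a small negative A.
a_soft = cubic_anisotropy(300.0, 100.0, 95.0)
```

A negative A, as found for all six garnets, therefore means that 2C_{44}+C_{12} is smaller than C_{11}.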
Piezoelectricity
Direct and converse piezoelectric tensors of 1D, 2D, and 3D periodic systems can now be computed automatically with Crystal14.
The piezoelectric response of nanotubes is of considerable interest. We have recently found that BeO nanotubes exhibit a longitudinal piezoelectric response that is about 25% larger than that of BN nanotubes of comparable radii.[50] This is noteworthy as the response of the latter is quite large among low-dimensional systems.[60] In comparing the piezoelectric constant e_{11} for nanotubes of increasing radii with that of the corresponding monolayer, Poisson's effect has, again, to be taken into account. As before (see Elasticity Section), the connection between vibration frequencies of the nanotubes and elasticity of the monolayer implies a relationship, now between the contribution of the nanotube collective modes to the polarizability on one hand and the monolayer piezoelectricity on the other. We have investigated this relationship for BN, BeO, and ZnO nanotubes.[10, 50, 51]
As regards 3D systems, standard piezoelectric ceramics, such as lead zirconate titanate (PZT)-based materials, are widely used in many technological applications.[61, 62] At cryogenic temperatures, however, their piezoelectric response is significantly reduced. Hence, they cannot be used as actuators for adaptive optics in space telescopes and low temperature capacitors. In 1997, Grupp and Goldman discovered a giant piezoelectric effect in strontium titanate SrTiO_{3} down to 1.6 K, reporting d_{31}=16×10^{−10} m/V for the only nonzero converse piezoelectric coefficient. This is comparable to the value for PZT at room temperature.[63] In two recent studies, we fully characterized direct and converse third-rank piezoelectric tensors of the low temperature phases of SrTiO_{3} and BaTiO_{3}.[11, 64]
Due to its peculiar piezoelectric properties, α-SiO_{2} is another material that is widely utilized in the electronics industry. Unfortunately, its suitability is reduced for applications requiring high thermal stability and high electromechanical coupling. These limitations are mainly due to the α-SiO_{2} to β-SiO_{2} phase transition, upon which the piezoelectric constant d_{11} vanishes and d_{14} remains the only nonzero component.[65] Among quartz homeotypes, GaAsO_{4} and α-GeO_{2} exhibit the highest electromechanical coupling coefficients, resulting in an electrical to mechanical energy conversion efficiency of about 22%.[66] Furthermore, they show a very high degree of thermal stability as they do not undergo an α–β phase transition.[67] Using the new tools in Crystal14, we have recently studied the solid solution Si_{1−x}Ge_{x}O_{2} of α-quartz, where silicon atoms are progressively substituted with germanium atoms, as a function of x. A linear increase in the piezoelectric response with the substitutional fraction had been suggested experimentally for very small x.[68] Our exploration of the entire range of substitution revealed a quasilinear increase throughout, a result that provides the basis for an effective tuning of the piezoelectric response.[69]
Photoelasticity
The variation of the refractive index (dielectric constant) with respect to internal or applied strain gives rise to the so-called photoelastic, or elasto-optic, tensor. We have recently implemented a general algorithm for the ab initio calculation of this fourth-rank tensor in the Crystal14 version of our program.[12] Using this code, we computed the elasto-optic constants for a set of eight crystalline systems of different symmetry: simple cubic sodium chloride NaCl, lithium fluoride LiF, magnesium oxide MgO, and potassium chloride KCl; cubic silicon and diamond; trigonal SiO_{2} α-quartz and tetragonal TiO_{2} rutile. Good overall agreement with experiment was obtained, particularly so for the Perdew-Burke-Ernzerhof (PBE) generalized-gradient functional.
Experimental results for the elasto-optic constants can vary considerably from one measurement to another. A case in point is MgO, for which there are three independent constants, namely p_{11}, p_{12}, and p_{44}. The individual values of p_{11} and p_{12} deduced by several workers are very different from one another: p_{11} ranges from −0.21 to −0.31 and, for p_{12}, even the sign is uncertain, with values ranging from −0.08 to 0.04.[70-72]
Although the experiments are supposed to measure the variation of the static dielectric tensor (i.e., at infinite electric field wavelength, λ=∞), they are actually performed at finite wavelengths that may give results far from the static limit. The experimental values reported above were obtained in the wavelength range from 540 to 589.3 nm. In Figure 6, we show the three independent elasto-optic constants of MgO, computed at the PBE level, as a function of λ (see Electric field frequency dependence Section for more details about the λ-dependent CPHF/KS computational scheme). While p_{44} is almost wavelength-independent, p_{11} and p_{12} exhibit a clear dependence upon λ, slowly converging to the static limit above 1000 nm. In particular, the value of p_{12} is found to pass from negative to positive around 550 nm. Dashed vertical lines in the figure identify the experimental range of electric field wavelengths; both p_{11} and p_{12} are still changing in that range. This aspect is particularly crucial for p_{12} which changes sign in that range. This explains the uncertainty in its experimental value.
As a further application, we have recently characterized the photoelastic behavior of the low-temperature rhombohedral phase of BaTiO_{3}, again by explicitly treating the dependence on λ.[64]
Static Nonlinear Polarizabilities and Frequency-Dependent Electronic Dielectric Constant
A new feature in Crystal14 is the calculation of the electronic and vibrational contributions to the first and second static hyperpolarizability tensor for molecules, polymers, slabs, and crystals. The dependence of the electronic linear polarizability (or dielectric constant) on the frequency of the applied electric field has been added as well. The electronic properties are computed analytically, whereas the vibrational properties require a finite field geometry optimization.
Coupled perturbed HF/KS calculation of static electronic (hyper)polarizabilities
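The total electronic energy E of a crystal in a uniform static electric field ε can be expressed as
$$E(\varepsilon) = E(0) - \sum_{t}\mu_t\,\varepsilon_t - \frac{1}{2!}\sum_{t,u}\alpha_{tu}\,\varepsilon_t\varepsilon_u - \frac{1}{3!}\sum_{t,u,v}\beta_{tuv}\,\varepsilon_t\varepsilon_u\varepsilon_v - \frac{1}{4!}\sum_{t,u,v,w}\gamma_{tuvw}\,\varepsilon_t\varepsilon_u\varepsilon_v\varepsilon_w - \cdots$$
with E(0) being the field-free electronic energy and μ, α, β, γ, … the electronic energy derivatives of order 1, 2, 3, 4, … with respect to the Cartesian components of the electric field (indicated by the subscripts t, u, v, w). As for the corresponding physical properties, μ represents the dipole moment, α the polarizability, β the first hyperpolarizability, and γ the second hyperpolarizability. For a closed-shell system, the second energy derivatives α_{tu} are calculated using the expression: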
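To make the meaning of these coefficients concrete, the sketch below recovers them from energies tabulated at a few field strengths along a single direction, assuming the conventional expansion E(ε)=E(0)−με−αε^{2}/2!−βε^{3}/3!−γε^{4}/4!. This is a finite-field polynomial fit on synthetic data, shown only for illustration; it is not the analytical coupled perturbed scheme described in this section, and the function name and reference values are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def finite_field_properties(fields, energies):
    """Fit E(eps) to a quartic and convert the coefficients to mu, alpha, beta, gamma."""
    c = P.polyfit(fields, energies, 4)      # coefficients, lowest order first
    return {"E0": c[0], "mu": -c[1], "alpha": -2.0 * c[2],
            "beta": -6.0 * c[3], "gamma": -24.0 * c[4]}

# Synthetic reference values (atomic units), chosen only to test the fit.
ref = {"E0": -1.0, "mu": 0.3, "alpha": 50.0, "beta": -200.0, "gamma": 1.0e4}
fields = np.linspace(-0.01, 0.01, 9)
energies = (ref["E0"] - ref["mu"] * fields - ref["alpha"] * fields**2 / 2.0
            - ref["beta"] * fields**3 / 6.0 - ref["gamma"] * fields**4 / 24.0)

props = finite_field_properties(fields, energies)
```

With real data the fit is sensitive to the field range and to numerical noise in the energies, which is one reason an analytical treatment is attractive.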
$$\alpha_{tu} = -\frac{2}{n_k}\sum_{k}^{\mathrm{BZ}} \Re\left\{\mathrm{Tr}\left(C^{k\dagger}\,\Omega^{k,t}\,C^{k}\,U^{k,u}\,n\right)\right\} \tag{11}$$
where n_{k} is the number of k points in the first BZ and n is the diagonal occupation matrix whose elements are 2 for occupied orbitals and 0 otherwise. Here, U^{k,u} is an anti-Hermitian block off-diagonal matrix that relates the unperturbed coefficient matrix, C^{k}, to the corresponding first-order perturbed matrix, C^{k,u}≡C^{k}U^{k,u}, which gives the linear (first-order) response to the electric field perturbation represented by the matrix Ω^{k,t}. The off-diagonal blocks of U^{k,u} depend not only on Ω^{k,u} but also on the first-order perturbed DM (through the two-electron terms in the Hamiltonian) which, in turn, depends upon U^{k,u}. Hence, a self-consistent solution of the CPHF equations is required.
The CPHF calculation of static linear polarizabilities was already available in a previous version of Crystal,[13, 14] for both closed and open shell systems, along with a corresponding CPKS treatment using pure and hybrid DFT functionals.[15] Now, the CPHF/KS method has been extended to the first and second hyperpolarizability tensors. As a consequence of the 2n + 1 rule, the first-order coefficient matrix is sufficient to determine the first hyperpolarizability tensor. However, calculation of the second hyperpolarizability tensor γ requires the second-order coefficients and they must be obtained from a self-consistent solution of the second-order coupled perturbed equations. Both the β and γ tensors can be expressed in terms of U^{k,uv}:
The operator ℘ in Eqs. (12) and (13) carries out the sum over all permutations of the Cartesian directions and E^{k,vw} is the second derivative of the Lagrange multiplier matrix with respect to the field. We have used the symbols F^{k,u} and F^{k,uv} to indicate derivatives of the Fock matrix. The first derivative contains contributions from Ω as well as the two-electron terms, but only the latter contribute to the second derivative.
Details about the CPHF/KS method, and its implementation in the Crystal program, can be found elsewhere;[16-18] examples of application to materials of different dimensions are also available.[73-77]
The all-trans isomer of polyacetylene (PA) is a good case to check the numerical accuracy of our implementation. PA is a prototype conjugated polymer semiconductor with π-electrons delocalized over a backbone of carbon atoms connected through alternating double and single bonds, with respective CC bond lengths of 1.36 and 1.44 Å[78] (point symmetry C_{2h}). According to experiment,[79] PA has a band gap E_{g} of 1.35 eV. Such a small band gap represents a severe test for DFT calculations of optical properties (especially pure DFT, which seriously underestimates the gap) because of inherent numerical instability as one approaches the conduction limit. The sensitivity with respect to the description of the electronic structure leads to calculated (hyper)polarizabilities that change by orders of magnitude depending on the approximation adopted. Finally, for finite oligomers, it is known that conventional functionals strongly overshoot the correct values.
We computed the static longitudinal polarizability α_{L} and second hyperpolarizability γ_{L} of PA (the first hyperpolarizability β_{L} is null by symmetry) for HF and various density functionals.[75] The lowest estimates, α_{L}^{HF}=165.2 and γ_{L}^{HF}=6.16·10^{6} a.u., are provided by HF, which seriously overshoots the energy gap (E_{g}^{HF}=6.8 eV). Convergence of the calculated optical properties to the infinite periodic polymer limit is essentially achieved for a chain length of 50 monomers. This is consistent with an extensive literature on short PA oligomers which also shows that HF values are relatively close to accurate coupled-cluster and Møller–Plesset results (see Ref. [80] for instance).
Pure density functionals shrink the energy gap of PA to less than 0.1 eV, thus leading to a catastrophic overshoot of the calculated optical properties. For the LDA,[81, 82] the lowest electronic transitions occur in the near infrared (E_{g}^{LDA}=0.08 eV) and the calculated optical properties soar to α_{L}^{LDA}=1.10·10^{5} and γ_{L}^{LDA}=1.32·10^{16} a.u., respectively. As expected, hybrid functionals provide a more accurate description of the PA band structure. PBE0,[83] for instance, yields an optical gap equal to 1.43 eV. The corresponding (hyper)polarizabilities undergo a significant reduction compared to LDA, but still far exceed the HF values: α_{L}^{PBE0}/α_{L}^{HF}≃5 and γ_{L}^{PBE0}/γ_{L}^{HF}≈500.
For small bandgap polymers, the precision of coupled perturbed (hyper)polarizability calculations is mainly determined by two computational parameters, namely
1. The shrinking factor used to generate a commensurate grid of k points in the reciprocal space, and
2. The thresholds on the truncation of the two-electron integral series (see the Crystal user's manual for details).
These two parameters are highly correlated because the set of lattice vectors used to evaluate matrices in real space must also map the k points spanned by the summations in Eqs. (11)–(13). The more the band gap narrows, the more these parameters must be tightened. On the one hand, increasing the number of k points naturally improves the description of the electronic band structure at k values where E_{g}(k)→0 (i.e., where α, β, γ→∞). On the other hand, a larger number of two-electron contributions is required to account for the concurrent spread of the DM range in direct space.[84] Indeed, a very accurate PBE0 calculation of α and γ required consideration of 100 k points in the first BZ and exchange contributions up to a distance of 250 Å from cell 0 in direct space to achieve well-converged results (Fig. 7).
As implied above, the behavior of PA is exceptional because of its very small band gap. In general, the calculation of β and γ for insulators and semiconductors can be done with commonly used parameter settings; for example, a shrinking factor of 6–8 and the default value of T_{x}=12 are adequate.[18, 85]
Electric field frequency dependence
The calculated dependence of the electronic linear polarizability on the electric field frequency ω (or wavelength λ) can be directly compared with experimental data in the high frequency limit. For the cubic phase of BaTiO_{3}, the dielectric constant was measured to be 5.41, in units of ε_{0}, at λ = 632.8 nm.[86] We assume that this frequency is sufficiently high that the vibrational contributions are negligible. Then, in order to compare with experiment, the frequency-dependent CPHF equations must be solved for the two first-order perturbation matrices, U^{k,u}(+ω) and U^{k,u}(−ω). Such calculations were carried out for the wavelength range 632.8–3000 nm.
Because, in our experience, the GGA level of DFT provides the best agreement with experimental dielectric tensors and photoelastic constants for most inorganic crystals (as compared to HF and hybrids),[12-14, 16, 18] we used the PBE functional (Fig. 8) for these calculations. It is clearly seen in Figure 8 that the variation of the electronic dielectric constant is quite large over the range of wavelengths considered, so that accurate calculation of ε at the actual wavelength of the experiment is important in this case. The agreement between the calculated and experimental values turns out to be within 2%.
Vibrational contributions to the (hyper)polarizabilities
Nuclear motions have a role in determining static and dynamic polarizabilities and hyperpolarizabilities. An efficient method for computing the vibrational contributions is the finite field nuclear relaxation (FF-NR) procedure proposed by Bishop et al.[87] This method includes all terms that are harmonic or first-order in electrical/mechanical anharmonicity. It is a general scheme that is currently implemented in Crystal14 for nonperiodic directions (three independent directions in molecules, two in polymers, one in slabs) and for tensor components that couple periodic with nonperiodic directions. The key step in the FF-NR method is a geometry optimization carried out in the presence of a finite, that is, static field. This optimization implicitly contains the information about harmonic and anharmonic vibrational parameters needed to obtain vibrational (hyper)polarizabilities. Thus, no force constants or electric property derivatives need to be explicitly calculated. If we denote the equilibrium geometry in a static electric field (ε) by R_{ε} and that without the field by R_{0}, then a Taylor series expansion of the field-dependent (electronic) dipole moment at the two geometries yields:
In Eq. (14), the superscript “e” refers to the static electronic value (experimentally obtained by extrapolating to the static limit from measurements at sufficiently high frequency that vibrational contributions are negligible). In Eq. (15), the coefficients a^{μ}, b^{μ}, and g^{μ} contain additional vibrational contributions that are determined by the geometry relaxation induced by the applied static field ε[87]:
$$a^{\mu}_{tu} = \alpha^{e}_{tu} + \alpha^{nr}_{tu}(0;0) \tag{16}$$
$$b^{\mu}_{tuv} = \beta^{e}_{tuv} + \beta^{nr}_{tuv}(0;0,0) \tag{17}$$
$$g^{\mu}_{tuvw} = \gamma^{e}_{tuvw} + \gamma^{nr}_{tuvw}(0;0,0,0) \tag{18}$$
The static α^{e} and β^{e} also contain contributions due to anharmonic force constants and anharmonic electrical property derivatives. They give the vibrational contribution to β^{e}(−ω;ω,0), γ^{e}(−ω;ω,0,0), and γ^{e}(−2ω;ω,ω,0) in the high frequency limit (see Ref. [87]). For the special case of the vibrational linear polarizability, one can also carry out a Berry phase treatment using the Crystal code and we have found that the two methods give identical results.
As an example of the FF-NR method, calculations have been carried out on infinite periodic (n,0) zigzag BN nanotubes.[76, 85, 88] The largest tube considered (n = 36) contained 114 atoms in the unit cell. Our results in that case show that the vibrational contributions to hyperpolarizabilities can exceed the electronic values as suggested by molecular calculations and theory. The vibrational contribution to each linear/nonlinear property increases with the nanotube radius. This increase is accompanied by an elliptical field-induced deformation of the cross-sectional geometry, which is enhanced for larger radius tubes due to their greater flexibility, along with a reduced band gap. The rate of increase relative to the corresponding static electronic property is dictated primarily by the number of static fields used to characterize the process. For the components considered, it is larger in the transverse direction.
Vibrations in Solids: Analytical IR and Raman Intensities, Vibrational Spectra, and Phonon Dispersion
Crystal14 includes several new tools for characterizing the vibrational properties of crystalline solids. In particular, IR and Raman spectra can be fully simulated, thanks to the newly introduced analytical computation of the peak intensities. This complements the calculation of peak positions available since Crystal03. A procedure has been added to include Lorentz broadening and correction factors for the experimental setup. Thus, one can generate a spectrum that can be directly compared with experiment. Apart from the vibrations at the center of the BZ that are seen in IR and Raman spectra, Crystal14 allows for the calculation of phonon dispersion, which is essential for the investigation of thermodynamic properties.
Analytical intensities
In previous releases of the code, it was possible to compute the IR intensity but not the Raman intensity. The calculation was performed by means of numerical differentiation. A scheme involving Wannier functions was presented in Crystal06 and a Berry phase scheme in Crystal09. A new approach has been implemented for this release[21-24] in which IR and Raman intensities are computed analytically and, as a result, very efficiently.
One possible way to obtain the working equations is to differentiate the expression for atomic gradients,[89, 90]
$$g_t^{a} = \frac{\partial E}{\partial r_{a,t}} = \mathrm{Tr}\left[\frac{1}{2}(H+F)^{[a,t]}D - S^{ta}D_w\right] \tag{19}$$
where the eigenvalue-weighted DM D_{w}=ℱ̄[C^{k†}εC^{k}] has been introduced (ℱ̄ here is the back-Fourier transform operator). H is the one-electron part of the Fock matrix F. The square brackets used for the superscript a,t in Eq. (19) indicate that differentiation is carried out only for integrals, and does not affect the DM. The aim of using Eq. (19) as a starting point for our treatment here is to avoid explicit gradients of coefficients that would require the solution of an additional set of coupled-perturbed equations.
Born charges (Z*) (and then the IR intensities) and the Raman tensor (I) can be obtained by differentiating Eq. (19) (at equilibrium geometry) one and two times, respectively, with respect to the electric field, and then imposing the zero-field condition:
These equations can be recast in a form that is computationally more efficient and suitable for implementation. However, that requires some manipulations[21, 23] that go beyond the scope of this article. We have also extensively discussed elsewhere[21, 24] that problems related to the geometric phase are entirely avoided.
The excellent numerical stability of the above procedure with respect to computational parameters such as reciprocal space sampling, integral screening thresholds and, to some extent, basis set has been demonstrated.[22] Despite its quite recent implementation, a number of applications have been carried out on interesting crystalline systems, like quartz,[22] spessartine,[23] pyrope,[91] jadeite,[92] UiO-66,[22] and CPO-27[93] metal-organic frameworks. In the following subsection, we present a new study on vibrational properties of forsterite.
Simulated vibrational spectra
The IR oscillator strengths f_{p} can be computed for each pth mode by means of the mass-weighted effective mode Born charge vectors Z⃗_{p}:[94, 95]
$$f_{p,tu} = \frac{1}{4\pi\varepsilon_0}\,\frac{4\pi}{V}\,\frac{Z_{p,t}\,Z_{p,u}}{\nu_p^{2}}, \tag{22}$$
$$Z_{p,t} = \sum_{a,u} t_{p,au}\,Z^{a*}_{tu}\,\frac{1}{\sqrt{M_a}}, \tag{23}$$
where ε_{0} is the vacuum dielectric permittivity (1/4πε_{0}=1 atomic unit), V is the unit cell volume, t and u refer to the Cartesian components, and t_{p,au} is an element of the eigenvector matrix T of the mass-weighted Hessian matrix W, which transforms the Cartesian atomic directions into the pth normal coordinate directions (see Phonon dispersion and thermodynamic properties Section later).
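A compact way to read Eqs. (22) and (23) is as two tensor contractions. The sketch below works in atomic units (so that 1/4πε_{0}=1) and assumes, for illustration, that the mass weighting is 1/√M_{a} and that the arrays are laid out as `T[p, a, u]` for eigenvectors and `Zstar[a, t, u]` for atomic Born tensors; these layouts, the function names, and the diatomic test data are all hypothetical.

```python
import numpy as np

def mode_born_charges(T, Zstar, masses):
    """Z_{p,t} = sum_{a,u} t_{p,au} Z*^a_{tu} / sqrt(M_a)."""
    return np.einsum("pau,atu,a->pt", T, Zstar, 1.0 / np.sqrt(masses))

def oscillator_strengths(Zp, nu, volume):
    """f_{p,tu} = (4 pi / V) Z_{p,t} Z_{p,u} / nu_p^2, in atomic units."""
    outer = np.einsum("pt,pu->ptu", Zp, Zp)
    return (4.0 * np.pi / volume) * outer / nu[:, None, None] ** 2

# Toy two-atom cell with opposite isotropic Born charges +q and -q.
q, m1, m2 = 1.2, 20.0, 30.0
masses = np.array([m1, m2])
Zstar = np.stack([q * np.eye(3), -q * np.eye(3)])

# One optical mode: relative motion along x in mass-weighted coordinates.
T = np.zeros((1, 2, 3))
T[0, 0, 0] = np.sqrt(m2 / (m1 + m2))
T[0, 1, 0] = -np.sqrt(m1 / (m1 + m2))

nu = np.array([0.002])       # mode frequency (hypothetical value)
volume = 150.0               # cell volume (hypothetical value)

Zp = mode_born_charges(T, Zstar, masses)
f = oscillator_strengths(Zp, nu, volume)
```

For this toy case the mode Born charge reduces to q divided by the square root of the reduced mass, which provides an analytic check on the contraction.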
A simulated reflectance curve Rtt(ν) along the tt direction can be obtained by means of the Fresnel formula:[96]
where $\theta$ is the incidence angle of the IR beam with respect to the normal to the surface and $\varepsilon_{tt}(\nu)=\varepsilon_{1,tt}(\nu)+i\varepsilon_{2,tt}(\nu)$ is the $tt$ component of the complex dielectric function. The maxima of $\varepsilon_2(\nu)$ and of $\mathrm{Im}(-1/\varepsilon(\nu))$ (the loss function) correspond to the transverse optical (TO) and longitudinal optical (LO) frequencies, respectively. Note that, when the symmetry of the system is orthorhombic or higher, $\varepsilon(\nu)$ is a diagonal tensor, so that only the xx, yy, and zz components are nonzero.
The classical Drude–Lorentz model[96] describes the dielectric function as a superposition of damped harmonic oscillators:
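$$\varepsilon_{tt}(\nu)=\varepsilon_{\infty,tt}+\sum_{p}L_{p,tt}(\nu),\tag{25}$$
where $\varepsilon_{\infty,tt}$ is the high-frequency (electronic) dielectric contribution and the oscillator $L_{p,tt}(\nu)$ is defined as:
$$L_{p,tt}(\nu)=\frac{f_{p,tt}\,\nu_p^{2}}{\nu_p^{2}-\nu^{2}-i\nu\gamma_p}.\tag{26}$$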
Each oscillator is characterized by three parameters: the frequency $\nu_p$ of the TO mode (and only of the TO mode), its strength along the $tt$ direction, $f_{p,tt}$ (related to the plasma frequency $\tilde{\nu}_p$ through $f_p=\tilde{\nu}_p^{2}/\nu_p^{2}$), and the damping factor $\gamma_p$. As the implemented harmonic model does not allow the damping factor to be computed, a guessed or average value is usually taken.[25]
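As an illustration only, the Drude-Lorentz model of Eqs. (25) and (26) can be sketched in Python with NumPy. The function names, the single-oscillator input values, and the frequency grid are assumptions made for this example and are not taken from Crystal14 or from Table 4. The reflectance is evaluated only in the normal-incidence limit (θ = 0) of the Fresnel formula, which is assumed here to take the familiar form $R=|(\sqrt{\varepsilon}-1)/(\sqrt{\varepsilon}+1)|^{2}$.

```python
import numpy as np

def dielectric_function(nu, eps_inf, nu_to, strengths, gammas):
    """Eqs. (25)-(26): eps(nu) = eps_inf + sum_p f_p nu_p^2 / (nu_p^2 - nu^2 - i nu gamma_p)."""
    nu = np.asarray(nu, dtype=float)[:, None]        # shape (n_points, 1)
    nu_p = np.asarray(nu_to, dtype=float)[None, :]   # shape (1, n_modes)
    f_p = np.asarray(strengths, dtype=float)[None, :]
    g_p = np.asarray(gammas, dtype=float)[None, :]
    oscillators = f_p * nu_p**2 / (nu_p**2 - nu**2 - 1j * nu * g_p)
    return eps_inf + oscillators.sum(axis=1)

def normal_incidence_reflectance(eps):
    """Assumed theta = 0 limit of the Fresnel formula."""
    n = np.sqrt(eps)                                 # complex refractive index
    return np.abs((n - 1.0) / (n + 1.0)) ** 2

# Hypothetical single-oscillator input (not taken from Table 4).
nu_grid = np.arange(100.0, 900.0, 0.1)               # wavenumbers in cm^-1
eps = dielectric_function(nu_grid, eps_inf=2.0, nu_to=[400.0],
                          strengths=[1.0], gammas=[5.0])

nu_TO = nu_grid[np.argmax(eps.imag)]                 # maximum of eps_2
nu_LO = nu_grid[np.argmax((-1.0 / eps).imag)]        # maximum of the loss function
R = normal_incidence_reflectance(eps)
```

For a single weakly damped oscillator the loss-function maximum falls close to $\nu_p\sqrt{1+f_p/\varepsilon_\infty}$, and the reflectance is high between the TO and LO frequencies, which is why a larger oscillator strength gives a wider band.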
When simulating the experimental Raman spectrum of a real crystal, a number of factors must be taken into account. The relevant formulas, which are well-known, are briefly summarized here for ease of reference. For an oriented single-crystal, the Raman Stokes scattering intensity associated with the general tu component of the polarizability tensor corresponding to the pth vibrational mode of frequency ωp may be calculated as:
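$$I^{p}_{tu}\propto C\left(\frac{\partial\alpha_{tu}}{\partial Q_p}\right)^{2}\tag{27}$$
where $Q_p$ is the normal mode coordinate for mode $p$. The prefactor $C$ depends[97] on the laser frequency $\omega_L$ and the temperature $T$:
$$C\sim(\omega_L-\omega_p)^{4}\,\frac{1+n(\omega_p)}{30\,\omega_p}\tag{28}$$
with the Bose occupancy factor $n(\omega_p)$ given by
$$1+n(\omega_p)=\left[1-\exp\left(-\frac{\hbar\omega_p}{k_BT}\right)\right]^{-1}\tag{29}$$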
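A minimal Python sketch of Eqs. (27) to (29) is given below, assuming that frequencies are expressed as wavenumbers in cm^{−1} (so that $\hbar\omega/k_BT$ becomes $hc\tilde{\nu}/k_BT$). The function names, the laser line, and the polarizability derivatives are hypothetical values chosen for illustration; the final rescaling to 1000 mimics the normalization used in Table 5.

```python
import math

HC_OVER_KB = 1.438777  # hc/k_B in cm K, converts a wavenumber into a temperature

def bose_factor(nu_p, temperature):
    """Eq. (29): 1 + n(nu_p), with nu_p in cm^-1 and temperature in K."""
    return 1.0 / (1.0 - math.exp(-HC_OVER_KB * nu_p / temperature))

def raman_prefactor(nu_laser, nu_p, temperature):
    """Eq. (28): C ~ (nu_L - nu_p)^4 (1 + n(nu_p)) / (30 nu_p)."""
    return (nu_laser - nu_p) ** 4 * bose_factor(nu_p, temperature) / (30.0 * nu_p)

def polarized_intensities(modes, nu_laser, temperature, top=1000.0):
    """Eq. (27) for a list of (nu_p, d_alpha/dQ_p) pairs, rescaled so the largest value is `top`."""
    raw = [raman_prefactor(nu_laser, nu, temperature) * d ** 2 for nu, d in modes]
    scale = top / max(raw)
    return [scale * value for value in raw]

# Hypothetical input: 532 nm laser (about 18797 cm^-1), room temperature,
# made-up polarizability derivatives in arbitrary units.
example = polarized_intensities([(200.0, 0.1), (850.0, 1.0), (950.0, 0.4)],
                                nu_laser=18797.0, temperature=298.15)
```

The Bose factor enhances low-frequency modes at finite temperature, while the fourth-power term depends only weakly on the mode when the laser frequency is much larger than the vibrational one.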
The polycrystalline (powder) spectrum can be computed by averaging over the possible orientations of the crystallites as described in Eqs. (4) and (5) of Ref. [98].
Although the intensity of the TO modes is straightforwardly computed once the appropriate polarizability derivative is obtained, the corresponding calculation for LO modes requires a correction due to $\chi^{(2)}_{uvw}$:[97, 99]
In Eq. (30), $\varepsilon^{-1}$ is the inverse of the high-frequency (i.e., pure electronic) dielectric tensor. $\chi^{(2)}$ is defined as in Eq. (69) of Ref. [16].
IR and Raman Spectra of Forsterite
Mg_{2}SiO_{4} forsterite is the magnesium end member of the olivine family, whose members are important rock-forming silicates. As a demonstration of the methods presented in this section, we show here their application to the simulation of the vibrational properties of this crystal.
In Figure 9, the simulated reflectance spectrum along the three axes a, b, c is compared with experiment. The similarity of the two is striking, with all the main features of the spectrum being correctly reproduced by the simulation. Note in particular that the oscillator strength affects the width of the reflectance bands and peaks (the larger the value, the wider the band); experimental wide bands and narrow peaks always correspond to computed features of the same kind. Table 4 reports the main ingredients required for the simulated spectra: the IR-active TO vibrational frequencies and the corresponding oscillator strengths; one more ingredient (not reported) is the dielectric tensor computed by means of the CPHF module of Crystal14.
Table 4. IR properties of Mg_{2}SiO_{4} forsterite computed at the B3LYP level: frequencies ν (in cm^{−1}) and oscillator strengths f (dimensionless)

| B_{3u} # | B_{3u} ν | B_{3u} f | B_{2u} # | B_{2u} ν | B_{2u} f | B_{1u} # | B_{1u} ν | B_{1u} f |
|---|---|---|---|---|---|---|---|---|
| 1 | 206.2 | 0.0248 | 14 | 143.1 | 0.0827 | 27 | 206.5 | 0.0020 |
| 2 | 274.9 | 0.0559 | 15 | 277.2 | 0.0753 | 28 | 277.6 | 0.2059 |
| 3 | 293.7 | 0.4393 | 16 | 292.2 | 1.6219 | 29 | 290.3 | 1.3639 |
| 4 | 322.2 | 0.0554 | 17 | 350.2 | 1.4693 | 30 | 313.0 | 0.0080 |
| 5 | 387.8 | 1.2456 | 18 | 403.3 | 0.2651 | 31 | 419.6 | 0.9781 |
| 6 | 411.6 | 1.2740 | 19 | 431.7 | 0.2937 | 32 | 427.9 | 0.2979 |
| 7 | 475.9 | 0.0128 | 20 | 464.8 | 0.1664 | 33 | 489.5 | 0.2399 |
| 8 | 513.4 | 0.3391 | 21 | 517.3 | 0.0444 | 34 | 513.4 | 0.4134 |
| 9 | 540.0 | 0.0048 | 22 | 534.5 | 0.2105 | 35 | 874.4 | 0.5953 |
| 10 | 613.7 | 0.2044 | 23 | 637.6 | 0.0002 | | | |
| 11 | 838.1 | 0.0053 | 24 | 835.1 | 0.1342 | | | |
| 12 | 961.9 | 0.2283 | 25 | 870.3 | 0.3548 | | | |
| 13 | 982.4 | 0.2415 | 26 | 988.9 | 0.0078 | | | |
In Figure 10, the computed Raman spectra are reported for the six independent orientations. It is interesting to notice how, for this crystal, these directions indeed provide significantly different spectra. This information is very useful for experimentalists in order to obtain a perfect orientation of the crystal and thus avoid leakage from one symmetry to another. We plan in the future to compare these data with high-quality measured spectra. Table 5 reports the raw computed data that were used to generate Figure 10. Note how the Raman intensities range over more than four orders of magnitude, and that several modes that are considered Raman-active upon symmetry analysis are found to possess zero or very low intensity. Another interesting aspect (that is not seen in Fig. 10 due to the adopted scale) is that the B_{1g}, B_{2g}, and B_{3g} modes have considerably lower absolute intensities than the A_{g} modes. This highlights the potential interest in measuring accurate directional spectra, as these modes (especially the B_{1g} ones) would hardly be seen in a polycrystalline (powder) spectrum.
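Table 5. Raman properties of Mg_{2}SiO_{4} forsterite computed at the B3LYP level: frequencies ν (in cm^{−1}) and polarized intensities (arbitrary units; values are renormalized so that the highest one is set to 1000)

A_{g} modes:

| # | ν | a^{2} | b^{2} | c^{2} |
|---|---|---|---|---|
| 1 | 188.4 | 7 | 2 | 2 |
| 2 | 233.9 | 14 | 5 | 10 |
| 3 | 307.1 | 29 | 0 | 8 |
| 4 | 328.8 | 3 | 23 | 10 |
| 5 | 344.6 | 15 | 19 | 12 |
| 6 | 424.8 | 14 | 7 | 2 |
| 7 | 559.6 | 5 | 141 | 136 |
| 8 | 618.2 | 328 | 0 | 2 |
| 9 | 819.2 | 519 | 850 | 187 |
| 10 | 856.1 | 1000 | 323 | 839 |
| 11 | 967.3 | 188 | 125 | 172 |

B_{1g}, B_{2g}, and B_{3g} modes:

| B_{1g} # | B_{1g} ν | d^{2} | B_{2g} # | B_{2g} ν | e^{2} | B_{3g} # | B_{3g} ν | f^{2} |
|---|---|---|---|---|---|---|---|---|
| 12 | 224.9 | 1 | 23 | 182.7 | 2 | 30 | 190.0 | 2 |
| 13 | 260.4 | 2 | 24 | 252.9 | 3 | 31 | 303.6 | 4 |
| 14 | 317.4 | 24 | 25 | 323.6 | 2 | 32 | 322.3 | 2 |
| 15 | 367.1 | 1 | 26 | 374.0 | 14 | 33 | 380.5 | 42 |
| 16 | 391.5 | 9 | 27 | 450.7 | 57 | 34 | 420.8 | 17 |
| 17 | 441.9 | 57 | 28 | 608.3 | 58 | 35 | 608.9 | 143 |
| 18 | 596.2 | 44 | 29 | 884.4 | 85 | 36 | 928.0 | 129 |
| 19 | 644.9 | 10 | | | | | | |
| 20 | 834.8 | 32 | | | | | | |
| 21 | 865.5 | 39 | | | | | | |
| 22 | 979.7 | 6 | | | | | | |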
Phonon dispersion and thermodynamic properties
The calculation of vibration frequencies at the Γ point (k = 0, the center of the first Brillouin zone, FBZ, in reciprocal space), within the harmonic approximation, has been available since the Crystal03 version of the program.[19, 20] The vibration frequencies at the center of the FBZ (directly comparable with the outcomes of IR and Raman measurements) are obtained from the diagonalization of the mass-weighted Hessian matrix of the second derivatives of the total energy per cell with respect to atomic displacements u:
where atoms a and b (with atomic masses M_{a} and M_{b}) in the reference cell are displaced along the tth and uth Cartesian directions. The first derivatives of the total energy per cell, $g^{a}_{t}=\partial E/\partial u_{at}$, with respect to atomic displacements from the equilibrium configuration $\mathcal{R}_{eq}$ are computed analytically, whereas the second derivatives are computed numerically, using a two-point formula:
$$\frac{\partial^{2}E}{\partial u_{at}\,\partial u_{bu}}\approx\frac{g^{a}_{t}(\mathcal{R}_{eq},u_{bu}=+\bar{u})-g^{a}_{t}(\mathcal{R}_{eq},u_{bu}=-\bar{u})}{2\bar{u}},$$
where $\bar{u}=0.003$ Å, a value 10–50 times smaller than that usually used in other solid-state programs.[101-103]
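The two-point formula and the subsequent mass weighting can be sketched as follows. This is a toy illustration under stated assumptions, not the Crystal implementation: the function names are hypothetical, the "analytic gradient" is that of a two-atom harmonic spring model in arbitrary units, and the default step simply reuses the value 0.003 quoted above.

```python
import numpy as np

def numerical_hessian(gradient, x_eq, step=0.003):
    """Second derivatives from central differences of an analytic gradient."""
    n = x_eq.size
    hessian = np.zeros((n, n))
    for j in range(n):
        shift = np.zeros(n)
        shift[j] = step
        # column j: derivative of every gradient component with respect to coordinate j
        hessian[:, j] = (gradient(x_eq + shift) - gradient(x_eq - shift)) / (2.0 * step)
    return 0.5 * (hessian + hessian.T)   # symmetrise away numerical noise

def mass_weighted_eigen(hessian, masses):
    """Eigenvalues and eigenvectors of W = H / sqrt(M_a M_b)."""
    m = np.repeat(np.asarray(masses, dtype=float), 3)   # one mass per Cartesian coordinate
    w = hessian / np.sqrt(np.outer(m, m))
    return np.linalg.eigh(w)

# Toy model (hypothetical): two atoms joined by a harmonic spring of constant K.
K, R0, MASSES = 4.0, 1.0, [1.0, 2.0]

def toy_gradient(x):
    d = x[:3] - x[3:]
    r = np.linalg.norm(d)
    g1 = K * (r - R0) * d / r            # analytic gradient on atom 1
    return np.concatenate([g1, -g1])     # atom 2 feels the opposite force

x_eq = np.array([0.0, 0.0, 0.0, R0, 0.0, 0.0])
H = numerical_hessian(toy_gradient, x_eq)
eigenvalues, modes = mass_weighted_eigen(H, MASSES)
```

In this toy case the only nonzero eigenvalue is K(1/M_1 + 1/M_2), the squared angular frequency of the stretching mode; the remaining five eigenvalues correspond to translations and rotations and vanish up to the finite-difference error.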
The calculation of the thermodynamic properties is more demanding, as it requires the knowledge of phonon modes over the complete FBZ; phonons at points different from Γ can be obtained by considering a supercell (SC) of the original unit cell, following the so-called direct method.[104, 105] The lattice vectors $\mathbf{g}=\sum_{t}l^{g}_{t}\mathbf{a}_{t}$ identify the general crystal cell, where $\{\mathbf{a}_{t}\}$ are the direct lattice basis vectors, with $t=1,\ldots,D$ (where D is the dimensionality of the system: 1, 2, 3 for 1D, 2D, 3D periodic systems); within Born-von Kármán periodic boundary conditions the integers $l^{g}_{t}$ run from 0 to $L_{t}-1$. The parameters $\{L_{t}\}$ define the size and shape of the SC in direct space. Let us label with $\mathbf{G}$ the general super-lattice vector (i.e., a vector of the lattice whose reference cell is the SC) and let us introduce the $L=\prod_{t}L_{t}$ Hessian matrices $\{\mathbf{H}^{g}\}$ whose elements are $H^{g}_{at,bu}=\partial^{2}E/(\partial u^{0}_{at}\,\partial u^{g}_{bu})$ where, at variance with Eq. (31), atom b is displaced in cell g, along with all its periodic images in the crystal, in cells g + G. The set of L Hessian matrices $\{\mathbf{H}^{g}\}$ can be Fourier transformed into a set of dynamical matrices $\{\mathbf{W}^{k}\}$, each one associated with a wavevector $\mathbf{k}=\sum_{t}(\kappa_{t}/L_{t})\mathbf{b}_{t}$, where $\{\mathbf{b}_{t}\}$ are the reciprocal lattice vectors and the integers $\kappa_{t}$ run from 0 to $L_{t}-1$:
$$W^{k}_{at,bu}=\sum_{\mathbf{g}\in SC}\frac{H^{g}_{at,bu}}{\sqrt{M_aM_b}}\exp(i\,\mathbf{k}\cdot\mathbf{g}),\tag{32}$$
The eigenvalues of the dynamical matrices are the squares of the vibrational frequencies $\nu^{k}_{p}$, while the eigenvectors correspond to the normal modes. The frequencies $\nu^{k}_{p}$ define the energy spectrum of the harmonic oscillators:
$$E^{k}_{p}(n)=h\nu^{k}_{p}\left(\frac{1}{2}+n\right).\tag{33}$$
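The Fourier transform of Eq. (32) can be illustrated with a short NumPy sketch. The function name and array layout are assumptions made for this example, and the test system is a hypothetical one-dimensional monatomic chain with nearest-neighbor springs, treated with a supercell of four cells, for which the eigenvalues are known analytically to be $(2K/M)(1-\cos k)$.

```python
import numpy as np

def dynamical_matrices(hessians, cell_positions, masses, k_points):
    """Eq. (32): W^k = sum_g H^g / sqrt(M_a M_b) * exp(i k.g).

    hessians       : (L, n, n) one Hessian block H^g per cell g of the supercell
    cell_positions : (L, D) lattice vectors g
    masses         : (n,) mass attached to each Cartesian coordinate
    k_points       : (n_k, D) wavevectors commensurate with the supercell
    """
    inv_sqrt_mm = 1.0 / np.sqrt(np.outer(masses, masses))
    phases = np.exp(1j * k_points @ cell_positions.T)           # (n_k, L)
    return np.einsum("kg,gab->kab", phases, hessians) * inv_sqrt_mm

# Hypothetical 1D chain: lattice parameter 1, spring constant and mass in arbitrary units.
L_CELLS, K_SPRING, MASS = 4, 1.0, 1.0
cells = np.arange(L_CELLS, dtype=float).reshape(-1, 1)          # g = 0, 1, 2, 3
hessians = np.zeros((L_CELLS, 1, 1))
hessians[0] = 2.0 * K_SPRING      # atom displaced in the reference cell
hessians[1] = -K_SPRING           # neighbor in cell +1
hessians[-1] = -K_SPRING          # neighbor in cell -1, i.e. cell L-1 of the supercell
k_points = 2.0 * np.pi * np.arange(L_CELLS).reshape(-1, 1) / L_CELLS

W = dynamical_matrices(hessians, cells, np.array([MASS]), k_points)
eigenvalues = np.linalg.eigvalsh(W)[:, 0]                        # one branch per k point
```

A supercell of L cells thus yields the frequencies at L wavevectors, which is why a denser sampling of the FBZ requires a larger supercell.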
The crucial point is then the speed of convergence of the integration in reciprocal space, which is replaced by a finite sum over the k points; a large number of points implies calculations with a large supercell. Thermodynamic properties of Mg_{3}Al_{2}(SiO_{4})_{3} pyrope garnet are currently under investigation. Figure 11 shows the convergence of the entropy as a function of the supercell size, from the 80-atom primitive cell, indicated as x1, up to x16, a supercell containing 16 × 80 = 1280 atoms, which gives the frequencies at 16 k points. All the considered supercells retain the cubic symmetry, so that it can be exploited. In all cases, only nine SCF+G calculations are performed. The figure shows, for various temperatures, the difference between the entropy of each supercell and the x16 data. It turns out that the entropy is already quite well converged at x8, the difference with respect to x16 being smaller than 1 kJ/mol. In Figure 12, our ab initio results are compared with experimental data.[107, 108] The overall agreement is quite good, much better than obtained previously.[106]
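As a hedged illustration of how an entropy can be obtained from a finite set of harmonic frequencies, the sketch below uses the standard textbook expression for independent quantum harmonic oscillators, $S=k_B\sum[x/(e^{x}-1)-\ln(1-e^{-x})]$ with $x=h\nu/k_BT$, averaged over the sampled k points. This expression is not written out in the text and the function name and input layout are assumptions; it is not claimed to be the formula coded in Crystal14.

```python
import numpy as np

K_B = 1.380649e-23     # J/K
H_PLANCK = 6.62607015e-34  # J s
C_CM = 2.99792458e10   # speed of light in cm/s
N_A = 6.02214076e23    # 1/mol

def harmonic_entropy(wavenumbers, temperature):
    """Vibrational entropy in J mol^-1 K^-1 per cell.

    wavenumbers : (n_k, n_modes) harmonic frequencies in cm^-1, one row per k point
    Modes with (numerically) zero frequency, e.g. acoustic modes at Gamma, are skipped.
    """
    nu = np.asarray(wavenumbers, dtype=float)
    n_k = nu.shape[0]
    x = H_PLANCK * C_CM * nu[nu > 1e-6] / (K_B * temperature)
    per_mode = x / np.expm1(x) - np.log1p(-np.exp(-x))
    return N_A * K_B * per_mode.sum() / n_k

# Hypothetical frequencies for two k points and three modes each (cm^-1).
freqs = np.array([[0.0, 300.0, 900.0],
                  [150.0, 320.0, 880.0]])
s_300 = harmonic_entropy(freqs, 300.0)
s_600 = harmonic_entropy(freqs, 600.0)
```

Convergence with supercell size can then be monitored by repeating the evaluation with the frequencies obtained from progressively larger supercells.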
Anisotropic Displacement Parameters
Another important consequence of thermal motion is the damping of X-ray and neutron scattering diffracted intensities as a function of temperature (see DW thermal factors Section). These effects are usually accounted for by means of the mean square displacements (MSD) along each normal mode of the harmonic oscillators. The sum of these MSD over all the oscillators leads to the definition of the anisotropic displacement parameters (ADPs).[109] These quantities can now be computed automatically with the Crystal14 version of the program.[28] ADPs have recently been computed with such a scheme for several molecular crystals.[29] Good agreement was found with respect to experimental data. The convergence of the ADPs was studied with respect to several computational variables, such as basis set, lattice parameters, and Hamiltonian. In Figure 13, we report a graphical representation of the computed and experimental[110] ADPs of crystalline urea. The overall features of atomic thermal motion are well described, with only the computed ADPs of nitrogen overshooting the experimental ones.
The effect of thermal motion on the electronic charge density (ECD) and structure factors is treated in DW thermal factors Section, where the case of crystalline silicon is considered.
Tools for Studying Solid Solutions and Disordered Systems
Solid solutions and disordered materials are characterized by nonperiodic occupation of some sites. Experimental structure determinations normally interpret the site-occupancy pattern in terms of average fractional site occupations. The experimental space group of disordered crystals or solid solutions cannot be used in the simulation of such systems, because fractional occupancies of crystallographic sites cannot be adopted as such. Each average occupation corresponds to a number of cell configurations (i.e., distributions of the atoms or vacancies involved in these fractionally occupied sites) from the collection of which the observed crystal properties arise. Evaluating such average properties with quantum-mechanical simulations is a tremendous challenge due to the combination of two main requirements. First, in order to account for nonperiodic site occupation while keeping periodicity, multiple cells, that is, supercells ($n_a\mathbf{a}\times n_b\mathbf{b}\times n_c\mathbf{c}$), have to be considered, where n_{a}, n_{b}, n_{c} are integers and a, b, and c are the vectors defining the primitive experimental cell. As a consequence, structure relaxation becomes very expensive. Second, the number of configurations increases as R^{D}, where R and D are the number of considered species and sites, respectively. The extreme difficulty of such a task has seriously hindered first-principles quantum-mechanical approaches so far.
To reduce the computational cost, the action of symmetry on the configurations can be considered. Before presenting the tools dedicated to the study of disordered systems and implemented in the new version of Crystal, a few concepts about the action of symmetry are very briefly sketched. For full details, the reader can refer to Refs. [30, 31].
Brief overview on the action of symmetry
Consider a structure (of any dimensionality) characterized by a symmetry group G and possessing an irreducible crystallographic position d of multiplicity |D|, occupied by atomic species A. Using Wyckoff's notation, this position would be denoted |D|d. This structure is known as the aristotype for the considered disordered system.
Suppose atomic species X substitutes for A in any proportion on the d site. Then, $|D|+1$ compositions are possible: $A_{|D|-\alpha}X_{\alpha}$, $\alpha=0,\ldots,|D|$. If the positions of A and X cannot be distinguished, the compound is said to be disordered. For each composition, there are
$$|S_{\alpha}|=\binom{|D|}{\alpha}=\frac{|D|!}{\alpha!\,(|D|-\alpha)!}$$
possibilities to place atoms A and X. Summing over the $|D|+1$ compositions, $|S|=2^{|D|}$ possibilities exist. Each possibility is called a configuration. In Figure 14, two-color configurations derived from a $C_{4v}$ aristotype structure are presented as a function of the composition. The various species are sketched by colored circles. The composition is expressed as the ratio of the different species. For more than two species, the number of configurations can easily be calculated. In the present version of Crystal, only two species are allowed.
The aristotype symmetry group naturally partitions the set of configurations (S) into subsets of symmetry-related configurations. Two configurations are symmetry related, or equivalent, if there exists an operator $g\in G$ that maps one onto the other. No element of G relates configurations belonging to two different subsets. These subsets are therefore symmetry-independent classes of configurations (SICs). Since the properties of configurations belonging to the same SIC are identical or equivalent, each SIC can be fully described by one representative, and any element of the class can be chosen as the representative. The contribution of a given SIC to average properties depends on its multiplicity (M), that is, the number of configurations it contains, modulated by the Boltzmann distribution. The multiplicity of a given SIC is related to the symmetry of its elements, $M=|G|/|G_s|$, where $|G_s|$ is the order of the symmetry group $G_s$ of any chosen configuration in the SIC. Multiplicity and $|G_s|$ are given in Figure 14. Obviously, two configurations belonging to the same SIC have the same composition, but two configurations sharing the same composition may belong to different classes (see again Fig. 14, composition 2/2).
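The statement that each SIC contributes to average properties according to its multiplicity, modulated by the Boltzmann distribution, can be made concrete with a small sketch. The function name and the energies are hypothetical; the two multiplicities (4 and 2) are those of the two classes with composition 2/2 on four sites under $C_{4v}$ (occupied sites adjacent, or on a diagonal).

```python
import math

K_B_EV = 8.617333262e-5   # Boltzmann constant in eV/K

def sic_average(values, energies, multiplicities, temperature):
    """Average of a property over SICs with weights M * exp(-(E - E_min) / k_B T)."""
    e_min = min(energies)   # shift energies to avoid overflow in the exponentials
    weights = [m * math.exp(-(e - e_min) / (K_B_EV * temperature))
               for e, m in zip(energies, multiplicities)]
    partition = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / partition

# Hypothetical relative energies (eV) of the two SICs with composition 2/2.
energies = [0.00, 0.05]
multiplicities = [4, 2]

avg_cold = sic_average(energies, energies, multiplicities, temperature=10.0)
avg_hot = sic_average(energies, energies, multiplicities, temperature=1.0e9)
```

At low temperature the average collapses onto the most stable SIC, whereas in the high-temperature limit each SIC is weighted by its multiplicity alone.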
So, the number of SICs is a key quantity. Counting the number of SICs relies on Polya's theory, which exploits the Cauchy–Frobenius lemma, often called the Burnside lemma:
$$|\Delta(S)|=\frac{1}{|G|}\sum_{g\in G}|S_g|,$$
where $\Delta(S)$, $|\Delta(S)|$, and $S_g$ are the set of SICs, the number of SICs in S, and the set of configurations stabilized by g (i.e., configurations whose symmetry group contains g), respectively. $|S_g|$ is easily evaluated using the Cayley theorem, which states that any group acting on a set of |D| points is isomorphic to a subgroup of the group of permutations of |D| objects. As such, the symmetry operators, identified with permutations, can be represented by cycles. Considering the example in Figure 14, the action of the mirror plane d_{1} passing through sites 1 and 3 is described by (1)(3)(24), while (1234) accounts for the action of the fourfold rotation. It follows that a configuration is stabilized by d_{1} if the positions in each cycle are occupied by the same species. More generally, if $|\mathrm{Cyc}_D(g)|$ is the number of cycles of operation g acting on D, then $|S_g|=|R|^{|\mathrm{Cyc}_D(g)|}$, where |R| is the number of species. The number of SICs is given by Polya's formula:
$$|\Delta(S)|=\frac{1}{|G|}\sum_{g\in G}|R|^{|\mathrm{Cyc}_D(g)|}.\tag{34}$$
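Polya's formula, Eq. (34), can be checked on the $C_{4v}$ example with a few lines of Python. The sketch below is illustrative only: the four sites are numbered 0 to 3 around a square (rather than 1 to 4 as in Figure 14), the eight operators are written as explicit permutations, and all names are hypothetical. A brute-force orbit enumeration is included to cross-check the count.

```python
from itertools import product

# C4v acting on the four corners of a square; perm[i] is the image of site i.
C4V = {
    "E":    (0, 1, 2, 3),
    "C4":   (1, 2, 3, 0),
    "C2":   (2, 3, 0, 1),
    "C4^3": (3, 0, 1, 2),
    "d1":   (0, 3, 2, 1),   # mirror through sites 0 and 2: cycles (0)(2)(13)
    "d2":   (2, 1, 0, 3),   # mirror through sites 1 and 3
    "v1":   (1, 0, 3, 2),   # mirrors through edge midpoints
    "v2":   (3, 2, 1, 0),
}
GROUP = list(C4V.values())

def cycle_count(perm):
    """Number of cycles of a permutation."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            site = start
            while site not in seen:
                seen.add(site)
                site = perm[site]
    return cycles

def polya_count(group, n_species=2):
    """Eq. (34): number of SICs = (1/|G|) sum_g |R|^|Cyc(g)|."""
    total = sum(n_species ** cycle_count(perm) for perm in group)
    return total // len(group)

def apply_perm(perm, config):
    """Move the species on site i to site perm[i]."""
    image = [None] * len(config)
    for i, species in enumerate(config):
        image[perm[i]] = species
    return tuple(image)

def brute_force_count(group, n_sites, n_species=2):
    """Count orbits directly by collecting one canonical representative per orbit."""
    representatives = set()
    for config in product(range(n_species), repeat=n_sites):
        representatives.add(min(apply_perm(perm, config) for perm in group))
    return len(representatives)

n_sics = polya_count(GROUP)            # two species on four sites
```

For two species the 16 configurations reduce to 6 SICs; the brute-force enumeration becomes impractical quickly as the number of sites grows, whereas Eq. (34) only requires the cycle structure of each operator.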
Polya's formula is not a tool to generate the configurations and their classes. In the two-species case, an efficient algorithm relies on the lexicographic representation and ordering of the configurations, briefly described in the next section. Simple applications show that the number of SICs increases very quickly with the number of involved sites and the number of species. In Table 6, applications to the garnet structure considering solid solution on the dodecahedral or the tetrahedral sites illustrate this point. Despite the reduction from the number of configurations to the number of SICs, ab initio calculations might remain unfeasible in a large number of situations. For large systems, it becomes impossible to generate the full list of SICs and to compute all of them quantum mechanically. Some authors have proposed to compute randomly selected configurations. However, such an approach suffers from drawbacks. For example, equivalent calculations might be performed. Moreover, the SICs are found as a function of their multiplicity: the probability to reach a given SIC of multiplicity M equals M/|S|. So, the larger (lower symmetry) the SIC, the larger its probability to be found. As the size of the cell increases, the probability to reach symmetric SICs decreases. However, several lines of thinking suggest that the most stable configuration should have high symmetry. If so, it is important to reach the SICs independently of their symmetry.
Table 6. Total number of SICs N^{D} resulting from the action of the space group $Ia\bar{3}d$ on sets of sites (Dod: dodecahedral; Oct: octahedral) of the garnet structure considering two species.

| Cell | N^{Dod} | N^{Oct} |
|---|---|---|
| Primitive cell | 23 | 154 |
| Conventional cell | 874 | 179,444 |

The primitive cell contains 8 dodecahedral and 12 octahedral sites. The conventional cell is twice as large.
Sampling the SICs independently of their symmetry can be achieved by rewriting Polya's formula. Two elements g and g′ lying in the same conjugacy class have the same cycle structure: $|\mathrm{Cyc}_D(g)|=|\mathrm{Cyc}_D(g')|$. Then, Eq. (34) can be factorized by conjugacy classes,
$$|\Delta(S)|=\frac{1}{|G|}\sum_{j=1}^{|C|}|C_j|\,|R|^{|\mathrm{Cyc}_D(g_j)|},$$
where $g_j$ is a representative of the class $C_j$. From this expression, a probability distribution on the set of conjugacy classes $C=\{C_1,\ldots,C_{|C|}\}$ can be defined:
For G and D given, the probability of the conjugacy classes can be calculated. If a configuration s stabilized by an operator g belonging to a given class is randomly constructed, the probability that this configuration is in a given SIC ω is $\mathrm{Prob}(\omega\ni s\,|\,s\in S_g)=|\omega\cap S_g|/|S_g|$. This probability is different from the previous one. It can be shown that, by selecting the conjugacy classes with the probability of Eq. (36) and building at random a stabilized configuration for each selected class, SICs are found with equal probability $1/|\Delta(S)|$. A configuration stabilized by a symmetry operation is obtained by mapping all elements of the same cycle onto the same species. As a consequence, only the identity class can produce asymmetric configurations, because each of its cycles contains a single element. If the probability of the identity is set to zero, only symmetric configurations show up. The probability of the SICs then becomes a decreasing function of the multiplicity, and the most symmetric SICs have the highest probability to be found.
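The equal-probability property can be verified exactly on the $C_{4v}$ example. The sketch below assumes that the class probability is proportional to $|C_j|\,|R|^{|\mathrm{Cyc}_D(g_j)|}$, as suggested by the factorized formula; drawing a class with this probability and then any of its elements is equivalent to drawing an individual operator g with probability proportional to $|S_g|$, which is what the code does. Site numbering (0 to 3), names, and the use of exact fractions instead of random draws are choices made for this illustration.

```python
from fractions import Fraction
from itertools import product

IDENTITY = (0, 1, 2, 3)
# C4v as permutations of the four corners of a square (perm[i] = image of site i).
GROUP = [IDENTITY, (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2),
         (0, 3, 2, 1), (2, 1, 0, 3), (1, 0, 3, 2), (3, 2, 1, 0)]

def apply_perm(perm, config):
    image = [None] * len(config)
    for i, species in enumerate(config):
        image[perm[i]] = species
    return tuple(image)

def canonical(config):
    """Lexicographically smallest member of the orbit, used as the SIC label."""
    return min(apply_perm(perm, config) for perm in GROUP)

def stabilised(perm, n_species=2):
    """All configurations left unchanged by perm (same species on every cycle)."""
    n = len(perm)
    return [c for c in product(range(n_species), repeat=n)
            if all(c[perm[i]] == c[i] for i in range(n))]

def sic_probabilities(skip_identity=False):
    """Exact probability of landing in each SIC when an operator g is drawn with
    probability proportional to |S_g| and a configuration of S_g is drawn uniformly."""
    operators = [p for p in GROUP if not (skip_identity and p == IDENTITY)]
    fixed = {p: stabilised(p) for p in operators}
    norm = sum(len(configs) for configs in fixed.values())
    probabilities = {}
    for perm, configs in fixed.items():
        p_operator = Fraction(len(configs), norm)
        for config in configs:
            label = canonical(config)
            probabilities[label] = probabilities.get(label, 0) + p_operator / len(configs)
    return probabilities

uniform = sic_probabilities()
symmetric_only = sic_probabilities(skip_identity=True)
```

With all operators included, each of the six SICs is reached with probability 1/6. When the identity is excluded, the probability of a SIC of multiplicity M becomes (8 − M)/32 in this example, so the most symmetric classes are favored.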
Implemented tools
Tools dedicated to the study of disordered materials offer the possibility to count and generate lists of representatives as well as the multiplicity of SICs. Disordered systems involving several symmetry independent sites can be considered, but only isoelectric substitutions should be analyzed.
Counting and Enumerating SICs
The CONFCNT keyword yields representatives of the SICs in lexicographic order. The lexicographic order is equivalent to the alphabetical order. The |D| involved sites are arbitrarily ordered from 1 to |D|, and the two species are identified as 0 and 1 (0 precedes 1). Each configuration is represented by a “string” (L) of length |D| composed of 0s and 1s. Then, a configuration s_{1} precedes a configuration s_{2} if there is a position i, with $1\le i\le|D|$, such that $L_1(i)<L_2(i)$ and $L_1(j)=L_2(j)$ for all $j<i$. For example, for |D| = 4, (0010) precedes (0011), which precedes (0100). This natural order makes it possible to perform efficient tests to identify new SICs and their representatives without requiring memory storage.
Default options ($|D|+1$ compositions, symbol for the substituting species, multiplicity, etc.) can be modified by means of specific keywords. Among those, ONLYCOMP selects a specific composition, thus reducing the length of the output. The number of SICs over the $|D|+1$ compositions can be obtained without generating the SICs, at essentially no computational cost. We recommend that the user use this option when starting a new study, in order to avoid lengthy output.
Two-body interactions up to a chosen distance are symmetry-sorted invoking the INTPRT keyword. Higher-order interactions are not considered.
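The idea behind the lexicographic enumeration of SIC representatives can be sketched as follows. This is only a plausible reading of the scheme described above, not the algorithm coded behind CONFCNT: a configuration is accepted as the representative of a new SIC if and only if none of its symmetry images precedes it, so no list of previously found classes has to be stored. The $C_{4v}$ permutations, the site numbering, and the `n_substituted` filter (which plays a role analogous to selecting one composition) are assumptions of this example.

```python
from itertools import product

# C4v as permutations of the four corners of a square (perm[i] = image of site i).
GROUP = [(0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2),
         (0, 3, 2, 1), (2, 1, 0, 3), (1, 0, 3, 2), (3, 2, 1, 0)]

def apply_perm(perm, config):
    image = [None] * len(config)
    for i, species in enumerate(config):
        image[perm[i]] = species
    return tuple(image)

def enumerate_sics(group, n_sites, n_substituted=None):
    """Yield (representative string, multiplicity) in lexicographic order."""
    # itertools.product generates the 0/1 tuples in lexicographic order.
    for config in product((0, 1), repeat=n_sites):
        if n_substituted is not None and sum(config) != n_substituted:
            continue
        images = {apply_perm(perm, config) for perm in group}   # orbit of this configuration
        if min(images) == config:            # no image precedes it: new representative
            yield "".join(map(str, config)), len(images)

all_sics = list(enumerate_sics(GROUP, 4))
half_filled = list(enumerate_sics(GROUP, 4, n_substituted=2))
```

The size of the orbit gives the multiplicity M = |G|/|G_s| directly, and the multiplicities of all representatives add up to the total number of configurations, $2^{|D|}$.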
Random Selection of SICs
The CONFRAND keyword calls a symmetry-adapted and tunable Monte Carlo (not Metropolis) sampling tool. At a given composition, it returns representatives of SICs and their multiplicity. By default, SICs are found with the same probability, but symmetric SICs can be selected. The number of SICs for the given composition is calculated. The number of SICs searched for can be controlled by the user. Alternatively, the number of Monte Carlo draws can be chosen; in this case the number of SICs found is not predefined.
Calculations on Selected Configurations
The CONFRAND option produces a file that allows calculations to be performed on the SICs found. These calculations are launched by the RUNCONF keyword. Taking advantage of the parallel architecture, a multitask scheme is implemented, so that several configurations can be calculated simultaneously, each one in parallel mode. The transfer file is quite compact because each configuration is simply identified by an integer that carries full information about the distribution of the species on the different sites.
Results on normal spinel MgFeAlO_{4} are shown in Figure 15. The conventional cell has been considered. Mixing occurs on the 16 octahedral sites. The average energy has been calculated producing SICs according to two schemes. In the first scheme, the SICs have been found uniformly at random, independently of their symmetry. The second scheme contains two steps: symmetric SICs are searched first, then the asymmetric ones. The running average is compared to the limit value (dashed horizontal line). In both cases, the convergence is quite rapid. At low temperature, however, searching the symmetric SICs first seems more efficient, supporting the idea that the most stable configuration has some high symmetry.
Defining the Supercell
As previously stated, simulation of disordered systems requires the use of supercells. In such cases, the full point and translational symmetries of the supercell have to be properly considered to generate a minimal list of SICs. This includes nonconventional centering of the supercells. Starting from a primitive cell, suppose that an $n_a\mathbf{a}\times n_b\mathbf{b}\times n_c\mathbf{c}$ supercell is built. This supercell contains $n_a\times n_b\times n_c$ translation vectors corresponding to symmetry operations. Then, the space group of the supercell to be considered is a nonconventional one and includes the operators of the group of the aristotype compatible with the supercell, combined with the $n_a\times n_b\times n_c$ translational operators. A special supercell option (SCELCONF keyword) makes it possible to deal with these nonconventional space groups.
Electron Density Analysis
Detailed information about the electronic structure of crystalline compounds is provided by observables related to the one-electron DM, such as the ECD [ρ(r)], and the EMD [π(p)].[111] The ECD is obtainable from diffraction experiments and is straightforwardly related to the topological features of the system in direct space, thus to position of nuclei and characteristics of bonds. The EMD can be reconstructed from directional Compton scattering experiments:[112] the analysis of the distribution in momentum space of the slow valence electrons is known to provide valuable complementary insight into the chemical features of the system.
Some of the new features of Crystal14 as regards electron densities are: (i) the complete topological analysis of the ECD by means of the automated integration of the Topond package into Crystal14 (see Topological analysis of charge density Section); (ii) the parallelization, with linear speed-up, of all the algorithms related to ECD and EMD; (iii) calculation of ADPs and DW thermal factors for dynamical XSF (see DW thermal factors Section); and (iv) new algorithms for the analysis of the EMD (see Momentum density Section).
Topological analysis of charge density
Over the past decade, studies of chemical bonding in solids have experienced renewed interest. Among other reasons, the availability of reconstructed experimental electron densities, derived from high resolution synchrotron radiation diffraction measurements,[113, 114] and accurate ab initio theoretical determinations, has provided a unique opportunity for comparison, mutual validation, and enhancement in the analytical skills of both approaches.[114, 115]
The Quantum Theory of Atoms in Molecules[116] (QTAIM), as implemented within the theoretical framework of the Crystal program[117] (QTAIMAC) represents the most complete density-based topological tool for chemical bonding studies and, since 1998, has been implemented by Gatti in the public code Topond.[32] Currently, Topond has been embedded into the Crystal14 suite of programs and the whole machinery is now easily accessible through the keyword TOPO of its Properties module. The QTAIMAC scheme explicitly refers to the experimentally observable electron density and yet involves different quantities whose analysis can be carried out separately but whose properties have to be combined and discussed together in order to get a complete and reliable picture of the underlying bonding network.
A first step of the analysis provides the so-called critical points, that is, the points of space where the gradient of a scalar function vanishes. It is worth noting that a two-way correspondence between critical points of the density, $\nabla\rho(\mathbf{r})=0$, and chemically recognizable structures such as nuclei, bonds, rings, and cages, is always possible.[117] Then, the topological analysis of the Laplacian of the electron density, $\nabla^{2}\rho(\mathbf{r})$, can reveal the atomic shell structure and the degree of sharing of paired electrons among neighboring atoms. In particular, it has been shown that local maxima and minima in $\nabla^{2}\rho(\mathbf{r})$ in the valence shell region of an atom are intimately related to the formation of chemical bonds and consequently to the presence of shared electrons and/or lone pairs.[116, 118] This is particularly true in the case of metallic elements where the Laplacian distribution, in contrast to the charge density, can often reveal asphericity resulting from an incomplete filling of the d-shell, suggesting possible mechanisms for metal-ligand interactions.[119, 120] Furthermore, a comparison of crystalline critical points and Laplacian features with the corresponding ones in the case of isolated molecules, or atoms, enables the evaluation of packing and cooperative effects on the bonding character.[121] Moreover, QTAIMAC allows for the determination of atomic basins and their properties, such as the volume, the electronic population, and the Lagrangian and Hamiltonian electron kinetic energies.
Properties for the total system can be thus defined and calculated in terms of atomic contributions.[122] Finally, Topond can evaluate many local quantities like kinetic energy densities, virial densities and Becke electron localization function.[123] As a matter of fact, topological properties have been extensively used in the characterization of the energetic features of intermolecular interactions in molecular and weakly bound solids and have provided new descriptors and tools for the study of chemical bonding.
In Figure 16, the electron density, its Laplacian and the gradient trajectories are reported for two layered periodic hexagonal crystals, namely graphene (left panels) and boron nitride (right panels). Both structures can be understood as a network of strong interactions where the difference between homo- and hetero-bonds can be fully appreciated. The possibility given by Crystal14 to incorporate electron correlation effects, by using hybrid, DH or MP2 corrected density matrices, represents a great opportunity and a significant progress toward a better understanding of the experimental results (see Climbing the Jacob's Ladder for Solids Section).
DW thermal factors
Due to the fact that core and inner-valence electrons of atoms follow the movement of the respective nuclei, when ECD and related XSF are considered, it is mandatory to account for the effect of finite temperature, for instance by means of atomic harmonic DW thermal factors.[109] An enormous amount of literature has been devoted to the approximate evaluation of such effects in order to allow for a correct interpretation of the X-ray scattering data.[115]
If a harmonic lattice potential is considered, then the probability density function of the nuclear displacements with respect to the equilibrium configuration of the atoms turns out to be a Gaussian function.[124] The most common way nuclear motion effects are dealt with when X-ray diffraction is considered is by means of DW atomic factors which damp the diffraction intensities with respect to increasing wave number and temperature. Atomic DW factors are usually computed from atomic ADPs. It has recently been suggested that ADPs are scarcely affected by anharmonicity so that harmonic mean-square displacements already provide a good description even of strongly anharmonic nuclear potentials.[125]
In the Crystal14 program, we have developed a fully ab initio approach for the computation of ADPs, by solving the lattice dynamics of the system (see Phonon dispersion and thermodynamic properties Section for details), DW factors and dynamical XSFs.[28] This scheme has been applied to the calculation of ADPs of a series of molecular crystals such as urea, benzene, urotropine and l-alanine and a satisfactory agreement has been reported with available experimental data.[29]
As a test-case for the ab initio calculation of dynamical XSF, we have considered crystalline silicon.[28] In general, experimental diffraction intensities and charge densities are less accurate than energy-related properties.[126] Crystalline silicon represents an exception because of the high level of purity of its single crystals and availability of a very accurate technique for the measurement of dynamical structure factors (Pendellösung fringes method)[127-129] which are known by an order of magnitude more accurately than for any other crystal.[130]
In Figure 17, we report the DW damping factors computed with four different Hamiltonians from their respective best determinations of the ADPs for the set of 18 structure factors F_{hkl} of Ref. [130]. The experimental points correspond to the ADP of Ref. [131]. It is seen that HF and B3LYP underestimate the ADP and give too small a damping, the LDA overestimates the ADP, while PBE is in very good agreement with experiment.
The technique used relies on the accurate description of the lattice dynamics and of the electron charge distribution of the system. The description of both aspects is dramatically affected by the adopted quantum chemical method. In the case of crystalline silicon, we find that the PBE functional of the DFT provides the best values for both properties. An overall agreement factor of 0.47% between the ab initio predicted values and the experimental determinations is found, as regards dynamical structure factors.
Momentum density
The EMD π(p) is a single-center function, invariant under the symmetry operations of the point group of the crystal, augmented with the inversion arising from the equality π(p)=π(−p);[132] such an object is a function of the counterintuitive momentum-space coordinates and it is characterized by a “collapsed” character about the origin p=0. For these reasons, it is generally difficult to extract the information content of the EMD that is usually revealed in its very subtle features.
In recent years, a series of strategies have been devised and implemented in the Crystal14 program for the analysis of the EMD and Compton profiles (CP) of crystalline materials:
(i) The computation of the EMD from the DM of the system; merits and drawbacks of this scheme with respect to that based on the crystalline orbital coefficients have been illustrated.[34] One of the main advantages is that of making the computation of the EMD of crystals possible at the MP2 level of theory, through the MP2 DM provided by the Cryscor program.[133]
(ii) The automated evaluation of the spherical average (SA) EMD function π_{SA}(|p|) and computation of the EMD-anisotropy Δπ(p) = π(p) − π_{SA}(|p|). This scheme allows for the automated computation of EMD-anisotropy maps.[33] As an example, Figure 18 reports an anisotropy map of the EMD of α-quartz, as computed at HF level in the (010) plane. The region of maximum anisotropy lies at |p| values between 1.0 and 1.3 a.u. in this case.
(iii) A partition scheme of the total EMD of a crystal into contributions coming from well-defined chemical subunits [π(p) = ∑_i π_i(p)] that takes advantage of the localization of the crystalline orbitals into Wannier functions.[33, 134-138]
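The EMD-anisotropy defined above, Δπ(p) = π(p) − π_SA(|p|), can be sketched numerically as follows. This is a toy illustration only: the spherical average is approximated by sampling directions on a Fibonacci sphere (an assumption, not the scheme used in Crystal14), and `toy_emd` is a hypothetical model density with a small cubic modulation.

```python
import math


def fibonacci_sphere(n):
    """Return n quasi-uniformly distributed unit vectors."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = golden_angle * i
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def spherical_average(emd, p_mod, directions):
    """Approximate pi_SA(|p|) as the mean of the EMD over sampled directions."""
    total = sum(emd(p_mod * x, p_mod * y, p_mod * z) for x, y, z in directions)
    return total / len(directions)


def anisotropy(emd, p, directions):
    """Delta pi(p) = pi(p) - pi_SA(|p|)."""
    p_mod = math.sqrt(sum(c * c for c in p))
    return emd(*p) - spherical_average(emd, p_mod, directions)


def toy_emd(px, py, pz):
    """Hypothetical EMD: a Gaussian with a small cubic-harmonic modulation.

    The modulation averages to zero over the sphere, so the spherical
    average of this model is exp(-|p|^2).
    """
    p2 = px * px + py * py + pz * pz
    if p2 == 0.0:
        return 1.0
    cubic = (px ** 4 + py ** 4 + pz ** 4) / p2 ** 2
    return math.exp(-p2) * (1.0 + 0.1 * (cubic - 0.6))


directions = fibonacci_sphere(2000)
d = 1.0 / math.sqrt(3.0)
print("along [100]:", anisotropy(toy_emd, (1.0, 0.0, 0.0), directions))
print("along [111]:", anisotropy(toy_emd, (d, d, d), directions))
```

In this toy model the anisotropy is positive along the axes and negative along the body diagonals, and it integrates to zero over each sphere of constant |p| by construction.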
As regards momentum space, the usual approximation of completely neglecting nuclear motions can result in a reasonable estimate of the CPs, and the effect of finite temperature is seldom explicitly considered, although a few ad hoc models have been reported.[139, 140] We have recently developed an ab initio Monte Carlo technique for the determination of the thermally averaged electronic first-order DM of crystals, in a harmonic approximation. The CPs and EMD of crystals can be computed at any temperature within such a scheme in a general, even if costly, way.[141]
The recent development of an algorithm for computing CPs from the DM of the system, rather than from the crystalline orbitals, made possible the investigation of the effect of the adopted computational method on momentum space properties, even beyond the one-electron approximation, with the MP2 approach implemented in the Cryscor program.[133] We have shown that very accurate directional CPs, as can be measured from the inelastic scattering of high intensity synchrotron radiation by single-crystal samples,[142, 143] can reveal subtle aspects of the electronic structure of periodic systems. In the cases of urea,[144] silicon[145, 146] and quartz[33] the theoretical CPs obtained using single-determinantal approximations to the ground-state wave function were found in fact to present definite discrepancies with respect to the experiment, which were partly removed when use was made of an ab initio technique based on a multideterminantal description of the wave function, namely MP2.
Climbing the Jacob's Ladder for Solids
DFT, in its KS formalism, has now reached a widespread success with an ever increasing number of applications in chemistry, materials science, and solid-state physics.[147] In the quest for the unknown exact exchange-correlation (XC) functional, many different approximations (DFA) have been proposed. Even if not systematically improvable, DFA can be classified in a hierarchical fashion according to the “Jacob's Ladder” proposed by Perdew.[148] When climbing the ladder, more complex ingredients are included in the mathematical form of the XC functional with the aim of reaching higher accuracy. The rungs of increasing accuracy/complexity are: (1) LDA, (2) GGA, (3) meta-GGA, (4) hyper-GGA, and (5) RPA-like functionals.
In Crystal14, all rungs of the Jacob's Ladder are available. Along with LDA and GGA functionals, already included in the previous versions of the code, XC functionals belonging to the third-, fourth-, and fifth-rung have been implemented, namely: semilocal and hybrid mGGA functionals, Range-Separated Hybrids (RSH) (fourth-rung), and Double-Hybrids (DH) (fifth-rung).
New DFT functionals in Crystal14
mGGA Functionals
mGGA functionals currently implemented in Crystal14 are τ-dependent and belong to the Minnesota set, namely the M05 and M06 families.[149-153] They include: the M05 global hybrid functional and its M05-2X variant with a doubled amount of HF exchange; the M06 hybrid functional and its variants, from M06-HF (100% HF exchange) to M06-2X (twice the amount of exact exchange of M06); and the pure mGGA functional M06-L.
RSH Functionals
Whereas global hybrids (GH) include a constant amount of exact exchange, in RSH functionals the amount of HF exchange depends on the distance between electrons.
This is obtained from the separation of the Coulomb operator in different ranges, usually by means of the error function.[154] When a partition into three pieces is adopted the Coulomb operator looks like:
According to the values of c_{SR}, c_{MR}, c_{LR}, ωSR, and ωLR, short-, middle-, and long-range corrected RSH functionals can be defined. This allows one to include exact exchange in the selected interelectronic range and take advantage of its peculiar features. Short-range corrected (or Screened Coulomb) RSH functionals remove long-range HF exchange and are designed for solids where the LR-HF can lead to numerical instability, in particular for metallic systems. On the contrary, long-range corrected RSH functionals include HF exchange at LR to recover the correct decay of the exchange potential which is wrong in semilocal DFT functionals. In between, middle-range corrected RSH functionals are targeted to take advantage of the best of both worlds.
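For orientation, a three-range partition consistent with the parameters named above can be sketched as follows. The form below assumes error-function attenuation at both ω_{SR} and ω_{LR} and a correction of the semilocal exchange range by range; it is a standard construction given here as an assumption, not necessarily the exact expression implemented in Crystal14.

```latex
\[
\frac{1}{r_{12}} =
\underbrace{\frac{\operatorname{erfc}(\omega_{SR}\, r_{12})}{r_{12}}}_{\text{SR}}
+ \underbrace{\frac{1 - \operatorname{erfc}(\omega_{SR}\, r_{12}) - \operatorname{erf}(\omega_{LR}\, r_{12})}{r_{12}}}_{\text{MR}}
+ \underbrace{\frac{\operatorname{erf}(\omega_{LR}\, r_{12})}{r_{12}}}_{\text{LR}}
\]
\[
E_{xc}^{RSH} = E_{xc}^{DFA}
+ c_{SR}\left(E_{x,SR}^{HF} - E_{x,SR}^{DFA}\right)
+ c_{MR}\left(E_{x,MR}^{HF} - E_{x,MR}^{DFA}\right)
+ c_{LR}\left(E_{x,LR}^{HF} - E_{x,LR}^{DFA}\right)
\]
```

The three terms of the first expression sum to 1/r_{12} identically. Setting ω_{SR} = ω_{LR} makes the middle-range term vanish and recovers the usual two-range split, so that short-range corrected functionals correspond to c_{LR} = 0 and long-range corrected ones to c_{LR} ≠ 0.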
Crystal14 offers a wide variety of RSH functionals, namely: HSE06[155] and HSEsol[156] (short-range corrected); HISS[157, 158] (middle-range corrected); RSHXLDA,[159] LC-ωPBE, LC-ωPBEsol,[160] ω-B97, and ω-B97X[161, 162] (long-range corrected). The Henderson–Janesko–Scuseria model of the PBE exchange hole[163] has been adopted for the implementation of HSE06, HISS, LC-ωPBE and related RSH methods. This hole allows a fully analytical evaluation of the range-separated enhancement factor and recovers the correct PBE limit.
DH Functionals
DH functionals are hybridized not only in the exchange part by including a certain amount of HF exchange (i.e., the dependence on occupied orbitals) but also in the correlation part. In this case, hybridization involves the mixing with a contribution to the correlation energy that depends on unoccupied orbitals through a MP2-like perturbative correction. Different schemes have been proposed, see Ref. [164] for a detailed list of DH methods. DH functionals implemented in the code have the general formula:
\[
E_{xc}^{DH} = (1 - A)\,E_{x}^{DFA} + A\,E_{x}^{HF} + (1 - B)\,E_{c}^{DFA} + B\,E_{c}^{MP2}
\]
as proposed by Grimme.[165] A rigorous proof of the equation above within the adiabatic connection formalism has been recently reported.[166] The MP2-like correlation correction to the SCF energy is computed by means of the Cryscor program.[133, 167, 168] According to the equation above, the following DH functionals have been made available in Crystal14: B2PLYP, mPW2PLYP,[165, 169] and B2GP-PLYP.[170] Although the cost of the calculation is the same as for MP2, DHs are less basis set dependent and partly overcome the drawbacks of DFT methods in the description of weak dispersive interactions.
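The mixing in the DH formula can be made concrete with a short sketch. The function and dictionary names are hypothetical; the (A, B) pairs are the values commonly quoted in the literature for these functionals and should be checked against Refs. [165, 169, 170]; the energy components are invented placeholders, not computed results.

```python
def double_hybrid_xc(a, b, ex_dfa, ex_hf, ec_dfa, ec_mp2):
    """E_xc^DH = (1-A) E_x^DFA + A E_x^HF + (1-B) E_c^DFA + B E_c^MP2."""
    return (1.0 - a) * ex_dfa + a * ex_hf + (1.0 - b) * ec_dfa + b * ec_mp2


# (A, B) mixing parameters as commonly quoted in the literature (assumption:
# verify against the original functional papers before use).
DH_PARAMETERS = {
    "B2PLYP": (0.53, 0.27),
    "mPW2PLYP": (0.55, 0.25),
    "B2GP-PLYP": (0.65, 0.36),
}

# Placeholder energy components in hartree (purely illustrative numbers)
components = dict(ex_dfa=-8.10, ex_hf=-8.00, ec_dfa=-0.40, ec_mp2=-0.30)

for name, (a, b) in DH_PARAMETERS.items():
    print(name, round(double_hybrid_xc(a, b, **components), 5))
```

With A = B = 0 the expression reduces to the pure DFA exchange-correlation energy, and with A = B = 1 to HF exchange plus MP2 correlation.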
Availability
Energies and analytic gradients are available for LDA, GGA, mGGA, and related hybrid functionals, for both closed- and open-shell systems. DFT calculations can be run either sequentially or in parallel. For DH, only energies are available. Note that they have the same computational cost as MP2 rather than DFT. Calculations can only be run sequentially and for closed-shell systems. Parallel calculations will be made available with a future Cryscor release.[171]
Approximately, the cost of the calculation is: LDA < GGA ∼ mGGA < GH < RSH << DH. The higher cost of RSH with respect to GH is mainly due to the missing bipolar expansion approximation for two-electron exchange integrals.
Properties that require the solution of the CPKS equations (e.g., polarizability, hyper-polarizabilities, and Raman intensities) are limited to a subset of LDA, GGA, and related GH functionals (see Ref. [35] for details).
Validation of XC functionals for solids
Most of the exchange-correlation functionals added to Crystal14 have been designed and optimized for molecules. It is then interesting to validate them for application to solids. Moreover, it has been recently claimed that GHs and long-range corrected RSH functionals are impractical for calculation of crystalline systems.[172] Here, we show that they can be applied to solids as well. As a comparison between various DFA flavors, we report results for the prediction of the lattice parameter, bulk modulus, and band gap of a representative set of nine simple solids, namely: three ionic solids which feature medium-to-wide band gap (i.e., LiF, NaCl, MgO) and six semiconductors (i.e., C, Si, Ge, SiC, GaN, GaAs) ranging from very wide to very narrow band gap semiconductors.
Computed results are listed in Table 7. Mean deviations with respect to a dataset of reference values are reported for 18 XC functionals that belong to the first four rungs of the Jacob's Ladder. Results for HF are also included for comparison. Apart from LiF and NaCl, basis sets for other solids have been taken from Ref. [173]. For LiF and NaCl, a triple-zeta quality basis set has been employed. The reference dataset includes: (i) experimental lattice constants corrected for the zero-point anharmonic expansion, as reported in Ref. [174]; (ii) experimental bulk moduli from Ref. [175]; and (iii) low temperature (<77 K) experimental (fundamental) band gaps.[176-178]
Table 7. Mean deviation (MD) and mean absolute deviation (MAD) for the equilibrium lattice constants (Å), bulk moduli (GPa) and band gaps (eV) from a variety of DFA with respect to reference values (see text for details) for the set of nine solids.^{a}

| Method | Lattice constant MD (Å) | Lattice constant MAD (Å) | Bulk modulus MD (GPa) | Bulk modulus MAD (GPa) | Band gap MD (eV) | Band gap MAD (eV) |
|---|---|---|---|---|---|---|
| HF | 0.062 | 0.065 | 16.7 | 17.6 | 6.32 | 6.32 |
| LDA | −0.030 | 0.032 | 8.7 | 9.2 | −1.84 | 1.84 |
| PBE | 0.082 | 0.082 | −9.9 | 0.1 | −2.01 | 2.01 |
| PBEsol | 0.028 | 0.029 | −1.8 | 4.5 | −1.98 | 1.98 |
| B97 | 0.074 | 0.074 | −4.6 | 5.6 | −0.46 | 0.72 |
| B3LYP | 0.079 | 0.079 | −5.7 | 7.2 | −0.52 | 0.89 |
| PBE0 | 0.032 | 0.035 | 6.1 | 6.7 | −0.07 | 0.63 |
| PBEsol0 | −0.004 | 0.015 | 12.0 | 12.0 | −0.02 | 0.62 |
| HSE06 | 0.035 | 0.038 | 5.0 | 6.4 | −0.69 | 0.71 |
| HSEsol | −0.002 | 0.016 | 11.1 | 11.1 | −0.65 | 0.65 |
| HISS | 0.007 | 0.022 | 16.3 | 16.3 | 0.13 | 0.51 |
| LC-ωPBE | −0.007 | 0.036 | 21.0 | 21.0 | 4.28 | 4.28 |
| LC-ωPBEsol | −0.053 | 0.053 | 35.0 | 35.0 | 5.49 | 5.49 |
| ω-B97 | 0.027 | 0.028 | 11.2 | 11.2 | 4.22 | 4.22 |
| ω-B97X | 0.036 | 0.037 | 8.8 | 8.9 | 3.79 | 3.79 |
| RSHXLDA | −0.009 | 0.037 | 17.9 | 18.2 | 4.50 | 4.50 |
| M06-L | 0.043 | 0.043 | −0.9 | 6.8 | −1.39 | 1.39 |
| M06 | 0.051 | 0.056 | 1.2 | 6.3 | −0.06 | 0.71 |
| M06-2X | 0.020 | 0.046 | 12.1 | 12.1 | 1.91 | 1.91 |

^{a} The full set of results is available as Supporting Information.
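The two statistics reported in Table 7 can be sketched as below. The sign convention (computed minus reference) is inferred from the discussion in the text, where overestimation corresponds to a positive MD; the numerical inputs are hypothetical lattice constants used only to exercise the functions.

```python
def mean_deviation(computed, reference):
    """MD: signed average of (computed - reference)."""
    return sum(c - r for c, r in zip(computed, reference)) / len(reference)


def mean_absolute_deviation(computed, reference):
    """MAD: average of |computed - reference|."""
    return sum(abs(c - r) for c, r in zip(computed, reference)) / len(reference)


# Hypothetical lattice constants in Å (not the values behind Table 7)
computed = [4.05, 5.60, 4.20, 5.40]
reference = [4.00, 5.57, 4.19, 5.42]

md = mean_deviation(computed, reference)
mad = mean_absolute_deviation(computed, reference)
print(f"MD = {md:+.3f} Å, MAD = {mad:.3f} Å")
```

Because of the triangle inequality, MAD can never be smaller than |MD|; the two coincide only when all deviations have the same sign, which gives a quick consistency check on tabulated values.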
As regards the prediction of the lattice parameters, PBEsol0 and HSEsol are the best performers, followed by HISS and PBEsol. As expected, HF tends to overestimate the lattice parameters, whereas LDA underestimates them. Pure GGA and mGGA functionals overestimate lattice constants, but results improve when a GGA functional devised for solids, such as PBEsol, is used. Inclusion of HF exchange in GHs and RSHs also improves the results, but this depends on the adopted XC functional (e.g., PBE vs. PBEsol based hybrids).
Most of the tested XC functionals tend to overestimate the bulk modulus of the examined solids. Oddly, the overestimation increases when passing from SC- to MC- to LC-RSH methods, as in the series: HSE06, HISS, and LC-ωPBE. A possible explanation is that the reference data are less reliable than for lattice constants. In any case, PBEsol shows the lowest MAD, followed by B97, HSE06, and M06.
The correct prediction of the band gap is still a matter of debate in DFT. Here, we computed the band gap within the KS formalism as the difference between the top of the valence bands and the bottom of the conduction bands. This corresponds to the definition of the fundamental gap. Therefore, reference data have been chosen to be consistent with that definition. The examined systems span a range from 0.5 to 15 eV. As expected, all semilocal functionals underestimate the band gap, whereas GHs, short-, and middle-range RSH functionals give very good results. On the contrary, LC-RSH methods tend to overestimate the band gap of solids. This is not unexpected because the inclusion of HF exchange at long-range makes them closer to HF, which is known to largely overestimate the band gap. However, finding the best XC functional for band gap prediction is not an easy task, as recently discussed by some of us.[179]
Overall, hybrid functionals, both global and short-range corrected, give the best results for basic solid state properties, in particular when combined with a semilocal functional devised for solids, such as PBEsol. Nevertheless, a middle-range corrected functional such as HISS might be a good compromise, because of its good performance for molecular properties as well.[158, 180] The good accuracy of hybrid HF/DFT methods as testified by Table 7, as well as other results reported in this work, and their very efficient implementation in the code make Crystal14 stand out for its application in solid-state chemistry and materials science.
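The band-gap definition used here (bottom of the conduction bands minus top of the valence bands) can be sketched as follows for a non-metallic system with a fixed number of occupied bands. The function name and the eigenvalues are hypothetical; they do not come from a Crystal14 output.

```python
def kohn_sham_gap(eigenvalues_by_k, n_occupied):
    """Gap between the conduction-band minimum and the valence-band maximum.

    eigenvalues_by_k : list of per-k-point eigenvalue lists (eV), each
                       sorted in ascending order
    n_occupied       : number of occupied bands (same at every k point)
    """
    vbm = max(bands[n_occupied - 1] for bands in eigenvalues_by_k)
    cbm = min(bands[n_occupied] for bands in eigenvalues_by_k)
    return cbm - vbm


# Hypothetical eigenvalues (eV) at three k points, two occupied bands
bands = [
    [-5.0, -1.0, 2.0, 6.0],
    [-4.5, -0.2, 1.5, 5.0],
    [-4.8, -0.6, 1.1, 5.5],
]
print(kohn_sham_gap(bands, n_occupied=2))
```

Taking the maximum and the minimum over all sampled k points independently means that an indirect gap is returned when the two band edges sit at different k points, as in this example.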
Crystal14 Performance
In the preparation of Crystal14, some effort has been devoted to improve general efficiency and optimize memory storage, particularly concerning executions in parallel on a large number of processors for large unit cell cases. Performance of Crystal14 has already been discussed in the particular case of highly symmetric systems such as the nanotubes illustrated in Use of Symmetry in Crystal Section. Such calculations, which are shown to be feasible on one processor of an ordinary desktop, can be run very efficiently in parallel on a large number of processors because very good load-balancing in the computation of one- and two-electron integrals has now been complemented with an effective distribution of the matrix diagonalization task over the irreducible representations of the symmetry group of the system, that is, for example, 640 in the case of the largest nanotube considered in Use of Symmetry in Crystal Section. Indeed, one can achieve almost ideal load-balancing and scaling in these particular cases where the nanotube symmetry properties imply that all irreducible representations include an equal number of basis functions, that is, all matrices to be diagonalized have equal size.
Surfaces and interfacial phenomena, defective solids, biomaterials, and nanoparticulate systems, all require models including a large number of atoms in the unit cell and low symmetry, mostly P1. Crystal14 has been optimized to be efficient and low memory consuming also in these cases and take advantage of the availability of high performance computing (HPC) resources, which have become the method of choice in those areas of science and technology that require the treatment of large amounts of data or the accomplishment of particularly demanding computational tasks. MPPcrystal, the massive parallel version of the program, first available in Crystal09, makes it possible to perform calculations on hundreds or even thousands of CPUs, depending on system size. Scalability of the code both with respect to system size and with respect to the number of processors has been increased significantly with respect to the previous implementation in Crystal09.[181, 182] Efforts have been devoted also to the reduction of the memory footprint of the code, as the general trend in HPC systems is toward a decrease of the per core available RAM, due to the cost of memory both in terms of production and power consumption.
We take the mesoporous silica MCM-41 structure as a test case for performance analysis of the current version of MPPcrystal. It consists of a unit cell containing 579 atoms. The basis set was obtained from a 6-31G^{*} Pople's standard basis set leading to nearly 8000 (7756) AOs in the unit cell. For tests at higher system sizes, supercells were built expanding the MCM-41 unit cell along the c axis. Different supercells will be denoted as Xj, with j being an integer defining the order of expansion along c. The B3LYP hybrid exchange-correlation density functional was used. Every run consists of an energy and gradient calculation (SCF+G), which is representative of a typical Crystal run. For example, structure optimizations and phonon calculations consist essentially of several iterations of SCF+G steps.
CPU time scaling
MCM-41/X1 can be considered a medium-size system relative to current capabilities of MPPcrystal as a SCF+G calculation can be run fairly efficiently on a limited number of cores (fewer than 10) and use of HPC resources is not strictly necessary, although fast-communication hardware is beneficial for attaining excellent scalability. SCF+G calculations for MCM-41/X1 were run in the range of 8–64 cores both on a local Linux cluster of Intel-Xeon processors with Ethernet connections and on SuperMUC (LRZ, Germany), an HPC IBM System x iDataPlex powered by 16 Intel cores per node running at 2.7 GHz, with 2 GB/core (Fig. 19).
Scalability with the number of cores used (n) is represented in terms of Speed-up with reference to the case with n = 8: Speed-up = t_{8}/t_{n}, where t_{n} is the wall-clock time required for a SCF+G calculation run with n cores. The curve obtained with the HPC system closely approaches the bisectant in Figure 19 corresponding to ideal full scalability within this range of n. Out of this range, scalability is still good. For example, Speed-up = 14.1 versus an ideal value of 16 at n = 128, but it tends to deteriorate for higher n (Speed-up 25.7 and 42.6 vs. ideal 32 and 64, respectively, for n = 256 and 512). The matrix diagonalization step carries the main responsibility for degrading scalability. In our experience, the ratio of the number of basis functions in the unit cell (n_{BS}) to n is a good index for evaluating the expected degree of scalability and it can be stated that good scalability is granted by n_{BS}/n ≥ 50. This criterion suggests that as many as n = 155 cores can be used to run this particular calculation efficiently.
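The speed-up definition and the n_{BS}/n ≥ 50 rule of thumb quoted above can be expressed in a few lines. The function names are hypothetical; the threshold of 50 and the basis-set size of 7756 are the values given in the text.

```python
def speed_up(t_ref, t_n):
    """Speed-up relative to a reference run: t_ref / t_n (wall-clock times)."""
    return t_ref / t_n


def ideal_speed_up(n, n_ref):
    """Ideal speed-up when going from n_ref to n cores."""
    return n / n_ref


def max_cores_for_good_scaling(n_basis, min_ratio=50):
    """Largest n that still satisfies n_basis / n >= min_ratio."""
    return n_basis // min_ratio


N_BASIS_X1 = 7756  # basis functions in the MCM-41/X1 unit cell (from the text)

print(max_cores_for_good_scaling(N_BASIS_X1))
print(ideal_speed_up(128, 8))
```

Under the same rule, a supercell with ten times as many basis functions admits roughly ten times as many cores before the diagonalization step starts to limit scalability.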
The other curve in Figure 19, obtained from performing the same calculations with the local cluster, shows to what extent slow communications affect scalability. A Speed-up of 5.6 for n = 64 is still not dramatically far from the ideal value of 8, though much worse than 7.5 as obtained with SuperMUC, thus suggesting that MPPcrystal can also be run on ordinary hardware without dramatic loss in performance. In absolute terms, it took 3970 wall-clock seconds to run one full SCF+G Crystal calculation for MCM-41/X1 on 64 cores of our local cluster versus 1990 s on the better performing SuperMUC. Recent improvements in memory storage management enabled us to run such a calculation also with Pcrystal by keeping memory requirements below 2 GB per core. Comparison with Pcrystal emphasizes the great advantage of MPPcrystal in this case. Indeed, this is a very unfavorable run with Pcrystal, because of expected large memory requirements and severe inefficiency in diagonalization. Diagonalization is performed only for matrices in the Γ point in this case, due to the large size of the unit cell. As a consequence, that step is handled entirely by one of the cores with Pcrystal, all others being idle, and an SCF+G calculation required 74,375 s to be completed by 64 cores of our local cluster, that is, nearly 20 times longer than with MPPcrystal.
Data in Figure 19 and previous considerations suggest that MPPcrystal can be used for even larger unit cell cases provided an adequate amount of computational resources is allocated. That such a possibility exists is confirmed by data reported in Figure 20, where performance in SCF+G calculations for supercells of MCM-41 up to MCM-41/X10 is shown in a range of 32–2048 cores (in this case, Speed-up is referred to t_{32}). In the first place, Figure 20 verifies that very large unit-cell calculations can be run at the scale of HPC resources that are easily accessible nowadays. Moreover, Crystal14 appears to be fairly efficient and shows very good scalability for any appropriate choice of n with respect to the size of the problem. That is easily appreciated, for example, by comparing curves referring to MCM-41/X1 with MCM-41/X10, as best scalability, satisfying the criterion defined above based on the ratio of n_{BS} to n, is obtained for n ≤ 155 and n ≤ 1550, respectively, in the two cases.
Very similar trends were obtained by running the same calculations on other HPC architectures, such as HECToR (UK), a Cray XE6 system based on 12-core AMD Opteron 2.1 GHz Magny Cours processors, and Fermi (CINECA, Italy), an IBM Blue Gene Q system. Scalability improves slightly for IBM Blue Gene systems, because a poorer per core performance makes the communication overheads less visible and the worst scaling algorithms, such as the diagonalization step, less relevant. It is important to notice that scalability can depend strongly on various factors, as different parts of the code show better parallel performance than others. For example, while the two-electron integral algorithm is known to scale perfectly with the number of CPUs, the diagonalization procedure, which is based on the Scalapack routines, shows rather poor scaling. The overall scaling depends on the relative importance of these two steps in the SCF cycle. This is apparent in Figure 21 where, despite the good scaling previously shown with the number of cores, computational time increases almost quadratically with the size of the supercell. Indeed, because the calculation of two-electron integrals has become still more efficient and closely approaches linear scaling in Crystal14, the time required for matrix diagonalization, which exhibits a quadratic behavior as far as MCM-41/X8 (then tending to cubic), dominates and determines the general trend.
Furthermore, scaling depends strongly on the density of the system under investigation, and the number of one- and two-electron integrals to be computed depends on the overlap of the basis functions. For the reasons mentioned above, more compact crystals than MCM-41 are expected to exhibit better overall scaling with respect to the number of cores used. As an example, data about performance for a 3×3×3 supercell of calcite are reported in Table 8. Although such a supercell contains less than half of the atoms in MCM-41/X1 (270 vs. 579), it scales better on SuperMUC: at n = 2048, Speed-up is 43.5 for calcite and only 26.1 in the case of MCM-41/X1, both to be compared with an ideal factor of 64.
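One way to quantify statements such as "almost quadratically" is to fit the exponent of a power law t ≈ c·N^α to timings collected at several supercell sizes. The sketch below does this with an ordinary least-squares fit in log-log space; the function name is hypothetical and the timings are synthetic numbers generated from an exact quadratic law, not the data behind Figure 21.

```python
import math


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) versus log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic timings (assumption): a perfectly quadratic cost, t = 3 * N^2
supercells = [1, 2, 4, 8]
timings = [3.0 * n ** 2 for n in supercells]

print(round(scaling_exponent(supercells, timings), 3))
```

Fitting separate exponents below and above a given supercell size would be a simple way to expose a crossover from one regime to another, such as the quadratic-to-cubic trend mentioned for the diagonalization step.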
Table 8. Speed-up for a 3×3×3 calcite supercell as a function of the number of cores (n).

| n | t_{n} (s) | Speed-up | Efficiency (%) |
|---|---|---|---|
| 32 | 38471.3 | 1.00 | 100 |
| 64 | 19769.0 | 1.95 | 97.3 |
| 128 | 9866.3 | 3.90 | 97.4 |
| 256 | 5116.4 | 7.52 | 93.9 |
| 512 | 2753.6 | 13.97 | 87.3 |
| 1024 | 1576.7 | 24.40 | 76.2 |
| 2048 | 883.4 | 43.55 | 68.0 |

All calculations were performed with the hybrid B3LYP functional on SuperMUC. t_{n} is wall-clock time in seconds. Efficiency was evaluated as: Speed-up × 32 × 100/n.
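The Speed-up and Efficiency columns of Table 8 follow directly from the wall-clock times. The short sketch below recomputes them from the tabulated t_{n} values using the formula stated in the table note; the function and variable names are hypothetical.

```python
# Wall-clock times in seconds, copied from Table 8 (n cores -> t_n)
TABLE_8_TIMINGS = {
    32: 38471.3,
    64: 19769.0,
    128: 9866.3,
    256: 5116.4,
    512: 2753.6,
    1024: 1576.7,
    2048: 883.4,
}


def scaling_table(timings):
    """Return (n, speed-up, efficiency %) rows relative to the smallest n."""
    n_ref = min(timings)
    t_ref = timings[n_ref]
    rows = []
    for n in sorted(timings):
        s = t_ref / timings[n]
        efficiency = 100.0 * s * n_ref / n  # Speed-up x 32 x 100 / n
        rows.append((n, s, efficiency))
    return rows


for n, s, eff in scaling_table(TABLE_8_TIMINGS):
    print(f"{n:5d}  {s:6.2f}  {eff:6.1f}")
```

Recomputed efficiencies may differ from the tabulated ones in the last digit, depending on whether values were rounded or truncated.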
Memory storage
Diminishing memory requirements has been a main challenging issue in extending the capabilities of Crystal to handle large unit-cell systems in the HPC context where the general trend implies reduced memory availability both in terms of GB/core and bandwidth. Such result has been achieved partially by extending the use of dynamical memory allocation enormously, optimizing memory usage throughout the entire code, particularly in the most memory consuming steps, carefully taking sparsity into account and progressively distributing more and more data to the cores.
Figure 19 contains a plot of the maximum memory required by MPPcrystal to run an MCM-41/X1 calculation on a different number of cores. One of the main features of MPPcrystal is an optimized use of memory resources obtained by distributing data to cores as far as possible. However, even MPPcrystal includes a part of replicated data, which are mainly related to information of very general use, such as mapping tables, or data that, if distributed according to a predetermined pattern, would downgrade load balancing of the algorithms. This is the reason why the memory-occupation curve in the figure appears to tend to an asymptotic value for large values of n, which represents the maximum amount of replicated data being stored to memory during an SCF+G calculation for MCM-41/X1. The maximum request for memory amounts to 945 MB per core at n = 8. Such request is drastically reduced at n = 16 because memory occupation is dominated by the size of the matrices defined in the reciprocal space (Fock and overlap matrices, eigenvectors), which are fully distributed to the cores, so that a higher number of cores corresponds to more effectively distributed data. As n grows, blocks of such matrices distributed to each core become smaller and smaller until data distribution is almost complete and the baseline of replicated data emerges.
Figure 22 shows how memory request evolves when considering large supercells of MCM-41 and how effectively increasing n extends the size of the system that can be handled below an assumed threshold of 2 GB of memory used per core, which is about the amount of memory one expects to be available in most HPC systems, like SuperMUC. Figure 22 shows that, for example, supercells as large as MCM-41/X12 satisfy such requirement with n = 2048. For larger supercells, such as MCM-41/X16 (more than 110,000 basis functions), preliminary calculations show that the replicated-data part becomes relevant and approaches the 2 GB limit, so that improvements are needed to further reduce memory requirements and enhance data distribution.
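The qualitative behavior described above (a replicated baseline plus a distributed part that shrinks with n) can be captured by a simple two-term model. This model, the function names, and the numbers are assumptions made for illustration; they are not fitted to Figures 19 or 22.

```python
import math


def memory_per_core(n_cores, replicated_mb, distributed_mb):
    """Two-term model: replicated data plus an even share of distributed data."""
    return replicated_mb + distributed_mb / n_cores


def min_cores_under_limit(replicated_mb, distributed_mb, limit_mb=2048.0):
    """Smallest core count keeping per-core memory within limit_mb.

    Returns None when the replicated part alone reaches the limit, in which
    case adding cores cannot help.
    """
    if replicated_mb >= limit_mb:
        return None
    return max(1, math.ceil(distributed_mb / (limit_mb - replicated_mb)))


# Hypothetical sizes in MB (illustrative only)
REPLICATED = 500.0
DISTRIBUTED = 100000.0

n_min = min_cores_under_limit(REPLICATED, DISTRIBUTED)
print(n_min, round(memory_per_core(n_min, REPLICATED, DISTRIBUTED), 1))
```

In this picture the per-core memory tends to the replicated baseline as n grows, and once that baseline approaches the available memory per core, only a further reduction of replicated data extends the accessible system size.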
Acknowledgments
This article is dedicated to the memory of Cesare Pisani (1938–2011) and Carla Roetti (1943–2010) who started the Crystal project in the seventies and gave a fundamental contribution to the development of the code during all their scientific careers. Victor R. Saunders is acknowledged for his invaluable contribution in the development of the code and in making it computationally efficient and numerically stable. Contribution to validate the new features by applying them to research problem is recognized to all people who have been working in the Theoretical Chemistry Group of the University of Torino since 2009. In particular, help to test the current version of the code is acknowledged to: Simone Salustro, Jacopo Baima, Marco Lorenz, Agnes Mahmoud, Valentina Lacivita, Gustavo Sophia, Elisa Albanese, Davide Presti, Massimo delle Piane, Marta Corno, Raffaella Demichelis, Anna M. Ferrari, and Giuseppe Mallia. Authors thank Piero Ugliengo for continuous help, useful suggestions, rigorous testing. Improvements of the Crystal code in its massive parallel version were made possible thanks to the PRACE proposals no. 2011050810 and 2013081680. The authors also acknowledge the CINECA Award N. HP10BLSOR4-2012 for the availability of high performance computing resources and support. Progetti di Ricerca di Ateneo-Compagnia di San Paolo-2011-Linea 1A, progetto OR-TO11RRT5 is acknowledged for funding.
Biographies
In memory of Cesare Pisani (Roma 1938–Courmayeur 2011). He graduated in Physics at the University of Milano in 1963 and became Full Professor in Quantum Chemistry at the University of Torino in 1981 where he was awarded the title of Emeritus Professor in 2009. With his pioneering works in the field of solid state Quantum Chemistry, he strongly contributed to the birth and development of the CRYSTAL, EMBED, and CRYSCOR public ab initio programs.
In memory of Carla Roetti (Biella 1943–Torino 2010). She graduated in Chemistry at the University of Torino in 1967 where she became Associate Professor in Physical Chemistry in 1980. For almost 40 years she has been one of the leaders of the Theoretical Chemistry Group, being involved in the quantum mechanical ab initio study of the electronic properties of solids and in the implementation of the public CRYSTAL program.