High‐Throughput Computational Evaluation of Low Symmetry Pd2L4 Cages to Aid in System Design

Abstract Unsymmetrical ditopic ligands can self‐assemble into reduced‐symmetry Pd2L4 metallo‐cages with anisotropic cavities, with implications for high specificity and affinity guest‐binding. Mixtures of cage isomers can form, however, resulting in undesirable system heterogeneity. It is paramount to be able to design components that preferentially form a single isomer. Previous data suggested that computational methods could predict with reasonable accuracy whether unsymmetrical ligands would preferentially self‐assemble into single cage isomers under constraints of geometrical mismatch. We successfully apply a collaborative computational and experimental workflow to mitigate costly trial‐and‐error synthetic approaches. Our rapid computational workflow constructs unsymmetrical ligands and their Pd2L4 cage isomers, ranking the likelihood for exclusively forming cis‐Pd2L4 assemblies. From this narrowed search space, we successfully synthesised four new, low‐symmetry, cis‐Pd2L4 cages.


Input to screening workflow
shows the input SMILES strings for the cage ligand building blocks. SMILES is a text-based representation of molecular connectivity used throughout cheminformatics.

Ligand and cage assembly
The cage ligands and cages were assembled with our Python software, stk, 1,2 using the linear polymer and M2L4 lantern topology graphs, respectively. Recent updates to stk allow the handling of metalcontaining systems including a diverse variety of metal-organic cages and metal-complexes. The stk assembly process, which robustly handles a wide variety of cage molecules, 3,4 i) places building blocks on the nodes of a topology graph, and ii) aligns them, based on their functional groups, along the edges connecting that node to its neighbours. The linear polymer simply places all building blocks in a line and aligns the functional groups of neighbouring building blocks along that line. In this work, cage ligands were built from three building blocks that were connected through bromine functional groups (the choice of bromine is arbitrary and was not meant to represent a realistic chemical disconnection). Cage ligands were initially optimised using version 2 of the ETKDG algorithm in RDKit, which builds a reasonable 3D structure of the ligand. 5,6 A ligand conformer where the two coordinating nitrogen atoms are pointing in the same direction improves robustness of the cage construction process ( Figure  S1). Therefore, we automatically selected the ligand conformer (out of 100 ETKDG conformers) with the smallest angle between the two coordinating nitrogen atoms and the ligands centre of mass. This ligand conformer was optimised at the "very tight" level using GFN2-xTB 7 (version 6.2 was used throughout this work) and used as a building block in the cage assembly process. All structure optimisation was performed using our software package, stko (https://github.com/JelfsMaterialsGroup/stko).
The cage assembly process places single palladium atoms on two four-connected nodes of the M2L4 lantern topology graph and the cage ligand building blocks on four two-connected nodes, which bridge the two four-connected nodes. In this work, all cage ligands had two nitrogen-centred pyridyl functional groups, which were used to align them on the topology graph. In our software, stko, we have recently implemented methods to optimise metal-containing systems using the UFF4MOF forcefield 8,9 in the General Utility Lattice Program (GULP; version 5.1), 10,11 and the xtb software. 7,12 UFF4MOF is an extended version of the universal force field (UFF) 13 that handles metal environments common in metal-organic framework structures (note that only the original parameters 15 were required for square planar palladium metal centres). The GFNn-xTB methods are recently developed and very efficient semiempirical quantum mechanical methods that were parametrised for large parts of the periodic table (up to Z=86) 7,14 and have been demonstrated to be reliable for the optimisation of large transition-metal containing structures. 12 After construction with stk, the lowest energy cage conformer was found using the following sequence of optimisation steps: 1. stk assembles structures based on predefined topology graphs with unphysical, long bonds between building blocks (nodes). The expanded structure is collapsed to a realistic size, while maintaining the shape of the assembled structure, by translating each rigid building block toward the centre of mass of the assembled structure. The algorithm ("Collapser" in stko) stops when the inter-building block distance is less than 2 Å to avoid steric clashes. 2. The cage structure is geometry optimised using UFF 13 in GULP. The atom typing is handled by a Python implementation of the "ForceFieldHelpers" module in RDKit, 6 except for the metal atoms, which are manually typed to match the target types in UFF (because RDKit does not handle the metal-atom typing). Palladium atoms are assigned the square planar atom type, "Pd4+2". The bonding used within GULP matches the bonding in the stk molecule.
3. A conformer search is performed starting from the UFF optimised cage structure using hightemperature molecular dynamics (MD). Two sequential MD runs in the NVT ensemble, using the leapfrog verlet integrator, are performed using UFF and GULP at 1000 K. The first run is a short equilibration with a time step of 0.25 fs for 1.0 ps. The production run is performed for 100.5 ps with a time step of 0.75 fs. From the production run, 100 conformers are extracted at 1.0 ps intervals. 4. Each extracted conformer is optimised at the "normal" level using GFN2-xTB with DMSO as the implicit solvent (with the "very tight" solvent grid option). 5. The lowest energy cage conformer is optimised using GFN2-xTB with the "extreme" convergence criteria and DMSO as the implicit solvent (with the "very tight" solvent grid option). Frequency calculations at the GFN2-xTB level were only performed on the top candidates and previously synthesised structures and confirmed no imaginary frequencies.
Figure S1 Schematic of the optimal cage ligand conformation where the nitrogen-centre of mass-nitrogen angle (shown in dark blue) is minimised.
The workflow we implemented takes only the linker and ligand building blocks SMILES strings as input and automatically assembles, optimises and analyses the cage ligands and resultant cage isomers. Such an approach relies on reasonable default parameters at each step that can handle cages with vastly different chemical and physical properties. We parametrised this workflow on the four experimental cases from ref. 15, governed by only geometrical constraints, to ensure that the same cage properties (energy and structure) were obtained over multiple runs of our procedure. During production, we found a very small number of examples where the optimisation sequence could not be completed automatically, which we handled by manually performing the optimisation sequence. Our automated optimisation process can lead to very unstable structures for cage ligands that cannot, geometrically, form Pd2L4 cages or are very flexible, which we aimed to overcome by the high temperature MD step to "escape" unstable configurations. As part of our selection process, we visualised all top candidate cage structures and found false positives based on our three metrics (energy separation, sqp,min and max ) where all four isomers are strained and would not be expected to form in solution.
We found examples of high ligand strain that may be stabilised by π··· π interactions between the cage ligands, which leads to helicate-like structures. Our findings suggest that there is a balance between cage ligand flexibility and geometric complementarity that governs Pd2L4 formation.

Density functional theory calculations
We use density functional theory (DFT) single point energy (SPE) energy calculations to validate the relative energetics of the four isomers of many cage structures obtained from the screening workflow. We did not perform geometry optimisations at any DFT level because of the cost of these calculations. We tested a variety of methods on four sets of cage isomers using Gaussian 16 with the "Superfinegrid" option. 16 Figure S2 shows that the relative energetics between the cis isomer (C isomer) are equivalent in these cases between the xTB energy and DFT energies (of a series of methods). For the remainder of this work, we perform SPE calculations using the PBE0 17 (or "PBE1PBE" in Gaussian 16) method with the def2-SVP basis set 18,19 and Grimme's D3BJ dispersion correction, 20 in implicit dimethyl sulfoxide (DMSO) using the polarizable continuum model (PCM). 21 This method was recently shown to be effective for Pd2L4 cage systems. 22 Each palladium was assigned a charge of 2 and was assumed to be in the low-spin state.  Table S2 and S3 show the GFN2-xTB free energies and B97-3c, a composite DFT method, energies relative to the cis isomer for the previously reported cage structures. The free energies were calculated using the "--hess" option in xTB from the GFN2-xTB optimized structures. No negative frequencies were found. The B97-3c 23 energies were calculated using Orca 4.2.1 24 and performed in the gas-phase due to SCC issues with the CPCM method. Importantly, similar trends are observed for both methods compared to Figure S2. Input files and output files for xTB and DFT calculations can be accessed at https://github.com/andrewtarzia/citable_data/tree/master/tarzia_lewis_2021.

Cage analysis
Computationally efficient methods were used to analyse the stability and energetic preference of metal-organic cage isomers in a screening workflow. Firstly, isomer preference was determined based on the relative GFN2-xTB 7,14 energy of the lowest energy conformer obtained of each isomer with DMSO as the implicit solvent (with the "very tight" solvent grid option). Secondly, the stability of a cage was screened based on simple geometric descriptors that determine how far from an ideal square planar geometry the palladium metal-centres deviate when in the cage. Two cheap methods were used for calculating the geometric stability of a cage: the maximum plane deviation ( max ) and the minimum square planar order parameter ( sqp,min ) 25 of both metal centres in a cage. Both max and sqp,min are designed to quantify the degree of distortion in the most strained metal centre in a cage and are reasonably well correlated ( Figure S3(b)). The plane deviation of a metal centre was calculated as the sum of the shortest distance of each bound nitrogen atom and the palladium atom from the plane of best fit defined by the palladium atom and the four nitrogen atoms ( Figure 4).
sqp,min was calculated using the default implementation in pymatgen (version 2019.5.8), where the neighbours of the palladium atoms were manually set to the four coordinated nitrogen atoms. 26 Throughout this work, we used the geometrical measures of stability ( max and sqp,min ) mostly as guidelines when making experimental decisions. All pore sizes were calculated using our open-source Python package pyWindow. 27  Tables S4 and S5 show the GFN2-xTB and DFT energies relative to the cis isomer for all previously reported and selected cages. All DFT energies are from SPE calculations of the xTB optimised structure.

Definition of the cage anisotropy
We define a measure of cage anisotropy, "Pd displacement", based on the relative geometry of the square-planar palladium (II) centres in the assembled cage structures. The algorithm for calculating Pd displacement (Δ Pd ) is: 1. Calculate the plane of best fit, and normal to the plane ̂, of one Pd atom and its four bonded N atoms. 2. Calculate the Pd-Pd vector, ̂. 3. Calculate the projection, , of ̂ on the plane of best fit as ̂= ̂−̂·‖̂‖ 2̂ (1) 4. Calculate Δ Pd = ‖̂‖.

General Experimental
Synthesis: Unless otherwise stated, all reagents, including anhydrous solvents, were purchased from commercial sources and used without further purification. CDCl3 and NEt3 were stored over 4 Å molecular sieves prior to use. All reactions were carried out under an atmosphere of N2 using degassed, anhydrous solvents unless otherwise stated. Petrol refers to the fraction of petroleum ether boiling in the range 40-60 °C. Analytical TLC was performed on pre-coated silica gel plates (0.25 mm thick, 60F254, Merck, Germany) and observed under UV light. EDTA solution refers to a 0.1 M solution of EDTA-Na2 in 3% NH3(aq).
Analysis: NMR spectra were recorded on Bruker AV400 or AV500 instrument, at a constant temperature of 298 K. Chemical shifts are reported in parts per million from low to high field and referenced to residual solvent. Standard abbreviations indicating multiplicity were used as follows: m = multiplet, quint = quintet, q = quartet, t = triplet, d = doublet, s = singlet, app. = apparent, br. = broad.
Signal assignment was carried out using 2D NMR methods (HSQC, HMBC, COSY, NOESY) where necessary. In the case of some signals absolute assignment was not possible. Here indicative either/or assignments (e.g. HA/HB for HA or HB) are provided. All melting points were determined using a hot stage apparatus and are uncorrected. Mass spectrometry was carried out by the Imperial College London, Department of Chemistry Mass Spectroscopy Service using Waters LCT Premier for HR-ESI-MS and Thermo Scientific Q-Exactive for tandem MS.

Solvodynamic Radii Calculations
Solvodynamic radii were calculated using a variation of the Stokes-Einstein equation:

= 6
Where is the solvodynamic radius (m)