Accurate Bond Lengths to Hydrogen Atoms from Single‐Crystal X‐ray Diffraction by Including Estimated Hydrogen ADPs and Comparison to Neutron and QM/MM Benchmarks

Abstract
Amino acid structures are an ideal test set for method‐development studies in crystallography. High‐resolution X‐ray diffraction data for eight previously studied genetically encoded amino acids are provided, complemented by a non‐standard amino acid. Structures were re‐investigated to study a widely applicable treatment that permits accurate X−H bond lengths to hydrogen atoms to be obtained: this treatment combines refinement of positional hydrogen‐atom parameters with aspherical scattering factors with constrained “TLS+INV” estimated hydrogen anisotropic displacement parameters (H‐ADPs). Tabulated invariom scattering factors allow rapid modeling without further computations, and unconstrained Hirshfeld atom refinement provides a computationally demanding alternative when database entries are missing. Both should incorporate estimated H‐ADPs, as free refinement frequently leads to over‐parameterization and non‐positive definite H‐ADPs irrespective of the aspherical scattering model used. Using estimated H‐ADPs, both methods yield accurate and precise X−H distances in best quantitative agreement with neutron diffraction data (available for five of the test‐set molecules). This work thus solves the last remaining problem to obtain such results more frequently. Density functional theoretical QM/MM computations are able to play the role of an alternative benchmark to neutron diffraction.

Supporting information for the paper: "Accurate bond distances to hydrogen atoms from single-crystal X-ray diffraction by including estimated hydrogen ADPs and comparison to neutron and QM/MM benchmarks"

Structures and CCDC Refcodes
Merged diffraction data of the investigated structures are deposited alongside this publication. The respective CCDC refcodes [1] of the earlier CIF depositions, which contain the relevant structural models for refinement, are provided in Table 1.
Table 1: CCDC refcodes for the structures investigated. Crystallographic information files for these can be downloaded to initiate refinement. The refcode for the neutron data of N-acetyl-l-4-hydroxyproline·H2O is POKKAD02.
For l-threonine the refinement results of the 19 K data were not deposited in the CCDC. Here the earlier structure from a 12 K dataset [2] should provide input coordinates.
Figure 1: Depictions of non-positive definite H-ADPs from Hirshfeld atom refinement for four amino acids.

Statistical methods (also contained in the main article)
Given a set of $N$ values $V = \{V_i\}$, the mean value and its population standard deviation are defined by:
$$\langle V \rangle = \frac{1}{N}\sum_{i=1}^{N} V_i, \qquad \sigma_{\mathrm{pop}}(V) = \left[\frac{1}{N}\sum_{i=1}^{N}\left(V_i - \langle V \rangle\right)^2\right]^{1/2}$$
The population standard deviation $\sigma_{\mathrm{pop}}$ or root mean-square deviation (RMSD) gives an indication of the spread of the values around the mean.
The error in the mean is given by:
$$\sigma_{\mathrm{mean}}(V) = \frac{\sigma_{\mathrm{pop}}(V)}{\sqrt{N}}$$
In this supplement several pairs of bond distances are compared, derived from neutron and X-ray measurements as well as ONIOM computations, denoted $\{B_i\}$ for the benchmark values and $\{C_i\}$ for the values to be compared. We follow earlier work [4] and use the following statistical measures to describe similarities and differences. In the following comparisons the X-ray or ONIOM value to be compared, $C_i$, is subtracted from the neutron value when this is available, so that a positive value indicates that the X-ray or ONIOM result is too short. When neutron values are not available, the quantum chemical ONIOM result is chosen as benchmark $B_i$ for the X-ray results. The following values for the combined set $V = \{V_i\}$ with $V_i = B_i - C_i$ are reported with the following nomenclature:
(i) The mean difference (MD) is denoted
$$\mathrm{MD} = \langle V \rangle = \frac{1}{N}\sum_{i=1}^{N}\left(B_i - C_i\right)$$
This quantity is also known as the signed difference. The MD can be positive or negative, meaning that on average the parameters derived from the X-ray measurements or ONIOM computations are smaller or larger, respectively, than those derived from the neutron measurements.
(ii) The mean of the square of the weighted difference, weighted by the combined standard uncertainties from both measurements, is denoted
$$\left\langle \left(\frac{V}{\mathrm{csu}}\right)^{2} \right\rangle = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{V_i}{\mathrm{csu}_i}\right)^{2}$$
The combined standard uncertainty (csu), which appears in this expression, is given by [5]
$$\mathrm{csu}_i = \left[\sigma(B_i)^2 + \sigma(C_i)^2\right]^{1/2}$$
Combining these equations, the mean of the square of the weighted difference is
$$\left\langle \left(\frac{V}{\mathrm{csu}}\right)^{2} \right\rangle = \frac{1}{N}\sum_{i=1}^{N}\frac{\left(B_i - C_i\right)^2}{\sigma(B_i)^2 + \sigma(C_i)^2}$$
For reasons of convention, we report the square root of this property and refer to it as the csu-weighted root mean-square difference (wRMSD). For ONIOM results the standard uncertainty was taken as zero.
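The statistical measures defined above can be illustrated with a minimal sketch. The function names and the bond-distance values below are hypothetical and chosen only for illustration; they are not taken from the tables of this supplement. The sketch assumes the sign convention stated above (benchmark minus compared value) and that at least one of the two standard uncertainties per pair is non-zero.

```python
import math


def mean_and_sigma_pop(values):
    """Return the mean and the population standard deviation of a list."""
    n = len(values)
    mean = sum(values) / n
    sigma_pop = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, sigma_pop


def compare_distances(benchmark, compared, su_benchmark, su_compared):
    """Return (MD, sigma_pop, wRMSD) for paired bond distances.

    Differences are benchmark minus compared, so a positive MD means the
    compared (X-ray or ONIOM) distances are on average too short.
    For ONIOM values the standard uncertainty can be passed as 0.0, as long
    as the other member of each pair has a non-zero uncertainty.
    """
    diffs = [b - c for b, c in zip(benchmark, compared)]
    md, sigma_pop = mean_and_sigma_pop(diffs)
    # Squared difference divided by the squared combined standard uncertainty
    weighted_sq = [
        d ** 2 / (sb ** 2 + sc ** 2)
        for d, sb, sc in zip(diffs, su_benchmark, su_compared)
    ]
    wrmsd = math.sqrt(sum(weighted_sq) / len(weighted_sq))
    return md, sigma_pop, wrmsd


# Hypothetical X-H distances in Angstrom (illustrative values only)
neutron = [1.00, 1.10]
xray = [0.98, 1.08]
su_neutron = [0.003, 0.004]
su_xray = [0.004, 0.003]

md, sigma_pop, wrmsd = compare_distances(neutron, xray, su_neutron, su_xray)
print(f"MD = {md:.3f} A, sigma_pop = {sigma_pop:.3f} A, wRMSD = {wrmsd:.2f}")
```

In this made-up example both X-ray distances are 0.02 Å shorter than the benchmark, so the MD is positive, and each difference amounts to four combined standard uncertainties.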

Detailed bond distances and discussion
In this supplement all bond distances in eight standard amino acids and one non-standard amino acid are listed in several tables. These are further analyzed by the MD using the bond distances between all atoms, whereas in the main paper these values were only reported for the bond distances involving hydrogen atoms. Subsequently the bond distances of all molecules are discussed in detail, case by case.
In the main paper only the X-H distances are discussed and analysed statistically. Because the molecules studied are small, all bond distances are given here in every case.
We start the comparison with N-acetyl-l-4-hydroxyproline monohydrate. For this molecule neutron data at 150 K were available, which are used for comparison with high-resolution 100 K X-ray data. Three X-ray models were evaluated: INV refinement, which relies on the Hansen/Coppens multipole model, HAR (free refinement of positions and H-ADPs), and HAR with refined hydrogen positions but fixed estimated TLS+INV H-ADPs (Table 2). Bond distances from all five sources and approaches, namely invariom refinement (with TLS+INV H-ADPs), HAR, HAR with TLS+INV H-ADPs (when necessary), neutron diffraction and ONIOM computations, agree reasonably well for N-acetyl-l-4-hydroxyproline monohydrate (Table 2), with the exception of a pronounced outlier for O(2)-C(1), where ONIOM overestimates the bond distance.
The MD calculated for all pairs of bond distances using the neutron values as reference shows that the ONIOM results agree best with the neutron result. Concerning the X-ray results, invariom refinement with TLS+INV H-ADPs shows a higher MD than HAR. Within HAR, estimated TLS+INV H-ADPs give the lowest MD, lower than HAR with freely refined H-ADPs.
We next focus on the ONIOM results. For d,l-asparagine monohydrate, where neutron data collected at room temperature [6] are also available, the comparison likewise shows that neutron and ONIOM results are in best agreement for both molecules for all distances (including the X-H distances). The agreement can be less good for selected bond distances between heavier nuclei, and this will be discussed below for d,l-Glu·H2O. More remarkable are the trends in the individual bond lengths involved in hydrogen bonding, which are well reproduced by the two-layer ONIOM computation despite the approximation of including only electrostatic interactions between the high and low layers rather than a full wave function for all molecules in the cluster. We conclude that ONIOM results can be used as an alternative to neutron diffraction in general, as shown using the examples of the genetically encoded amino acids in this work.
Concerning the X-ray results for d,l-asparagine monohydrate (Table 3), values are only given for INV and for HAR with TLS+INV estimated H-ADPs, to avoid results based on non-positive definite H-ADPs (depicted in Figure 1). Therefore four rather than five sets of values are provided. For this molecule HAR results perform better than the ONIOM results. Non-positive definite H-ADPs are likewise obtained after HAR for l-phenylalanine l-phenylalaninium formic acid, d,l-proline monohydrate and l-threonine, for which ORTEP plots [7] are provided in Figure 1.
For d,l-serine (Table 4, X-ray data from [8]) room-temperature neutron data were taken from [9].
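Whether a refined H-ADP is non-positive definite, as mentioned above, can be checked from its six unique tensor components. The following is a minimal sketch that assumes a symmetric 3×3 tensor U and applies Sylvester's criterion (all leading principal minors must be positive); the function name and the numerical values are hypothetical and are not taken from the refinements reported here.

```python
def is_positive_definite(u11, u22, u33, u12, u13, u23):
    """Check a symmetric 3x3 displacement tensor with Sylvester's criterion.

    The tensor is positive definite only if all three leading principal
    minors are strictly positive.
    """
    minor1 = u11
    minor2 = u11 * u22 - u12 ** 2
    determinant = (
        u11 * (u22 * u33 - u23 ** 2)
        - u12 * (u12 * u33 - u23 * u13)
        + u13 * (u12 * u23 - u22 * u13)
    )
    return minor1 > 0 and minor2 > 0 and determinant > 0


# Hypothetical ADP components in Angstrom^2 (illustrative values only)
print(is_positive_definite(0.020, 0.030, 0.025, 0.001, 0.002, -0.001))
print(is_positive_definite(0.020, -0.005, 0.030, 0.0, 0.0, 0.0))
```

The first tensor corresponds to a physically meaningful displacement ellipsoid, whereas the second has a negative principal component and cannot be drawn as an ellipsoid.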
The bond distances from invariom refinement with TLS+INV H-ADPs improve compared to those listed in 2005 in [8], where at the time only nearest-neighbor atoms were considered in the invariom model compounds for hydrogen, and where only isotropic displacement parameters were refined for hydrogen atoms. In this study next-nearest-neighbor model compounds [10] and the TLS+INV H-ADPs were used for hydrogen; including estimated H-ADPs can be considered an important improvement, since the X-H bond distances get closer to the neutron diffraction and ONIOM results. When the 100 K synchrotron data of unusually high resolution from [11] are used instead of the 100 K dataset from the multi-temperature laboratory data from 2005 (Table 5), very similar results are obtained (only shown for invariom refinement since HAR failed). The higher-resolution synchrotron data give better agreement with neutron diffraction than the Mo Kα data for INV refinement. The MD values indicate that agreement with neutron diffraction is again best for ONIOM, followed by invariom refinement (Flaig's as well as Dittrich's X-ray dataset) with TLS+INV H-ADPs. For HAR (Dittrich's data), free refinement of H-ADPs and positional parameters gives better results than refinement with H-ADPs estimated by the TLS+INV approach when all bond distances are evaluated, rather than just the X-H bond distances as in the main paper.
Room-temperature neutron results [12] are again available for l-glutamine (Table 6), and here free refinement of H-ADPs was also possible in HAR, providing five sets of values for comparison. Here the MD for all X-X bond distances is most favorable for HAR with TLS+INV H-ADPs and for free HAR, followed by the ONIOM computation. It can be noted that for l-glutamine both the X-H (main paper) and the X-X bond distances (the values given in Table 6) from all sources agree very favorably overall.
For hydrogen-bonded d,l-glutamic acid monohydrate (Table 7) the trend in agreement in the absence of neutron data is the same as for most of the preceding cases: using the ONIOM results as reference for computing the MD, INV refinement agrees less well than HAR (free refinement), which in turn agrees less well than HAR refinement with estimated H-ADPs. Neutron data therefore do not seem to be required to validate X-H or X-X bond distances. However, unlike in neutral N-acetyl-l-4-hydroxyproline, the C α -N bond distance is an outlier in the theoretical ONIOM computations. It disagrees considerably with the X-ray bond distances in this zwitterionic structure.
Table 8: Bond lengths (in Å) for zwitterionic l-phenylalanine in the solvate l-phenylalanine l-phenylalaninium formic acid from quantum chemistry (ONIOM B3LYP/cc-pVTZ:UFF) and X-ray diffraction.
As for d,l-glutamic acid monohydrate, neutron data for the structure of l-phenylalanine l-phenylalaninium formic acid are unavailable, and the C α -N bond distance is an outlier in the theoretical ONIOM computation (Table 8). The explanation that can be provided is the influence of the crystal field [13], which is only partly taken into account by point charges.
The crystal field (including hydrogen bonding) causes oxygen atoms to polarize towards the positive carbon atom, while the H atoms polarize away from the negative N atom; polarization of the H atoms thus leads to a weakening of the C α -N bond and to its elongated bond distance in the ONIOM computation. Similar polarizations have been visualized for l-homoserine, using different levels of model sophistication, starting from point charges, then improving the description with point charges and dipoles surrounding a molecule, and finally from full periodic DFT calculations [14]. Therefore only an MO description of the low-layer atoms in ONIOM or full periodic computations can give the correct bond distances from theory, at considerably higher computational effort. Because we are mainly interested in the X-H bond distances, we consider the ONIOM B3LYP/cc-pVTZ:UFF level of theory entirely appropriate here. HAR failed for l-phenylalanine l-phenylalaninium formic acid, both for free refinement, which gave non-positive definite H-ADPs, and for H-ADP-constrained refinement, where no minimum was found; X-H distances from INV refinement agree reasonably well with the ONIOM results.
Since there are no neutron results for d,l-proline monohydrate (Table 9), the reference for the X-ray data has to be the result of the ONIOM computation. Invariom refinement shows slightly less good agreement than HAR in terms of the MD when using all X-H and X-X bond distances. Optimized ONIOM bond distances were obtained with a triple-zeta basis set, as were the HAR results in the crystallographic refinement. In this regard the multipole model performs well despite using a single Slater function per multipole, and the results are almost as good.
For l-threonine a room-temperature neutron structure [15] provides reference bond distances (Table 10). Again the agreement between neutron diffraction and ONIOM computations is clearly most favorable (apart from the C α -N bond), supporting the conclusion of the main paper that the latter can be used to provide comparative results for the other structures. The next-best agreement is found for HAR and then for invariom refinement, both with estimated TLS+INV H-ADPs.
Table 11: Bond lengths (in Å) for d,l-valine involving hydrogen atoms from quantum chemistry (ONIOM B3LYP/cc-pVTZ:UFF) and from X-ray diffraction.