Magnetically Induced Alignment of Natural Products for Stereochemical Structure Determination via NMR

Abstract Anisotropic NMR has gained increasing popularity to determine the structure and specifically the configuration of small, flexible, non‐crystallizable molecules. However, it suffers from the necessity to dissolve the analyte in special media such as liquid crystals or polymer gels. Generally, small degrees of alignment are also caused by an anisotropic magnetic susceptibility of the molecule, for example, induced by aromatic moieties. For this mechanism, the alignment can be predicted via density functional theory. Here we show that both residual dipolar couplings and residual chemical shift anisotropies can be acquired from natural products without special sample preparation using magnetically induced alignment. On the two examples of the novel natural product gymnochrome G and the alkaloid strychnine, these data, together with the predicted alignment, yield the correct configuration with high certainty.


Characterization of Gymnochrome G a.
Spectroscopic Experimental Procedures. UV spectra were recorded on a Jasco V-630 UV-visible spectrophotometer, ECD spectra were recorded on a Jasco J-810 spectropolarimeter, and IR spectra were measured on a Jasco 4100 FT-IR spectrometer equipped with a Pike Gladi ATR (attenuated total reflection) accessory. 1D and 2D NMR spectra were recorded in MeOH-d3 at 298 K on Bruker Avance III HD and Avance Neo spectrometers at 800 and 900 MHz equipped with TCI CryoProbes. Chemical shifts were referenced using residual solvent peaks (MeOH: δH = 3.31 ppm, δC = 49.0 ppm). Standard Bruker pulse sequences (zgpr, jmod, hsqcetgpsp.2, hmbcetgpl2nd, hmbcetgpjcl2nd, hsqcdietgpjcndsisp) were used. ¹H-¹³C HMBC experiments were optimized for ⁿJCH = 8 Hz, as well as ⁿJCH = 3 Hz for long-range correlations. Long-range CH couplings were measured using J-HMBC experiments. [1] The sign of ²JCH couplings was determined using an HSQC-HECADE-type sequence. [2] High-resolution MS spectra were obtained using a Bruker micrOTOF mass spectrometer with electrospray ionization in the negative-ion mode. HPLC was performed on an Agilent 1200 Series system using a Phenomenex Gemini C18 column (250 × 10 mm i.d., 5 μm). b.
Animal Material. Two specimens of H. naresianus were collected from Shima Spur, Kumano-nada Sea, Japan, from depths of 763 to 852 m. Voucher samples were deposited in the collection of the Systematische Zoologie am Museum für Naturkunde Berlin (ZMB Ech 7415 and ZMB Ech 7416). [3] c.
Extraction and Isolation. Lyophilized H. naresianus material (9.0 and 10.1 g) was successively extracted with MeOH/CH2Cl2 (1:1), MeOH/H2O (9:1), MeOH/H2O (1:1), and distilled water. The methanol-soluble pigments of the MeOH/CH2Cl2 extract were combined with the other extracts and subjected to semipreparative HPLC using a linear gradient of acetonitrile/20 mM aqueous ammonium acetate (45:55) to 85% acetonitrile. Fractions were purified and concentrated using solid phase extraction (Bondesil C18, 40 µm). Pigments were washed with water and eluted with MeOH/H2O (9:1) followed by evaporation of the eluates to dryness to give several anthraquinone and biaryl pigments reported previously, [3] compound 1 and further phenanthroperylene quinones that will be described elsewhere.

Table S4; 13

e. DP4+ Analysis. [4] The starting geometries were built in Maestro 11.4. [5] For all configurations, the quinoid carbonyl acted as H-bond acceptor for the 1-, 6-, 8-, and 13-OH, the 11-OH group acted as H-bond acceptor for 10-OH, and 12-Br acted as H-bond acceptor for 11-OH. In all following steps this OH conformation was kept fixed to greatly reduce sampling complexity. The sulfate group was built in its deprotonated form. The aromatic ring was built in propeller conformation (P). All four combinations of configuration at the two stereogenic centers were generated. A conformational search was performed with Macromodel 11.8 [6] using the MMFF force field [7] in vacuum. The method was a Monte-Carlo torsional sampling [8] of all non-terminal rotatable side chain bonds with 100 000 steps, a minimization convergence of 1.0 kJ/mol/Å, and an energy threshold of 21 kJ/mol. The resulting conformers were subjected to a finer minimization with 0.001 kJ/mol/Å convergence, and structures with a maximum atom deviation below 0.5 Å were discarded as duplicates.
All conformers from the ensemble were geometry optimized at the B3LYP [9]/6-31G [10] level of theory with Gaussian09 [11], and all conformers more than 8.4 kJ/mol above the minimum zero-point corrected (ZPC) free energy were discarded. The shielding constants were calculated at the mPW1PW91 [12]/6-31+G(d,p) level of theory using GIAO [13] and an implicit solvent model [14]. The resulting shieldings were then averaged assuming a Boltzmann distribution with the ZPC free energies from the geometry optimization. The shieldings from all side chain H and C atoms and the ipso and ortho C atoms of the aromatic system were used to calculate the DP4+ probability. Unassigned methylene protons were assigned to the better-fitting value for each configuration. The resulting DP4+ probabilities were 0.00% for (2'R,2''R), (2'R,2''S), and (2'S,2''S), and 100.00% for (2'S,2''R). f.
Relative Configuration from J-Couplings and ROE Constraints. The chiral aromatic system was assumed to be propeller-(P) based on literature data, [15] which was confirmed via ECD calculations (see below). This system could be used as a chirality reference for the side chains. A conformational search suggested that in all relevant conformations the side chains point away from each other (defined by the dihedral around bonds 3-1′ and 4-1″), and therefore the protons at 1′/1″ point towards each other (Fig. S1). This hypothesis was supported by a ROESY spectrum, where correlations between 1′ and 1″ were the only detectable inter-side-chain contacts. With the conformation around these bonds being known, the diastereotopic protons at 1′ and 1″ could be assigned using ³J couplings to the carbon atoms 2, 3a, 3b, and 5. For C-H arrangements close to antiperiplanar, larger J couplings are expected than for synperiplanar arrangements. For example, proton H1a″ has a larger coupling to C5 than to C3b, while H1b″ shows the opposite trend. Therefore, H1a″ has to be the proton pointing inward (pro-R), while H1b″ is the proton pointing outward (pro-S) (Fig. S1). The same argument applies to H1′. Having established the assignment of these methylene protons, the dihedral conformation of bonds 1′-2′ and 1″-2″ can be examined. The relevant couplings are ³JH1H2, ³JH1C3, and ²JH1C2 (with primes added accordingly). ³J is large for an antiperiplanar arrangement and small for a synclinal arrangement of the coupling nuclei, while ²JCH couplings are most useful if there is an electronegative substituent X on the carbon atom (which there is in both side chains). In that case, the apparent ²JCH is large if the proton and X have a synclinal arrangement and small if they have an antiperiplanar arrangement. A review of these relations can be readily found in the literature. [16] Following these arguments, there is only one possible arrangement of substituents that is in agreement with the measured couplings, and the configuration of the stereogenic centers can directly be determined. This can be most easily seen as a Newman projection (Fig. S2). All relevant couplings are shown in Table S5.

Scheme 1. Constitution, configuration, and numbering of the investigated structures.

g. Absolute Configuration from ECD. The configuration of the aromatic system was confirmed to be propeller-(P) by ECD calculations. The conformations generated to analyze the anisotropic data were used as a starting point (see section S3). We performed an excited-state calculation with 100 states using time-dependent DFT with three different functionals, PBE0 [17], B3LYP, and CAM-B3LYP [18], and the pcseg-1 basis set. It is recommended in the literature to use different types of functionals (with and without long-range corrections, and different fractions of exact exchange). [19] To reduce computational demand, only conformers with a free energy of less than 11.4 kJ/mol above the ground state were considered (i.e., ~1% population at 298 K). The first calculations with the PBE0 functional revealed that the ECD is largely dominated by the aromatic system and depends very little on the side chain conformation and configuration. Therefore, the calculations with B3LYP and CAM-B3LYP were only performed on the correct configuration (2'S,2''R). The generation of spectra and Boltzmann averaging with ZPC free energies was done using SpecDis 1.71. [20] We also used the functionality in SpecDis to calculate similarity factors, which are a measure of similarity between an experimental and a calculated spectrum. [20a, 21] In this procedure, the UV correction, the bandwidth σ, and the scaling factor are optimized automatically (see Table S1). The UV correction is an empirical shift along the wavelength axis of the calculated spectrum which accounts for systematic errors in transition energies. [19] The bandwidth is simply the standard deviation of the Gaussian distribution used to transform a stick spectrum of transition energies into a continuous spectrum comparable to the experiment. The scaling factor scales the intensity of the calculated spectra to match the experiment. This is necessary as absolute intensities are generally inaccurate in ECD calculations.
The similarity factors, which range from 0 to 1, are calculated for both enantiomers. Generally, the spectra calculated without long-range corrections (B3LYP and PBE0, Figure S3) show much better agreement with the experiment than CAM-B3LYP. In all cases, however, both the visual comparison and the similarity factors (Table S1) clearly show that our original assumption for the configuration (P) was correct.
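The role of the three fitted parameters (shift, bandwidth, scaling factor) can be illustrated with a minimal sketch. It is not the SpecDis implementation: the function name `broaden_stick_spectrum`, the generic axis unit, the area-normalized Gaussian line shape, and the two-transition example are assumptions made for illustration, and the similarity factor itself is not computed here.

```python
import numpy as np

def broaden_stick_spectrum(positions, intensities, grid, sigma, shift=0.0, scale=1.0):
    """Turn a stick spectrum into a continuous one by summing Gaussians.

    positions, grid, sigma and shift share one unit (e.g. eV); each stick
    contributes an area-normalized Gaussian with standard deviation sigma.
    """
    centers = np.asarray(positions, dtype=float)[:, None] + shift
    heights = np.asarray(intensities, dtype=float)[:, None]
    x = np.asarray(grid, dtype=float)[None, :]
    gauss = np.exp(-0.5 * ((x - centers) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return scale * (heights * gauss).sum(axis=0)

# Hypothetical two-transition stick spectrum (arbitrary intensities)
grid = np.linspace(2.0, 6.0, 401)
spectrum = broaden_stick_spectrum([3.0, 4.5], [1.0, -0.5], grid, sigma=0.2)
shifted = broaden_stick_spectrum([3.0, 4.5], [1.0, -0.5], grid, sigma=0.2, shift=0.5)
```

In this sketch the shift moves all bands rigidly, the bandwidth controls how strongly neighboring bands of opposite sign cancel, and the scaling factor only changes the overall intensity.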

Acquisition of Anisotropic Data
a. Sample Preparation. For 1, 2.0 mg of the dry material from purification was dissolved in ca. 150 µL MeOH-d3 and transferred into a 3 mm NMR tube. For 2, a 200 mM solution of strychnine (Sigma-Aldrich) in CDCl3 with ca. 1% v/v TMS was washed twice 1:1 with 250 g/L K2CO3 (aq) to remove traces of chloroform degradation products. The solution was then dried over K2CO3 (s), transferred into a 5 mm NMR tube, and degassed using 5 freeze-pump-thaw cycles. Finally, the tube was flame-sealed under vacuum. b.
For 1, the temperature control was cross-checked and fine-tuned using a 99.8% MeOD-d4 reference sample (Sigma-Aldrich). [22] For 2, the chemical shift of the residual water peak (2.397 ppm @ 298 K, −23 ppb/K temperature shift) served as internal temperature standard. Proton 1D spectra were acquired using simple pulse-acquire sequences. For 1, low-power presaturation was added for solvent suppression. Spin-echo based solvent suppression (WATERGATE etc.) is less suitable as it introduces phase errors in the multiplet structure due to homonuclear coupling evolution. Carbon 1D spectra were acquired using a J-modulated echo sequence (Bruker standard sequence jmod) with 4-5 kHz proton decoupling. For 2, HSQC spectra were acquired using a perfect-CLIP-HSQC sequence. [23] c.
Data Processing. For data processing, zero-filling to 1024k (2²⁰) points (1D spectra) and 64k (2¹⁶) points (HSQC) was applied in the direct dimension. For the low-sensitivity carbon-based experiments (J-mod/HSQC), 0.3 Hz line broadening was applied before Fourier transformation. Peak picking and multiplet analysis were then done in MestreNova 11. For 1, JHH couplings were taken from the proton 1D spectra, and peaks showing overlap were discarded. For 2, all couplings were extracted from the HSQC. The slices in the proton dimension containing the relevant peaks were extracted for this purpose. Any HSQC peaks showing overlap with long-range correlations of neighboring peaks were discarded. This was the case for the atom positions 1, 2, and 3. Since all RDCs were much smaller than the corresponding scalar coupling J, the sign of the RDC follows from the sign of J. All ²JHH were assumed to be negative, all others to be positive. d.
Elimination of Chemical Shift Errors. For this purpose, the temperature and time/degradation dependence of the shifts had to be determined. For temperature, J-mod spectra at different temperatures but equal field were acquired (1: 297 K, 298 K, 299 K @ 800 MHz, 2: 298 K, 301 K @ 900 MHz). For each resonance, a linear fit of the form $\delta = \delta_0 + m_T T$ was performed with $\delta_0$ and $m_T$ as fit parameters; all resulting $m_T$ were arrayed as a vector $\vec{v}_T$. For degradation, J-mod spectra at different time points but equal field were acquired. The chemical shift differences of all resonances for such a pair of spectra are then arrayed as a vector $\vec{v}_\mathrm{degrad}$. We acquired three pairs of spectra for 1 and two pairs of spectra for 2 with time differences between one and three months to verify the reproducibility of this approach. For practical application we recommend acquiring the data as quickly as possible and repeating the measurement of the first field point at the end of the measurement series. This way the degradation effect during the measurement series is captured, requiring only one additional measurement and no extra waiting time. To give the reader an idea of the magnitude of these effects, the root mean square of the temperature dependence of the carbon chemical shifts was 3.3 ppb/K for 1 and 5.9 ppb/K for 2, while for the degradation it was 3.7 ppb for 1 and 0.4 ppb for 2. In the case of 2, small changes in referencing were also a source of error, presumably due to small temperature and degradation effects on the reference shift (TMS). Since all resonances are shifted by an equal amount by the referencing, the elements of the vector $\vec{v}_\mathrm{ref}$ representing this change are equal (e.g., 1). Note that only the direction of these perturbation vectors is relevant, but not their magnitude. These perturbation vectors are now arrayed into an $n \times k$ matrix $P$, where $n$ is the number of resonances and $k$ is the number of perturbation vectors, and a QR decomposition of this matrix is performed. The resulting orthogonal matrix $Q$ can be used to perform an orthonormal transformation of the chemical shift space: the chemical shifts from a given spectrum are arrayed into a vector $\vec{\delta}$, which is then transformed into $\vec{\delta}^* = Q^\mathrm{T} \cdot \vec{\delta}$. The first $k$ elements of $\vec{\delta}^*$ are affected by the systematic errors and have to be discarded, while the remaining ones can be used as structural constraints in the following process. e.
Determination of Anisotropic Components. To determine RCSAs, the corrected chemical shifts were expressed as peak positions in Hz, which obey the following relation:

$$\nu_\mathrm{meas} = \frac{\gamma B_0}{2\pi}\left(\delta_\mathrm{iso} + \delta_\mathrm{aniso} B_0^2\right) \qquad (1)$$

Here $\gamma$ is the gyromagnetic ratio of the observed nucleus, with $\gamma B_0 / 2\pi$ taken in MHz so that shifts in ppm give positions in Hz. In this context, the unknown parameters $\delta_\mathrm{iso}$ and $\delta_\mathrm{aniso}$ correspond to the isotropic chemical shift in ppm and to the desired RCSA in ppm T⁻², respectively. As they both appear in linear form in Equation (1), the least-squares solution to a given set of ($\nu_\mathrm{meas}$, $B_0$) data can be determined easily and deterministically. Similarly, the measured coupling $\Delta\nu_\mathrm{meas}$ obeys the following relation as a function of the field:

$$\Delta\nu_\mathrm{meas} = \Delta\nu_\mathrm{iso} + \Delta\nu_\mathrm{aniso} B_0^2 \qquad (2)$$

Here, $\Delta\nu_\mathrm{iso}$ and $\Delta\nu_\mathrm{aniso}$ correspond to the J-coupling in Hz and to the RDC in Hz T⁻², respectively, and can again be determined as the least-squares solution to a given set of ($\Delta\nu_\mathrm{meas}$, $B_0$). As mentioned above, $\delta_\mathrm{aniso}$ and $\Delta\nu_\mathrm{aniso}$ have units of ppm T⁻² and Hz T⁻², respectively, but for all subsequent data interpretation they were converted into frequency units by evaluating the anisotropic components of Equations (1) and (2) at $B_0$ = 23.49 T (1 GHz proton frequency). While this is a somewhat arbitrary choice, it enables the simultaneous interpretation of RDCs and RCSAs. We give them both the same weight since at this field the one-bond CH RDCs lie between aliphatic and aromatic ¹³C RCSAs in size. We excluded couplings from the evaluation that showed obvious peak distortions (see above). For strychnine, ¹DCH couplings from the two protons of a methylene group were evaluated as their mean value to remove the need for diastereotopic assignment. A given geminal ²JHH + ²DHH coupling was determined individually from each proton peak and averaged.

f. Determination of Weighting Factors. Since the alignment in strychnine is about a factor of 5 smaller than for gymnochrome G, the data are more sensitive to errors, and different RDCs/RCSAs may have different errors associated with them.
We therefore devised a way to determine weighting factors to give a lower importance to data points with higher uncertainty. In practice, this was most important for RDCs, but we applied it to both RDCs and RCSAs. Determining weighting factors for both types of parameters is a general way to make simultaneous evaluation possible. In the following, we illustrate the procedure on the example of RDCs. All data points are affected by a random error, and the uncertainty of the anisotropic component is proportional to this error. In principle, the error (standard deviation) of each field series of couplings can be estimated by calculating the RMSD of the fit to Eq. (2):

$$\sigma_i = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(\Delta\nu_{\mathrm{meas},ij} - \Delta\nu_{\mathrm{fit},ij}\right)^2} \qquad (3)$$

where the indices $i$ and $j$ refer to the coupling (atom pair) and the field, respectively, and $N$ is the number of fields. Since there are only six data points (fields) along $j$, there is a significant uncertainty associated with this. If this RMSD is used as a weighting factor for the corresponding RDC, data points that by chance have a very low RMSD will be overvalued. However, since a few couplings are affected by stronger peak distortion, there was the need to devalue them relative to the more accurate RDCs, i.e., to determine individual weighting factors. We made the following assumption: All couplings are equally affected by the same (random) base error. A few couplings are then additionally affected by significant individual errors due to peak distortions etc. If there were no individual errors, the base error could be estimated by taking the RMS of the RMSD of all field-series fits. Some of these field series are affected by additional errors and lead to outliers in their corresponding RMSD, which greatly affects this RMS of RMSD. To cope with that, we estimate the base error $\sigma_\mathrm{base}$ by calculating the root median square instead:

$$\sigma_\mathrm{base} = \sqrt{\operatorname{median}_i\left(\sigma_i^2\right)} \qquad (4)$$

Now, we estimate the individual error as the RMSD of the field-series fit and calculate the total error as RMS of base and individual error. We use the inverse of the error as weight $w_i$:

$$w_i = \left[\tfrac{1}{2}\left(\sigma_\mathrm{base}^2 + \sigma_i^2\right)\right]^{-1/2} \qquad (5)$$

This enables a smooth transition between small and large weights, prevents the overvaluation of data points, and makes it possible to give smaller weights to data points with large errors.
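The three processing steps of this section (removal of systematic shift errors, field-series fit, weighting factors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the use of a complete QR decomposition from NumPy, the literal reading of "RMS of base and individual error", and all numerical inputs (eight resonances, six field values, the RMSD list) are hypothetical.

```python
import numpy as np

def remove_systematic_errors(shifts, perturbations):
    """Rotate the shift vector so that the first k components absorb the perturbations.

    shifts:        (n,) chemical shifts of one spectrum
    perturbations: (n, k) matrix whose columns are the perturbation vectors
    Returns the n - k transformed components unaffected by the perturbations.
    """
    P = np.asarray(perturbations, dtype=float)
    k = P.shape[1]
    Q, _ = np.linalg.qr(P, mode="complete")      # Q is n x n and orthogonal
    return (Q.T @ np.asarray(shifts, dtype=float))[k:]

def fit_field_series(b0, splitting):
    """Least-squares fit of splitting = iso + aniso * B0**2; returns (iso, aniso, rmsd)."""
    b0 = np.asarray(b0, dtype=float)
    y = np.asarray(splitting, dtype=float)
    design = np.column_stack([np.ones_like(b0), b0 ** 2])
    coeff, *_ = np.linalg.lstsq(design, y, rcond=None)
    rmsd = np.sqrt(np.mean((y - design @ coeff) ** 2))
    return coeff[0], coeff[1], rmsd

def error_weights(rmsd):
    """Inverse total error from a root-median-square base error and individual RMSDs."""
    rmsd = np.asarray(rmsd, dtype=float)
    base = np.sqrt(np.median(rmsd ** 2))
    total = np.sqrt(0.5 * (base ** 2 + rmsd ** 2))   # RMS of base and individual error
    return 1.0 / total

# Hypothetical example: 8 resonances with a temperature-like and a referencing perturbation
rng = np.random.default_rng(0)
shifts = rng.normal(size=8)
P = np.column_stack([rng.normal(size=8), np.ones(8)])
clean = remove_systematic_errors(shifts, P)
perturbed = remove_systematic_errors(shifts + P @ np.array([0.3, -1.2]), P)

# Hypothetical six-field series (B0 in T): J = 125 Hz, RDC = 1e-4 Hz/T^2
b0 = np.array([9.4, 11.7, 14.1, 16.4, 18.8, 21.1])
iso, aniso, rmsd = fit_field_series(b0, 125.0 + 1e-4 * b0 ** 2)
weights = error_weights([0.01, 0.01, 0.01, 0.1])
```

In the sketch, `clean` and `perturbed` coincide because any admixture of the perturbation vectors only changes the discarded leading components, and the one outlier RMSD lowers its own weight without dragging down the base error.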

Computational Work / Molecular Modelling a.
General Procedure. For 1, the conformer geometries from the DP4+ calculations at the MMFF level of theory were used as starting points. For 2, starting structures were built in Maestro 11.4. The multiple fused and bridged cycles of 2 allow only a subset of the theoretically possible 32 diastereomers, and we were able to build 22 configurations. Although some of these have strongly distorted carbon binding geometries and could be discarded as they are unlikely to be stable, we purposefully left them in the analysis as this further demonstrates the discriminating power of the method. We cross-checked the resulting geometries with the structures of Bifulco et al., [24] who conducted a thorough study of the possible diastereomers of strychnine. We optimized all geometries at the B3LYP/pcseg-1 [25] level of theory, including vibrational frequencies for 1 to get zero-point corrected (ZPC) free energies. We calculated NMR shieldings and the magnetic susceptibility tensors at the B3LYP/pcSseg-1 (1) and B3LYP/pcSseg-2 [26] (2) level of theory using GIAO [13] and an implicit solvent model [14] . All DFT calculations were performed in Gaussian09. [11] b.
Convergence Problems for Susceptibility Calculations. During the first attempts to predict magnetic susceptibilities, we encountered severe basis set convergence problems, which were most prominent for large Pople-type basis sets. We investigated this by calculating shieldings and susceptibilities for strychnine using 57 different basis sets, from STO-3G to 6-311++G(3df,3pd). For relatively small basis sets (e.g., 6-31G, 6-31G(d)) the agreement of the calculated alignment (i.e., the susceptibility anisotropy) with the experimental data was good, but with larger basis sets it became increasingly worse. This was most extreme for basis sets with diffuse functions (i.e., functions with small exponents decaying slowly with distance), which points to numerical artifacts due to near degeneracy of some basis set functions. It is noteworthy that these numerical artifacts had a significant effect only on the susceptibilities, but not on geometries or nuclear shielding tensors. These numerical artifacts could be avoided by using a finer than default integration grid using the Gaussian "UltraFine" keyword. As an additional measure, we decided to switch to a more balanced family of basis sets with a clear hierarchy. We tested Jensen's polarization-consistent segmented basis sets as well as Karlsruhe (def2) basis sets [27] on strychnine, which yielded very similar results. We finally chose to do all subsequent calculations with Jensen-type basis sets as they were specifically designed for DFT calculations and provide a subfamily optimized for nuclear properties (pcSseg-n).

Data Evaluation a.
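Calculation of Anisotropic Data. As mentioned in the main article, the anisotropic parameters obey the following relations:

$$D_{IS} = -\frac{3\mu_0\gamma_I\gamma_S\hbar}{8\pi^2 r^5}\,\vec{r}^{\,\mathrm{T}} A\,\vec{r} \qquad (6)$$

$$\delta^\mathrm{RCSA} = \operatorname{tr}(A\,\delta) \qquad (7)$$

where $\gamma_I$ and $\gamma_S$ are the gyromagnetic ratios of the coupling nuclei and $\vec{r}$ the internuclear vector between them (with length $r$), $\delta$ is the chemical shift tensor of the nucleus in question, and $A$ is the alignment tensor. The alignment tensor depends on the magnetic susceptibility tensor $\chi$:

$$A = \frac{B_0^2}{15\mu_0 k_\mathrm{B} T}\left(\chi - \tfrac{1}{3}\operatorname{tr}(\chi)\,\mathbb{1}\right) \qquad (8)$$

Since all quantities are either constants ($\gamma$, $\mu_0$, $\hbar$, $k_\mathrm{B}$), known experimental parameters ($T$, $B_0$), or molecular properties that can be calculated via DFT ($\vec{r}$, $\delta$, $\chi$), theoretical RDCs and RCSAs can be calculated by simply inserting all constants and parameters into Equations (6) and (7).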
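As a rough numerical illustration of this prediction step, the sketch below evaluates an alignment tensor from a susceptibility tensor and a one-bond CH RDC from it. It assumes the standard first-order expressions for magnetically induced alignment, SI units throughout (susceptibility in m³ per molecule, distances in m), and a hypothetical axially symmetric susceptibility of roughly aromatic-ring size; conversion from the units of a quantum chemistry output is not covered, and the function names are illustrative.

```python
import numpy as np

MU0 = 1.25663706212e-6      # vacuum permeability, N A^-2
HBAR = 1.054571817e-34      # reduced Planck constant, J s
KB = 1.380649e-23           # Boltzmann constant, J K^-1
GAMMA_H = 2.6752218744e8    # 1H gyromagnetic ratio, rad s^-1 T^-1
GAMMA_C = 6.728284e7        # 13C gyromagnetic ratio, rad s^-1 T^-1

def alignment_tensor(chi, b0, temperature):
    """A = B0^2 / (15 mu0 kB T) times the traceless part of chi (chi in m^3 per molecule)."""
    chi = np.asarray(chi, dtype=float)
    chi_aniso = chi - np.trace(chi) / 3.0 * np.eye(3)
    return b0 ** 2 / (15.0 * MU0 * KB * temperature) * chi_aniso

def rdc(A, r, gamma_i=GAMMA_H, gamma_s=GAMMA_C):
    """D = -(3 mu0 gI gS hbar) / (8 pi^2 r^5) * r^T A r, in Hz (r in metres)."""
    r = np.asarray(r, dtype=float)
    dist = np.linalg.norm(r)
    prefactor = -3.0 * MU0 * gamma_i * gamma_s * HBAR / (8.0 * np.pi ** 2 * dist ** 5)
    return prefactor * (r @ A @ r)

def rcsa(A, shift_tensor):
    """RCSA = tr(A delta), in the units of the shift tensor (e.g. ppm)."""
    return np.trace(A @ np.asarray(shift_tensor, dtype=float))

# Hypothetical axially symmetric susceptibility, most diamagnetic along z
chi = np.diag([0.0, 0.0, -1.25e-33])
A = alignment_tensor(chi, b0=23.49, temperature=298.0)
d_ch = rdc(A, r=np.array([0.0, 0.0, 1.09e-10]))   # C-H bond along z, 1.09 Angstrom
```

With these assumed inputs the alignment is of order 10⁻⁵ and the CH RDC is a fraction of a hertz, which shows why the quadratic field dependence and sub-hertz measurement accuracy matter for this approach.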
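For 1, averaging of the calculated anisotropic data is done over all conformers using the ZPC free energies $G_\mathrm{ZPC}$, with the following Boltzmann populations $p_k$:

$$p_k = \frac{\exp\left(-G_{\mathrm{ZPC},k}/k_\mathrm{B}T\right)}{\sum_l \exp\left(-G_{\mathrm{ZPC},l}/k_\mathrm{B}T\right)} \qquad (9)$$

where the index $k$ refers to the conformation. For the fit of the alignment tensor, Equations (6) and (7) have to be brought into a form where the linearity of the components of $A$ is apparent. First, it is helpful to realize that both equations have the exact same form:

$$x = \operatorname{tr}(A\,C) \qquad (10)$$

where $x$ is the RDC or RCSA and $C$ is the corresponding coefficient matrix. This can be written out in components

$$x = \sum_{i,j} A_{ij} C_{ji} \qquad (11)$$

and rearranged, by using the fact that $A$ and $C$ are real symmetric:

$$x = \tfrac{1}{3}(A_{xx}+A_{yy}+A_{zz})(C_{xx}+C_{yy}+C_{zz}) + \tfrac{1}{2}(A_{xx}-A_{yy})(C_{xx}-C_{yy}) + \tfrac{1}{6}(2A_{zz}-A_{xx}-A_{yy})(2C_{zz}-C_{xx}-C_{yy}) + 2A_{xy}C_{xy} + 2A_{xz}C_{xz} + 2A_{yz}C_{yz} \qquad (12)$$

The reason that the products of diagonal elements are rearranged in this way is that now the first summand can be discarded since $A$ is traceless, i.e., $A_{xx} + A_{yy} + A_{zz} = 0$. What is left is a linear equation in five independent components of $A$. This becomes an (ideally overdetermined) system of linear equations if all data points (RDC/RCSA) with their individual $C$ are taken together, and the least-squares solution of this system yields the fitted alignment tensor. This fitted alignment tensor can then in turn be used to back-calculate the anisotropic data.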
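The five-parameter linear fit can be sketched as below. The parametrization by (A_xx − A_yy, A_zz, A_xy, A_xz, A_yz), the function names, the optional weights argument, and the random symmetric test matrices are assumptions chosen for illustration; any other set of five independent components of a traceless symmetric tensor would serve equally well.

```python
import numpy as np

def design_row(C):
    """Coefficients of (Axx - Ayy, Azz, Axy, Axz, Ayz) in tr(A C) for traceless symmetric A."""
    C = np.asarray(C, dtype=float)
    return np.array([
        0.5 * (C[0, 0] - C[1, 1]),
        C[2, 2] - 0.5 * (C[0, 0] + C[1, 1]),
        2.0 * C[0, 1],
        2.0 * C[0, 2],
        2.0 * C[1, 2],
    ])

def unpack(params):
    """Rebuild the full traceless symmetric tensor from the five fit parameters."""
    d, azz, axy, axz, ayz = params
    axx = 0.5 * (d - azz)
    ayy = -0.5 * (d + azz)
    return np.array([[axx, axy, axz], [axy, ayy, ayz], [axz, ayz, azz]])

def fit_alignment_tensor(C_matrices, observed, weights=None):
    """Weighted least-squares fit; returns the tensor and the back-calculated data."""
    M = np.array([design_row(C) for C in C_matrices])
    y = np.asarray(observed, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    params, *_ = np.linalg.lstsq(M * w[:, None], y * w, rcond=None)
    return unpack(params), M @ params

# Hypothetical noise-free test: eight random symmetric coefficient matrices
rng = np.random.default_rng(1)
A_true = unpack([2e-5, -1e-5, 3e-6, -4e-6, 5e-6])
Cs = [(X + X.T) / 2.0 for X in rng.normal(size=(8, 3, 3))]
obs = np.array([np.trace(A_true @ C) for C in Cs])
A_fit, back = fit_alignment_tensor(Cs, obs)
```

With at least five linearly independent data points the system is determined, and additional points make it overdetermined so that the residuals carry information about the agreement with a given configuration.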
For 1, there is the additional complication of multiple conformers. While in principle each conformer has its individual alignment tensor, this would increase the number of free variables beyond feasibility, and it is also not necessary. This is because the main source of alignment (i.e., the aromatic system) has no conformational freedom. Therefore, we can ignore these differences and apply a single-tensor approximation. To do so, we use a common frame defined by the aromatic system for all conformers. In practice this is done by defining the vectors $\vec{v}_1 = \vec{r}_\mathrm{C12} - \vec{r}_\mathrm{C5}$, $\vec{v}_2 = \vec{r}_\mathrm{C9} - \vec{r}_\mathrm{C2}$, and $\vec{v}_3 = \vec{v}_1 \times \vec{v}_2$, orthonormalizing them, and using them as base vectors for a common Cartesian coordinate system. In this common frame, we calculate the components of the matrices $C^\mathrm{cf}$ and average them, again with the ZPC free energies. The alignment tensor fit is done using these averaged $C^\mathrm{cf}$, so the fitting Equation (10) adapts to the following:

$$\langle x \rangle = \operatorname{tr}\left(A\,\langle C^\mathrm{cf}\rangle\right), \qquad \langle C^\mathrm{cf}\rangle = \sum_k p_k\, C^\mathrm{cf}_k \qquad (13)$$

Note that it would be a mistake to average over coordinate vectors as they contribute quadratically to the RDC coefficient matrix $C^\mathrm{cf}$.
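The common-frame averaging can be sketched as follows. The helper names, the use of molar free energies in kJ/mol (so that the gas constant takes the place of the Boltzmann constant), and the example vectors are assumptions for illustration; the two input vectors stand in for the atom-to-atom vectors of the aromatic system described above.

```python
import numpy as np

R_GAS = 8.314462618e-3   # gas constant in kJ mol^-1 K^-1 (molar energies assumed)

def boltzmann_populations(free_energies_kj_mol, temperature=298.0):
    """Normalized Boltzmann populations from relative free energies."""
    g = np.asarray(free_energies_kj_mol, dtype=float)
    w = np.exp(-(g - g.min()) / (R_GAS * temperature))
    return w / w.sum()

def common_frame(v1, v2):
    """Right-handed orthonormal basis (as rows) built from two non-parallel vectors."""
    e1 = v1 / np.linalg.norm(v1)
    e3 = np.cross(v1, v2)
    e3 = e3 / np.linalg.norm(e3)
    e2 = np.cross(e3, e1)
    return np.vstack([e1, e2, e3])

def to_common_frame(C, frame):
    """Express a second-rank tensor given in conformer coordinates in the common frame."""
    return frame @ C @ frame.T

def averaged_C(C_per_conformer, frames, populations):
    """Population-weighted average of coefficient matrices, not of coordinates."""
    return sum(p * to_common_frame(C, F)
               for C, F, p in zip(C_per_conformer, frames, populations))

# Hypothetical inputs: three conformer free energies and one frame definition
pops = boltzmann_populations([0.0, 2.5, 6.0])
v1 = np.array([1.0, 0.2, 0.0])
v2 = np.array([0.1, 1.0, 0.3])
frame = common_frame(v1, v2)
```

Because the frame is attached to the rigid aromatic system, a tensor expressed in it no longer depends on how the individual conformer happens to be oriented in its coordinate file.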

b. Comparison of Experimental and Calculated Data. Both ways of determining the alignment tensor, "fitted" versus "predicted", yield a set of calculated data for each configuration, which is compared with the experimental data to identify the best match. This comparison is done in the same way for both back-calculated and DFT-predicted data sets. As primary criterion for the agreement of the calculated and experimental data we use the Q-factor, [28] which is the RMSD between the two, scaled by the RMS of the experimental data:

$$Q = \sqrt{\frac{\sum_i \left(x_i^\mathrm{exp} - x_i^\mathrm{calc}\right)^2}{\sum_i \left(x_i^\mathrm{exp}\right)^2}} \qquad (14)$$

where $x$ refers to any type of data point. The configuration whose calculated data gives the lowest Q-factor is assumed to be the correct configuration. The Q-factors can be found in Tables S2 and S3. However, this approach gives no quantitative estimate for the confidence of the result since differences in Q-factors cannot be directly interpreted as a measure for the confidence. We use a bootstrapping test to examine the statistical properties of the data and gain insight into the confidence of our result. [29] Bootstrapping has the advantage that no assumptions about source, size, and distribution of error have to be made. In this procedure, resampled data sets of the same original size are generated by drawing randomly with replacement from the experimental data. These resampled sets are then each subjected to the same analysis as the original data set. The DFT-predicted data simply have to be re-matched to the sampled experimental data points. However, the procedure of fitting the alignment tensor and back-calculating the data has to be repeated for each resample. By repeating this resampling many times (in our case 10⁶) one gets a distribution of any quantity that results from the data evaluation. As mentioned before, we assume the correct configuration to be the one with the lowest Q-factor, so we determined it for each resample. The fraction of resamples that yield a certain configuration is then interpreted as confidence in this result.
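For the DFT-predicted case, the Q-factor comparison and the bootstrap test can be sketched as below. The function names, the synthetic data, and the reduced number of resamples are assumptions for illustration; for the fitted case the alignment tensor fit would additionally have to be repeated inside the loop for every resample, which this sketch omits.

```python
import numpy as np

def q_factor(exp_data, calc_data):
    """RMSD between experiment and calculation, scaled by the RMS of the experiment."""
    exp_data = np.asarray(exp_data, dtype=float)
    calc_data = np.asarray(calc_data, dtype=float)
    return np.sqrt(np.sum((exp_data - calc_data) ** 2) / np.sum(exp_data ** 2))

def bootstrap_confidence(exp_data, calc_by_config, n_resamples=10_000, seed=0):
    """Fraction of resamples in which each configuration has the lowest Q-factor.

    calc_by_config: (n_config, n_data) predicted data, matched point by point to exp_data.
    """
    rng = np.random.default_rng(seed)
    exp_data = np.asarray(exp_data, dtype=float)
    calc = np.asarray(calc_by_config, dtype=float)
    wins = np.zeros(calc.shape[0], dtype=int)
    for _ in range(n_resamples):
        idx = rng.integers(0, exp_data.size, size=exp_data.size)   # draw with replacement
        q = [q_factor(exp_data[idx], c[idx]) for c in calc]
        wins[int(np.argmin(q))] += 1
    return wins / n_resamples

# Hypothetical data: configuration 0 matches closely, configuration 1 poorly
rng = np.random.default_rng(42)
exp_data = rng.normal(size=20)
calc = np.vstack([exp_data + 0.05 * rng.normal(size=20),
                  exp_data + 0.50 * rng.normal(size=20)])
confidence = bootstrap_confidence(exp_data, calc, n_resamples=2000)
```

The returned fractions sum to one, so they can be read directly as the share of resamples supporting each configuration.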
These confidences are shown in Figure S5, as well as in Tables S2 and S3. We do not use $Q_\mathrm{CSA}$ [30] in this work, as it was designed for external alignment with RCSAs of several Hz, where the error is largely independent of the value of the RCSA in Hz. Here, due to the smallness of the effects, the error of the RCSA measurement is mainly determined by the size of the RCSA. Therefore, the underlying assumption that makes the RCSA parameter more distinctive for external alignment is not applicable to the self-alignment presented here.

c. A Note on Feasibility. The feasibility of our approach does not directly depend on the degree of alignment, but more on the differences in anisotropic parameters between the diastereomers. As these can be accurately predicted, the feasibility for any given molecule can be evaluated with inexpensive DFT calculations before investing expensive NMR time. The molecule strychnine is certainly near the limit of what we assume to be feasible at this point; we have calculated expected anisotropic parameters for various non-aromatic compounds (e.g., sugars), and they are typically about an order of magnitude smaller than for an aromatic compound such as strychnine. The geometrical differences within a set of diastereomers also affect how well they can be discriminated. Lastly, the availability of high magnetic fields plays an important role. Since the magnitude of anisotropic effects scales with the square of the field while the measuring accuracy is (to a first approximation) constant, very high fields such as 1.2 GHz, which are becoming available right now, greatly enhance the feasibility of our approach. To be more quantitative, the weighted RMSD of the experimental and calculated anisotropic data (RDCs and ¹³C RCSAs) for strychnine is $\mathrm{RMSD}_\mathrm{exp,calc}$ = 0.042 Hz. This can be taken as an estimate of the current measurement accuracy.
The RMSD $\mathrm{RMSD}_{a,b}$ of calculated anisotropic data for two different configurations $a$ and $b$ of a molecule should therefore be larger than this for discrimination to be possible:

$$\mathrm{RMSD}_{a,b} > \mathrm{RMSD}_\mathrm{exp,calc}$$

Instructions for Data Collection
All numerical data is published in digital form instead of tables in this SI for improved accessibility. There are three subfolders: computation_logs contains all data concerning the molecular modelling, anisotropic_datasets contains all raw experimental data, and intermediate_results contains fitted and calculated RDCs/RCSAs and the data needed for error compensation. The folder computation_logs contains several subfolders with self-explanatory folder names. Each of these folders contains one or two subfolders corresponding to the two investigated molecules. a. Computation: Gaussian. The geometry optimizations, the frequency calculations, the NMR calculations, and the ECD calculations were done in Gaussian. The folders contain Gaussian logfiles and Gaussian comfiles. The comfiles do not include the geometry since it was always read from a previous checkpoint file. However, the logfiles always contain the geometry information. The filenames refer to the molecular structure. For 1, it is of the form "g8_xx_stateiiii.com/log", where xx indicates the configuration and iiii the conformation number. For 2, it is "strychnine_ii.com/log", where ii denotes the number of the configuration. 01 is the correct configuration, the others are in somewhat arbitrary order. The order can be found in the figures in the main text or in the file confs.txt in the folder with the starting geometries. b.
Computation: Conformational Search. The starting geometries are stored in the appropriate folder. For 2, they are stored in xyz format. For 1, they are stored in mol2 format. For the conformational search of 1, Macromodel logfiles and comfiles for the conformational search as well as the following finer minimization are provided. Input and output structures in Maestro's own format are provided as well. The resulting conformers are provided in one file per configuration in mol2 format. c.
Raw Data: Chemical Shifts. Chemical shifts are provided in text files with tab separation and a header line for the identification of columns. They are simply copied peak tables from MestreNova 11.0 and are also self-explanatory. The field is given in the filename. Files named T1, T2, or T3 contain temperature dependent shifts. For 1, all data was acquired twice for each field, and the degradation dependence was determined from those pairs (e.g., 600_C_1.txt and 600_C_2.txt). For 2, the pairs for the determination of the degradation vector were stored separately in files starting with deg or deh. In both cases, mnova files containing the spectral data are also included. d.
Raw Data: Couplings. Couplings for 1 were extracted from 1D proton spectra, which was not possible to automate since multiplets would overlap differently depending on the field. The suitable multiplet components for the determination of the coupling were therefore chosen manually and the couplings deposited in couplings.csv. For 2, couplings were extracted from CLIP-HSQC traces. All traces are provided as mnova files, separated by field (e.g., 400_Js_strychnine.mnova). The extracted couplings are saved in tab-separated text files (e.g., 400_J.txt). e.
Intermediate Data. In the folder intermediate_results the following data are provided: carbon resonance assignment, chemical shift error vectors $\vec{v}_T$ and $\vec{v}_\mathrm{degrad}$, the transformation matrix $Q$, and the fitted RDCs and RCSAs with the fit residuals and the weighting factors (for 2). All data are provided as tab-separated text files with extensive comments.

Author Contributions
NK designed the project, measured and evaluated the data, and wrote the original draft. KW isolated gymnochrome G and determined its constitution. CG supervised the work and administered the project. All authors discussed the work and contributed to writing.

Figure S5: Fraction of bootstrap resamples p that point to a certain configuration (i.e., that configuration has the lowest Q-factor for that resample) for the two different approaches of data evaluation. Mind the logarithmic scale in the case of 2.

15.44 s 11, 12, 13, 13a, 14

a Spectra were recorded in MeOH-d3 at 800 MHz proton frequency.

b These signals show no correlation to protons and could not be individually assigned. They belong to the carbon atoms 10a, 10b, 14b, 14c, 14d, 14e, 14f, and 14g, which can be deduced by comparison to known gymnochromes. [31] These resonances were therefore not used in any data analysis.