Example 1. Structure elucidation from extremely contradictory 2D NMR data
Kummerlöwe and co-workers[12] investigated one of the products obtained by reacting an azide-containing 1,5-enyne in the presence of electrophilic iodine sources. Initially, the researchers tried to elucidate the structure of this new compound by using the classical methods commonly employed in such cases. High-resolution mass spectrometry unambiguously provided the molecular formula for the unknown: C16H18NI, m/z = 351.0486 [351.0484 calculated for C16H18NI (M+)]. The following spectroscopic data were acquired at the first stage of the investigation: an IR spectrum and 1D 1H and 13C spectra, in combination with two-dimensional COSY, HSQC, 1H–13C HMBC and 1H–15N HMBC experiments. Eleven fragments were identified from the data: a phenyl group, a methyl group, five methylene groups (three forming an isolated chain), a tertiary nitrogen atom, an iodine atom and four quaternary carbon atoms. The 1H–13C HMBC spectrum revealed 63 long-range correlations and the 1H–15N HMBC spectrum exposed seven cross peaks, thereby correlating almost every fragment with every other fragment and indicating a very compact structure. Because it was difficult to deduce the structure from these data, a 2D 1,1-ADEQUATE spectrum[14] was also recorded on a Bruker Avance 900 MHz spectrometer (Bruker Biospin, Rheinstetten, Germany) equipped with a 5-mm cryogenically cooled TXI probehead optimized for proton detection. The 1,1-ADEQUATE data did identify adjacent quaternary carbons unequivocally. Although this was useful information, these additional data did not help to elucidate the structure.
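The agreement between the measured and calculated masses quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the procedure used in the cited work: the isotope masses are standard tabulated values rounded to six decimals, the function names are illustrative, and the calculation sums neutral-atom masses without an electron-mass correction (an assumption made here for simplicity).

```python
# Monoisotopic masses (u) of the most abundant isotopes, rounded tabulated values.
MONOISOTOPIC_MASS = {
    "C": 12.000000,
    "H": 1.007825,
    "N": 14.003074,
    "O": 15.994915,
    "I": 126.904473,
}


def monoisotopic_mass(composition):
    """Sum of neutral-atom monoisotopic masses for an elemental composition."""
    return sum(MONOISOTOPIC_MASS[element] * count
               for element, count in composition.items())


def ppm_error(measured, calculated):
    """Relative deviation of a measured mass from a calculated one, in ppm."""
    return (measured - calculated) / calculated * 1e6


# Molecular formula and measured m/z as quoted in the text.
calc = monoisotopic_mass({"C": 16, "H": 18, "N": 1, "I": 1})
print(f"calculated mass = {calc:.4f}")
print(f"deviation from 351.0486 = {ppm_error(351.0486, calc):.2f} ppm")
```

Under these assumptions the calculated value rounds to the 351.0484 given in the text, and the deviation from the measured value is below 1 ppm.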
Because classical NMR analysis failed, the authors[12] decided to make an attempt to solve the problem in an unconventional way by using residual dipolar couplings (RDCs).[15] In accordance with the methodology associated with RDC, they assumed that as long as sufficient anisotropic parameters can be measured and a large enough set of structural models can be constructed, it should be possible to identify the correct chemical structure.
To measure the RDCs, the compound was aligned in a stretched polystyrene/chloroform gel. The corresponding scalar couplings were measured in a chloroform solution sample. Fourteen proposed structures, including several models that were unlikely (see Fig. 1), were tested using the experimental data. Analysis of the RDC data suggested that structure #2a is the correct one.
To confirm the structure suggested by the RDC data, almost 100 mg of the reaction product was synthesized, and a 2D INADEQUATE spectrum[14] was acquired using 3 days of spectrometer time. Structure #2a, elucidated using the RDC data, was unambiguously confirmed by the INADEQUATE data. In addition, the starting material of the reaction was labeled with 15N-azide, and 13C–15N couplings were measured for the 15N-labeled compound. Both additional experiments clearly supported structure #2a.
Posterior data analysis showed that the 1H–13C HMBC spectrum contained nine so-called ‘nonstandard’ correlations (NSCs), that is, those with nJHC, n > 3.[16] This is not surprising considering that the molecule is a highly rigid system. The CASE program interprets combinations of the available 1H–1H and 1H–13C correlations to derive carbon–carbon connectivities; here this produces nine nonstandard C to C connectivities (see Fig. 2), which were the main obstacle to structure elucidation by a traditional approach. The initial system of ‘axioms’ used for structure elucidation from HMBC data[1] became extremely contradictory because of the presence of nonstandard connectivities. Moreover, two unexpectedly intense 5JCH cross peaks correlating two protons with the ortho-carbons of the phenyl group (see Fig. 2) were identified in the 1H–13C HMBC spectrum. The corresponding part of the HMBC spectrum is presented in Fig. 3, taken from the supporting information of Kummerlöwe et al.[12]
We suggest that this can be explained as a result of the hindered rotation of the phenyl group due to the large volume of the iodine atom.
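Once a structure is known, nonstandard correlations can be flagged by counting bonds along the shortest path between the proton-bearing atom and the correlated carbon. The sketch below illustrates this posterior check on a hypothetical six-carbon chain. The atom labels, the toy skeleton and the function names are assumptions made for illustration; this is not the algorithm used by Structure Elucidator, and it is not Fuzzy Structure Generation.

```python
from collections import deque


def bond_distance(adjacency, start, goal):
    """Shortest number of bonds between two heavy atoms (breadth-first search)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == goal:
            return distance[atom]
        for neighbour in adjacency[atom]:
            if neighbour not in distance:
                distance[neighbour] = distance[atom] + 1
                queue.append(neighbour)
    return None  # atoms are not connected


def classify_hmbc(adjacency, correlations):
    """Return (proton_carrier, carbon, n) per correlation, with n as in nJCH."""
    classified = []
    for carrier, carbon in correlations:
        d = bond_distance(adjacency, carrier, carbon)
        # One extra bond links the proton to the heavy atom that carries it.
        n = None if d is None else d + 1
        classified.append((carrier, carbon, n))
    return classified


# Hypothetical skeleton: a linear chain C1-C2-C3-C4-C5-C6.
chain = {
    "C1": ["C2"],
    "C2": ["C1", "C3"],
    "C3": ["C2", "C4"],
    "C4": ["C3", "C5"],
    "C5": ["C4", "C6"],
    "C6": ["C5"],
}
# Hypothetical HMBC peaks: (atom bearing the proton, correlated carbon).
peaks = [("C1", "C2"), ("C1", "C3"), ("C1", "C5")]

nonstandard = [p for p in classify_hmbc(chain, peaks)
               if p[2] is not None and p[2] > 3]
print(nonstandard)
```

In this toy case only the C1 to C5 peak is flagged, because it spans five bonds from the proton. During elucidation the structure is unknown, so the bond counts are not available in advance, which is why the structure generator must treat the correlation lengths as uncertain.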
At the same time, the authors[12] found that structure #2a was almost certainly excluded from the potential set of structures because the 13C chemical shifts predicted by ChemDraw (CambridgeSoft Corp., Massachusetts, USA),[17] and presented in this work, differed significantly from the experimental data (see Fig. 4, where the results obtained by the neural net algorithm[11] are shown for comparison). The mean absolute error was 4.65 ppm, with the linear regression described by R2 = 0.982, which can indeed be taken as a hint that structure #2a is questionable.
The highly complex nature of the 2D NMR data prompted the authors to conclude that the problem could not be solved by a classical approach. In making this decision, they considered the NMR data only in isolation from algorithm-assisted approaches such as those available in CASE software, for example Structure Elucidator (ACD/Labs Inc., Ontario, Canada).[18] This software package has been applied for over a decade to solve real-world problems. The experimental data presented in the work[12] were therefore analyzed using the program, and several modes of problem solving were examined.
Run 1. The molecular formula and the 1D 13C, HSQC, 1H–13C HMBC and 1H–15N HMBC spectra were input into the program. All five HMBC peaks marked as very weak by the authors[12] were ignored for the first run to reduce the possible number of NSCs. A Molecular Connectivity Diagram (MCD)[18] was automatically created as shown in Fig. 5.
As a result of the logical analysis of the MCD, the program discovered the presence of NSCs in the HMBC spectrum, which suggested that an approach we have termed ‘Fuzzy Structure Generation’ (FSG) was necessary.[16, 19] It should be noted that the FSG mode freely allows the long-range correlation lengths to be varied to any extent. FSG was run assuming that the HMBC data contain an unknown number of NSCs, each of them being of unknown length.
No assumptions or user interventions were made. As a result of structure generation accompanied by spectral and structural filtration,[18] three possible structures were output in 13 min. 13C and 1H chemical shift predictions using the neural net algorithms[10, 11] incorporated into Structure Elucidator were performed, and the structural file was then ranked in ascending order of the 13C chemical shift average deviation (Fig. 6). Although 1H is generally a ‘weaker’ nucleus than 13C for rank-ordering structures by chemical shift prediction, in this case it proved to be of value in providing additional confirmation of the structure.
Figure 6 shows that the correct structure #2a was identified as the most probable structure, and its 13C deviation is significantly smaller (by almost a factor of two) than that calculated with ChemDraw (see also Fig. 4). The chemical shift assignment for structure #1 (as shown in Fig. 6) suggested by the prediction algorithms fully coincided with that suggested by the authors.[12] The proposed structure #2b (#2 in Fig. 6) was also generated but was rejected on the basis of the chemical shift predictions. Structure #3 follows as a logical consequence of the experimental data but can be rejected because of its higher chemical shift deviations: Both the 1H and 13C prediction deviations are almost twice the size of those for the first ranked structure, and our experience[9] shows that such large differences remove the structure from consideration.
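The ranking step described above can be illustrated with a short sketch. It assumes that predicted and experimental shifts have already been paired by assignment and that the deviation is a plain mean absolute difference; the exact deviation measure used by Structure Elucidator may differ. All shift values and candidate names below are invented for illustration and are not taken from Fig. 6.

```python
def mean_abs_deviation(experimental, predicted):
    """Average absolute difference (ppm) between paired shift lists."""
    if len(experimental) != len(predicted):
        raise ValueError("shift lists must be paired one-to-one")
    total = sum(abs(e - p) for e, p in zip(experimental, predicted))
    return total / len(experimental)


def rank_candidates(experimental, predictions):
    """Sort candidate structures in ascending order of average deviation."""
    scored = [(mean_abs_deviation(experimental, shifts), name)
              for name, shifts in predictions.items()]
    return sorted(scored)


# Hypothetical experimental 13C shifts (ppm), ordered by assignment.
experimental_13c = [20.0, 35.0, 120.0, 140.0]

# Hypothetical predicted shifts for two candidate structures.
predicted_13c = {
    "candidate_A": [21.0, 34.0, 122.0, 139.0],
    "candidate_B": [25.0, 30.0, 128.0, 133.0],
}

ranking = rank_candidates(experimental_13c, predicted_13c)
for deviation, name in ranking:
    print(f"{name}: {deviation:.2f} ppm")
```

With these invented numbers, candidate_A has the smaller average deviation and is ranked first. The same function can be applied to 1H shifts to obtain a second, independent ranking.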
Run 2. All HMBC correlations, without any exclusion and including the set of nine NSCs, were used, and 1,1-ADEQUATE correlations were also added to the 2D NMR data (see MCD in Fig. 7).
Fuzzy Structure Generation was run with the following result: Only one structure, the correct structure #1 (Fig. 6), was generated in 0.7 s. The application of the CASE approach therefore allowed us to instantly and unambiguously find the single correct structure from the HMBC and 1,1-ADEQUATE data. It has now been shown a number of times[20-23] that 1,1-ADEQUATE data in conjunction with other 2D NMR data are a very valuable combination as input for CASE programs.
Example 2. Structure elucidation from incomplete 2D NMR data
The second example reviewed in this work was inspired by the article published by Gross and co-workers.[13] They suggested a new method of determining the structures of small planar molecules based on Atomic Force Microscopy (AFM)[24, 25] (it should be noted that the authors used the terms AFM and SPM interchangeably in their article; we use AFM only in this article). This approach would clearly make an excellent adjunct to the other tools available for organic structure analysis, and to validate its utility, they studied the natural product cephalandole A, (1), C16H10N2O2, which had previously been misassigned by Wu et al.[26] and later corrected by Mason et al.[27] The authors[13] explain that this compound was selected for testing the AFM method because it meets all three criteria specified previously that render structure analysis especially challenging:[28] The ratio of heavy atoms to protons is ca 2 : 1, and the O and N atoms at positions 1 and 4, respectively, interrupt the carbon skeleton completely, separating the two parts of the molecule. In addition, the carbonyl at C2 is distanced from the nearest proton by four bonds and is not expected to show correlations in an HMBC experiment. The molecular formula indicates that there are 13 degrees of unsaturation in the structure.
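The number of degrees of unsaturation follows directly from the molecular formula through the standard rings-plus-double-bonds expression (2C + 2 + N − H − X)/2, where X is the number of halogens and divalent atoms such as oxygen do not contribute. A minimal sketch, with an illustrative function name:

```python
def degrees_of_unsaturation(c, h, n=0, x=0):
    """Rings plus double bonds from counts of C, H, N and halogens (X).

    Divalent atoms such as O and S do not enter the expression.
    """
    doubled = 2 * c + 2 + n - h - x
    if doubled < 0 or doubled % 2:
        raise ValueError("composition is not a valid neutral closed-shell formula")
    return doubled // 2


# Cephalandole A, C16H10N2O2 (oxygen atoms are ignored by the formula).
cephalandole_a = degrees_of_unsaturation(c=16, h=10, n=2)

# The unknown of Example 1, C16H18NI (iodine counts as a halogen).
example_1 = degrees_of_unsaturation(c=16, h=18, n=1, x=1)

print(cephalandole_a, example_1)
```

For C16H10N2O2 this gives 13, in agreement with the value stated above; for C16H18NI of Example 1 the same expression gives 8.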
1H–13C HMBC and very sparse COSY data were used by the authors[13] to elucidate the structure. On the basis of their analysis of the NMR data, the authors suggested four structures consistent with the available data (see Fig. 8). The authors comment that the available NMR data did not allow distinction between a 2- and a 3-substituted indole substructure, and therefore, all four structures #1–#4 could be considered plausible. Structure #1 is the accepted structure of cephalandole A, and structure #2 is the previously misassigned structure of this compound.
Gross and co-workers have demonstrated that the AFM approach combined with quantum-chemical computations can indeed help to select structure #1 as the most probable one through the analysis of molecular images, and this gives spectroscopists a new independent tool to distinguish planar molecules that may have similar structures.
We used the problem of elucidating the cephalandole A structure, as posed by Gross and co-workers, as another challenge for CASE expert systems. The 1D and 2D NMR spectra acquired by the authors[13] to analyze this problem were input into Structure Elucidator, and the MCD was created. No user intervention or data corrections were made. Checking of the MCD detected the presence of NSCs in the 2D NMR data, and the FSG mode was therefore employed for the structure elucidation. As a result, the program produced an output file of 11 structures in 1 min 50 s, and structure #1 was selected as the most probable candidate using, as in the previous example, 13C and 1H NMR chemical shift prediction by the neural network algorithm. Structure #2 (Fig. 8), initially suggested by Wu and co-workers as the correct structure, was also generated and ranked ninth in the file with a deviation of dN(13C) = 3.70 ppm; hence, it should definitely be rejected. In our previous review,[1] we have shown that this structure would be immediately rejected by researchers on the basis of the 13C chemical shift deviations calculated with the aid of Structure Elucidator. Figure 9 shows three (of 11) structures that are of similar shape and ranked first, fourth and ninth.
One of the reviewers of this manuscript commented that relative to Gross et al.'s original statement that ‘the available NMR data did not allow distinction between a 2- or 3- substituted indole substructure’, a 3JCC correlation in a 1,n-ADEQUATE spectrum could link the 2-position of the indole of structure #1 with the carbonyl, assigning the structure, if structures #1 and #2 had been selected based on 1,1-ADEQUATE data. Conversely, a 3JCC correlation in a 1,n-ADEQUATE spectrum could link the 3-position of the indole of structure #3 to the carbonyl if structures #3 and #4 had been selected based on 1,1-ADEQUATE data. Hence, there are viable spectroscopic routes to the identification of the structure in Example 2 parallel to CASE methods. Another possible approach to the assignment of the structure is based on long-range 1H–15N 2D NMR. A 1H–15N HMBC spectrum optimized for ~3 Hz would be expected to give correlations to both 15N resonances in the case of structures #1 or #2. It would be ideal to perform these additional 2D NMR experiments and then feed those data into the CASE program for additional and very strong confirmation of the structure.
It is noteworthy that the proposed structures #3 and #4 (Fig. 8) were not generated at all. A question arises from the analysis of Fig. 9: Because the generated structure #4 (Fig. 9) is preferable to structure #9 (suggested by Wu et al.) and has a geometry similar to that of the correct structure, is the AFM method capable of distinguishing between structures #1 and #4 shown in Fig. 9?