Correcting cis-trans-transgressions in macromolecular structure models

Many macromolecular X-ray and cryo-EM structure models deposited in the PDB contain biologically relevant small molecule ligands with unsaturated fatty acid acyl chains, whose cis-trans stereochemistry is incorrect. The molecules are either not properly defined in their stereochemical restraint files, or the proper stereochemistry is neglected during model building. Often, the same molecules appear in deposited models in both isomeric configurations, one of which is almost always incorrect, and the use of the same moiety (HET) identifier and restraint files in model refinement is wrong. We present case studies of frequently occurring molecules and a compilation of identified cases of C-C=C-C cis-trans geometry in the deposited structure models. Full listings of cis/trans torsion angles are provided for models with commonly occurring molecules to assist identification and correction of cis-trans errors and prevent inadvertent use of incorrect models. Caveats for


Introduction
Alkene cis-trans isomers in crystallographic and cryo-EM models

Structure models
Models of biomolecular structures deposited in the PDB [1] are the primary source for 3-D conformation and stereochemistry of proteins and nucleic acids. The structures are predominantly determined by X-ray crystallography, nuclear magnetic resonance spectroscopy or cryo-EM, and in all cases, during reciprocal or real space model refinement, the models require application of stereochemical restraints [2] to assure compliance with known stereochemistry. Owing to the limited number of different residues or nucleotides as building blocks of the proteins and nucleic acids, the macromolecular components of the models are subject to reasonably reliable validation, which reduces severe geometry errors and assures plausible stereochemistry of the macromolecular component of the models [3][4][5]. The macromolecular geometry restraint target values and variance have been generated from accurate stereochemical data in the CSD [6] and assembled into dictionaries for proteins [7,8] and nucleic acids [9][10][11].

Small molecule ligands
Most macromolecular structure models contain additional small molecule components (over 25 000 currently exist in the PDB monomer (HET) library), and for each of them, a correct stereochemical restraint file is required for model building, refinement and validation [12,13]. The validation process of the small molecule model stereochemistry against the expectation values in the restraint files depends on the accurate description and interpretation of each individual molecular entity, and errors in ligand models frequently go undetected (cf. reviews [14,15]). A surprisingly frequent problem is the violation of cis-trans stereochemistry in C-C=C-C alkene moieties.

Relevance
Carbon-carbon alkene double bonds are present in numerous naturally occurring small molecule compounds such as unsaturated fatty acids, lipids or related compounds, which have either (predominantly) cis or trans geometry [16]. Typical examples are unsaturated fatty acids such as palmitoleic acid (PAM), important as a physiologically relevant substrate in cellular signalling [17][18][19], or mono-olein (OLB, OLC), used in LCP crystallization of membrane proteins [20]. Membrane proteins are also frequently crystallized with specifically designed MAGs and other lipids to protect their hydrophobic transmembrane domains. Lipid components that are present either natively or added during purification of membrane molecule complexes are frequently modelled in both X-ray and cryo-EM structure models.

Consequences
The energy barrier for the cis-trans isomerisation in alkenes is substantial. Rotation around the C=C double bond of an alkene requires breaking the π-bond (about 240 kJ·mol⁻¹ bond energy [21]), and in vivo, cis-trans isomerisation is generally catalysed by a respective cis-trans isomerase. The refinement programs do not automatically remedy incorrect cis-trans assignment (cf. Discussion). Without proper restraints and given ambiguous experimental electron density or electrostatic potential maps, the ligand models can become severely distorted during refinement.
Given that cis-trans isomers have different stereochemistry and thus distinctly different physical properties [22], the following points should be observed:
(a) for each cis or trans isomer, a unique three-letter hetero-monomer identifier (HET) needs to be associated with the monomer restraint file (HET.cif) completely and correctly defining the proper isomer;
(b) the same three-letter HET identifier cannot be used to describe both isomers;
(c) a given HET isomer can only be modelled in its assigned cis or trans configuration, not in both;
(d) validation programs should clearly flag deviations from the expected cis or trans geometry to allow the depositor to intercept the error prior to or during deposition, or at least inform the user in the final validation report that a seriously improbable configuration is present.
Examination of models containing commonly occurring HET moieties with alkene double bonds has revealed problems in the above categories (a-d) ranging from severe errors potentially affecting biological conclusions to inconsistent presentation in search results and nuisance errors in modelling of these HET ligands.

Cis-trans transgressions
Out of 24 743 small molecule HET monomers in the PDB, 187 were cis-only, 286 trans-only and 46 monomers contained both cis and trans bonds. We examined the most frequent molecules, such as unsaturated fatty acids, unsaturated MAGs and related entities. Nomenclature and correct cis and trans configuration were cross-checked for consistency with PubChem [23].
Free PAM is frequently presented in cryo-EM models with torsion angles ranging from almost random to predominantly trans-like orientations (6ucb) [25] (cf. Data Availability). Even when modelled only as a component of a fatty acid membrane collar, completely implausible (that is, plainly wrong) configurations of double bonds could be avoided by use of proper restraints in model refinement (cf. Discussion).

OLA, ELA
Oleic acid (OLA), (9Z)-octadec-9-enoic acid, is found in 230 PDB entries, and of 470 molecules where the torsion angle could be calculated (in some instances, only truncated fragments are present), about half have implausible torsion angles (Fig. 2A). There are significant differences between individual entries, with some having all cis torsions correctly assigned, while others show varying cis/trans mixtures up to almost exclusively trans configurations (cf. Data Availability). Its trans-stereoisomer, elaidic acid (ELA), (9E)-octadec-9-enoic acid, is modelled in only 18 instances, of which two thirds have implausible torsion angles (Fig. 2B).

VCA
Vaccenic acid is inconsistently annotated in the PDBe as E and Z isoforms (cf. section Chemical library presentation). Out of 21 instances in multiple entries, 20 are modelled as the rare cis-vaccenic acid (asclepic acid, (11Z)-octadec-11-enoic acid) while one is modelled with an implausible torsion angle of −58°.
In contrast to the almost 50% incorrect assignments for oleic acid (OLA), MAGs in general seem to be modelled less frequently with incorrect configuration. The distributions for both mono-olein (9Z) enantiomers are almost bimodal, with about 35% OLB and 23% OLC in implausible, largely trans, configuration (Fig. 2C,D).

S1P
Another biologically relevant trans-monoacylglycerate derivative is sphingosine 1-phosphate, S1P, modelled in 5ksi and 5ksj [30]. In multiple instances, the same molecule is present in split conformations, where each partial model except one violates the expected trans configuration and, notably, there is scarce electron density support even for the few atoms of the additional S1P fragments modelled. No amount of sophisticated, nonparsimonious modelling combined with poor electron density fit can justify these easily correctable configuration errors. In two other PDB entries, S1P is modelled correctly (cf. Data Availability). The general difficulty of distinguishing fatty acid acyl chains from polyethylene glycol chains based on electron density alone (which is likely a contributing factor to incorrect modelling) has been pointed out earlier [31,32] and needs to be addressed separately.

Multiple isomers-carotenoids
The class of carotenoids with their multiple conjugated isoprene double-bond chains presents a separate challenge for modelling and validation. The core of the conjugated diene -CH=CH-CH=CH- bonds is almost planar due to electron delocalization, with the central bond length of ~1.44 Å intermediate between a single and a double C-C bond.
Although β-carotene all-(E) isomers are the most abundant, some (Z) isomers such as (9Z), (3Z) or (15Z) β-carotene are found in nature [16]. Note that in the s-cis nomenclature for all-(E) β-carotene (BCR), the cis prefix refers to the C6-C7 single bond between the ring and the isoprene chain and not to any double bonds of the β-carotene chain [33]. BCR is modelled in 144 X-ray and cryo-EM (membrane) protein structures as an integral component. While identification of each (legitimate) isomer with an individual HET might be considered impractical or overly cumbersome during modelling, each isomer is a different chemical entity with different physical properties, and as pointed out, individual HETs are required for consistent elimination of the ambiguities in cis-trans isomer modelling, refinement and validation.
About 2% of the ~12 000 double bonds identified in BCR molecules in the PDB are modelled in incorrect cis configurations, and the torsion angle distribution shows an extended tail of trans torsions with improbable torsion angles (Fig. 3C,D). In almost all cases, the molecules disproportionately contributing to aberrant torsion angles are found in regions of poor to no electron density (X-ray) or electrostatic potential (cryo-EM) maps. Particularly in these situations with weak experimental evidence, correct interpretation and application of the restraints by the modelling and refinement programs are required to keep the stereochemistry plausible. Similar observations hold for related carotenoids such as the plant xanthophylls lutein (LUT) and violaxanthin (XAT).

Chemical library presentation
A source of confusion for model builders and users that could be easily corrected during the ongoing remediation of the PDB HET libraries is the inconsistent annotation of HET compounds and their 2D presentation in the PDB chemical compound libraries (PDBeChem, Fig. 4). As an example, for 1EX ((11Z,13Z)-hexadeca-11,13-dien-1-ol) the SMILES and IUPAC name match, but contradict the PDBe images, while for ELD (elaidoylamide, (9E)-octadec-9-enamide), the correct IUPAC molecule name contradicts the cis-SMILES but matches the PDBe image; for ELA, the corresponding elaidic acid, all records match. Similarly, in the case of VCA (Fig. 4), the synonym name for the predominant trans-vaccenic acid, (11E)-octadec-11-enoic acid, contradicts the cis-SMILES and image for (11Z)-octadec-11-enoic acid, which could be contributing to all VCA molecules in PDB deposits being modelled in the cis configuration, although cis-vaccenic acid (asclepic acid) is the rare isomer.
In addition, discrepancies exist between the PDBeChem and RCSB ligand summary. 1EX is correct in RCSB PDB, the VCA 2D representation is correct in RCSB, but the 3D presentation is wrong, and the SMILES are unspecified. ELD SMILES are unspecified in the RCSB ligand summary.
The inconsistent chemical nomenclature in restraint files and description was highlighted [34], and remediation announced [35], almost a decade ago.

Cis-trans annotation
Part of the confusion in presentation, modelling and restraint application originates from the inconsistent and hard to parse content in cif (crystallographic information file) format of unremediated legacy restraint files, which are then used by the refinement programs to set up the stereochemical restraints [36,37]. Cis-trans stereochemistry can be annotated in multiple places in the HET.cif files. In addition to the SMILES strings [38], double bonds are identified as cis (Z, 'zusammen') or trans (E, 'entgegen') in the _chem_comp_bond section. The _pdbe_chem_comp_bond_depiction section also carries, although not necessarily consistently, cis and trans information in the ENDUPRIGHT/ENDDOWNRIGHT annotation. An example of inconsistent annotation is SPH ((2R,3S)-sphingosine), where the SMILES and 'Z' double-bond annotation contradict the actual trans stereochemistry, which could be a contributing factor to ~75% incorrect modelling (Fig. 3A). However, the (2S,3R) enantiomer SQS is described correctly.

Restraint implementation
Restraining the cis-trans configurations
The observed violations of cis-trans stereochemistry fall into two categories: (a) selection of the wrong stereoisomer while still maintaining planarity, or (b) completely incorrect stereochemistry where not even the planarity of the double bond resulting from the C sp² hybridization is preserved. Refinement programs, except REFMAC5, which minimizes the eigenvalue of the product moment matrix of the atoms in the plane [39], generally restrain the double-bond torsion angles by minimizing the normal distance residual of the four carbon atoms from their least-squares plane (Fig. 5), which automatically implies a perfect 0° or 180° target value. The planarity restraint can thus be fulfilled in either cis or trans configuration. If the model represents the incorrect isomer, the refinement programs will not flip the incorrect configuration, which is trapped in a local minimum. Note that the deviation from the arbitrary LSQ plane is only a precision measure because the true location of that plane is not known.
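The cis-trans degeneracy of a planarity restraint can be illustrated with a minimal Python sketch. It is not the REFMAC5 or PHENIX implementation: the helper names (`build_alkene`, `plane_residual`), the bond lengths of 1.50 and 1.33 Å and the 125° bond angle are assumptions chosen for illustration only. The residual is evaluated as the smallest eigenvalue of the scatter matrix of the four atoms, which equals the sum of squared distances from their least-squares plane.

```python
import math

def build_alkene(torsion_deg, bond_single=1.50, bond_double=1.33, angle_deg=125.0):
    """Idealized coordinates of atoms 1-2=3-4 for a given torsion angle.

    Bond lengths (angstrom) and the bond angle are illustrative assumptions.
    """
    theta = math.radians(angle_deg)
    tau = math.radians(torsion_deg)
    a1 = (bond_single * math.cos(theta), bond_single * math.sin(theta), 0.0)
    a2 = (0.0, 0.0, 0.0)
    a3 = (bond_double, 0.0, 0.0)
    # Atom 4 is rotated about the 2=3 axis (x) by the torsion angle.
    a4 = (bond_double - bond_single * math.cos(theta),
          bond_single * math.sin(theta) * math.cos(tau),
          bond_single * math.sin(theta) * math.sin(tau))
    return [a1, a2, a3, a4]

def plane_residual(points):
    """Sum of squared distances of the points from their least-squares plane.

    Computed as the smallest eigenvalue of the symmetric 3x3 scatter matrix,
    using the closed-form (trigonometric) solution.
    """
    n = len(points)
    centre = [sum(p[k] for p in points) / n for k in range(3)]
    d = [[p[k] - centre[k] for k in range(3)] for p in points]
    s = [[sum(v[i] * v[j] for v in d) for j in range(3)] for i in range(3)]
    q = (s[0][0] + s[1][1] + s[2][2]) / 3.0
    p1 = s[0][1] ** 2 + s[0][2] ** 2 + s[1][2] ** 2
    p2 = sum((s[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    if p == 0.0:
        return max(q, 0.0)  # all three eigenvalues are equal
    b = [[(s[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det / 2.0))
    phi = math.acos(r) / 3.0
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return max(smallest, 0.0)  # clamp rounding noise below zero

for torsion in (0.0, 180.0, 90.0):
    print(torsion, plane_residual(build_alkene(torsion)))
```

In this idealized geometry the residual vanishes (to rounding error) for torsions of both 0° and 180°, while a 90° twist is penalized, so a planarity term alone cannot tell the two isomers apart.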
Currently, the outcome of a rerefinement of a PDB model depends both on the choice of restraint implementation and on the restraint file version. As an example, we re-refined the PDB entry 4f0a (τ of 91.8° for PAM) with default settings in REFMAC5 [39] using the still distributed legacy PDB restraint file (resulting in τ of −105.0°). Using the Grade [40] restraint file that will be provided as a part of ongoing PDB remediation resulted in the correct cis configuration with τ of 0.25°. Refinement with PHENIX [41] in default mode (three macrocycles) using ReadySet!/eLBOW restraints results in a correct τ angle of 1.01°.

Flipping the double bond
Because the planarity restraints trap a 1-2=3-4 bond in a local minimum around either 0° or 180°, an additional restraint that exacts a penalty for incorrect isomer assignment is required. Additional 1-2-4 and 1-3-4 pseudo-angle restraints across the double bond, which differ for cis and trans, with a tight target e.s.d. of 1.0°, are effective and suffice to flip incorrect models but are not enough to assure adequate planarity (Table 1). The addition of these pseudo-bond-angle restraints to the standard Grade restraint file (or successive application) is surprisingly robust both in refinement and in regularization. Table 1 summarizes the results of REFMAC5 restrained maximum-likelihood refinement of 5wiu [42], which contained a mixture of variably incorrectly modelled OLA molecules.
The flipping restraints can also be applied before final refinement in idealization mode as shown in Table 3 in the Materials and methods section for 6peq. Both the sequential regularization and regularization with the modified Grade restraints flip all isomers into the correct cis configuration.
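Why the 1-2-4 and 1-3-4 pseudo-angles discriminate between the isomers, while planarity does not, can be seen from a small geometric sketch. The bond lengths (1.50 and 1.33 Å), the 125° bond angle and the function names are assumptions for illustration; they are not the target values of the restraint files used in this work.

```python
import math

def planar_alkene(cis, bond_single=1.50, bond_double=1.33, angle_deg=125.0):
    """2D coordinates of a planar 1-2=3-4 fragment (assumed ideal geometry)."""
    theta = math.radians(angle_deg)
    side = 1.0 if cis else -1.0  # atom 4 on the same (cis) or opposite (trans) side
    a1 = (bond_single * math.cos(theta), bond_single * math.sin(theta))
    a2 = (0.0, 0.0)
    a3 = (bond_double, 0.0)
    a4 = (bond_double - bond_single * math.cos(theta),
          side * bond_single * math.sin(theta))
    return a1, a2, a3, a4

def angle_at(vertex, p, q):
    """Angle p-vertex-q in degrees."""
    v1 = (p[0] - vertex[0], p[1] - vertex[1])
    v2 = (q[0] - vertex[0], q[1] - vertex[1])
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def pseudo_angles(cis):
    """Return the (1-2-4, 1-3-4) pseudo-angles for the planar fragment."""
    a1, a2, a3, a4 = planar_alkene(cis)
    return angle_at(a2, a1, a4), angle_at(a3, a1, a4)

print("cis  :", pseudo_angles(True))
print("trans:", pseudo_angles(False))
```

With these assumed values the two pseudo-angles are slightly above 90° for cis and well above 150° for trans, so a tightly weighted pseudo-angle target has a single minimum per isomer.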

Recommendations
Examination of PDB entries and PDB validation reports revealed numerous problems largely resulting from neglecting the rules listed above as categories (a-d), ranging from severe errors potentially affecting biological conclusions to inconsistent presentation in search results and nuisance mistakes in modelling of these HETs. While most of these instances may be quite easily remediated or corrected by the depositors or the PDB, at present the user of such entries must validate the corresponding HETs in the context of the supported claim or intended use of the structure models.

Users
Trust but verify [14]. When a structure model including a ligand or compound is of interest or importance for a user, it is not sufficient to simply download the model from the PDB and to consult the validation report. Global metrics generally cannot inform about local deviations in a small part of the model such as the key ligand. The validation results for the ligand are located at the end of the validation report, and not all errors, particularly C-C=C-C cis-trans violations, are flagged. Inspection of the primary evidence in form of electron density (for example, with the 'Fetch PDB and map' options in COOT [43]) is highly recommended, because then with a single click the fit of the model to the electron density (or its absence) can be displayed, the double bond inspected, and its torsion angle measured with the distance tool. Comparison of the model with the electron density (or its absence) often provides direct clues why a torsion angle is incorrectly modelled [15,44]. In case of discrepancy, the HET.cif file (links are provided in the PDB ChemExplorer download pages) should be examined for consistency with the expected stereochemistry. For cryo-EM structures, the electrostatic potential maps need to be downloaded separately, but the inspection process outlined above remains the same. Other options for graphical display programs exist, such as Chimera or PyMol [45,46].
Table 1. Rerefinement of an X-ray structure model with flipping restraints. REFMAC5 refinement with the legacy cif fails to improve the geometry, except in one instance, while refinement with the Grade cif fails to correct two isomers. The flipping cif without planarity restraints suffices to flip the isomers, but additional planarity restraints are required to tighten the geometry towards target values. Including pseudo-angle (flipping) restraints with Grade (or successive application of Grade after flipping restraints) corrects all bond torsions. Relaxing the planarity restraint target SD from 0.02 Å to 0.03 Å (last column) allows the distribution to be adjusted to the expected target value variance while the pseudo-angle restraints concurrently flip incorrect isomers. Colour code: red, more than 40° off target; yellow, 40-10° off target; green, within 10° off target.
[Table 1 body not recovered: OLA C8-C9=C10-C11 torsion angles (°) in 5wiu.]
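For scripted checks outside a graphics program, the C-C=C-C torsion angle can be computed directly from the four atomic coordinates. The following is a minimal sketch, not part of any of the programs cited here; the function names are hypothetical, and the classification simply reuses the colour thresholds of Table 1 (10° and 40° off target).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def torsion_deg(p1, p2, p3, p4):
    """Signed torsion angle 1-2-3-4 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))

def off_target(torsion, isomer):
    """Absolute deviation (degrees) from 0 (cis) or 180 (trans), with wrap-around."""
    target = 0.0 if isomer == "cis" else 180.0
    return abs((torsion - target + 180.0) % 360.0 - 180.0)

def colour(torsion, isomer):
    """Colour code as in the Table 1 caption."""
    deviation = off_target(torsion, isomer)
    if deviation > 40.0:
        return "red"
    if deviation > 10.0:
        return "yellow"
    return "green"

# Torsion values quoted in the text for PAM in 4f0a (expected isomer: cis).
for tau in (91.8, -105.0, 0.25, 1.01):
    print(tau, colour(tau, "cis"))
```

The atom coordinates would come from the C8, C9, C10 and C11 records of the ligand (for OLA); reading them from a model file is left out of the sketch.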

Model builders
Be aware that the model refinement programs, both in reciprocal space for crystallography and in real space for cryo-EM, do not automatically fix incorrectly assigned cis-trans stereoisomers. If specified (which is not always the case), only planarity restraints are applied (cf. Fig. 3). The initial model then must be placed with proper stereochemistry. In case of doubt, consult the HET.cif file via the PDB ChemExplorer pages, but be aware that inconsistencies such as shown in Fig. 4 do exist. Not every spurious map density needs to be filled with a contorted cis-trans ligand, and in reciprocal space refinement, a poorly placed ligand actually makes the entire model worse [44]. Please keep in mind that:
(a) the same three-letter HET identifier cannot be used to describe both cis-trans isomers; one of them is simply wrong;
(b) in addition, if chiral centres are present, assure the proper assignment of diastereomers (example OLB vs OLC);
(c) before inventing new HETs, check the HET libraries for existing equivalent entries;
(d) use the latest, updated PDB restraint file from the ongoing restraint file remediation effort, or a ligand restraint generation program such as phenix.elbow [47] or the Grade Server (http://grade.globalphasing.org) to produce a complete restraint file that includes adequate planarity information, and check for proper isomer description. If desired, add the respective pseudo-angle restraints for flipping the configuration of incorrectly modelled isomers.
Even if a given small molecule ligand is not central to the research of the model builder, it should be modelled correctly. Later studies or large-scale data mining with a different focus may rely on that ligand model. Proper restraint files are easy to obtain and massively improve the plausibility of the model.

Validation
The current ligand validation reports do not flag incorrect cis-trans isomer assignment as an error. Validation should clearly flag such transgressions and intercept them, preferably already during the deposition process, where they can be easily corrected by the depositor. To correct a deposited model is much harder and requires a major remediation effort. Voluntary participation of the original depositors should be considered, by notifying them and encouraging redeposition of corrected models refined with the latest, remediated restraint files. Redeposition should become straightforward with the forthcoming improvements of the current versioning system [48] of PDB entries. In cases where major scientific claims would be affected, voluntary compliance is less likely, and correction of the public record by appropriate action [15] might be necessary. The restraint file modifications we have described here are easy to automate and could be implemented in future remediation efforts until a generally accepted modification to the restraint files is implemented.
Inconsistencies in ChemExplorer as shown in Fig. 4, resulting probably from ambiguous, incomplete or contradictory information in the HET.cif restraint files, should be addressed in future ligand validation efforts [12].

Data scraping and analysis
The HET.cif restraint files available in the ccp4 7.1 monomer library (Winn et al., 2011) were scraped using the SMILES strings [49] C/C=C\C or C\C=C/C, and C/C=C/C or C\C=C\C, for any presence of cis and trans geometry, respectively. Out of 24 743 HET monomers in the January 2021 ccp4 7.1 monomer library, 187 were cis-only, 286 trans-only and 46 monomers contained both cis and trans C-C=C-C bonds. For each HET serving as an example, the HET cif files were scanned for double bonds, their location and status determined, and the torsion angles calculated (cf. Data Availability).
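The pattern search described above can be sketched in a few lines of Python. This is a simplified illustration and not the script used for the survey: it matches only the literal four-carbon patterns, does not parse SMILES in general (branches, ring closures or bracket atoms next to the double bond are missed), and the function name `classify_smiles` is hypothetical. The example strings are standard isomeric SMILES written out by hand, plus one made-up diene, and are not copied from the HET.cif files.

```python
import re

# "/" and "\" are the SMILES directional bond symbols flanking the double bond.
CIS_PATTERN = re.compile(r"C/C=C\\C|C\\C=C/C")
TRANS_PATTERN = re.compile(r"C/C=C/C|C\\C=C\\C")

def classify_smiles(smiles):
    """Return 'cis', 'trans', 'both' or 'none' for a SMILES string."""
    has_cis = CIS_PATTERN.search(smiles) is not None
    has_trans = TRANS_PATTERN.search(smiles) is not None
    if has_cis and has_trans:
        return "both"
    if has_cis:
        return "cis"
    if has_trans:
        return "trans"
    return "none"

examples = {
    "oleic acid (9Z)": r"CCCCCCCC/C=C\CCCCCCCC(=O)O",
    "elaidic acid (9E)": r"CCCCCCCC/C=C/CCCCCCCC(=O)O",
    "stearic acid": "CCCCCCCCCCCCCCCCCC(=O)O",
    "made-up diene": r"CC/C=C\C/C=C/CC",
}
for name, smiles in examples.items():
    print(name, classify_smiles(smiles))
```

Counting monomers per class over a whole library would then amount to applying `classify_smiles` to the SMILES entry of every restraint file.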

Estimate of target value variance
The CONQUEST program from the CSD suite of crystallographic software [50] was used to extract geometry data for unsubstituted alkenes. We searched for X-CH2-CH=CH-CH2-X motifs, where X is any atom and none of the carbons is part of a ring system. Powder diffraction structures, as well as structures with an R-value higher than 0.075, and structures containing atoms heavier than Ca were excluded. We obtained 152 hits and queried the torsion angle, the 1-2-4 and the 1-3-4 pseudo-bond (flipping) angle, and the 1-4 distance. Data were separated into cis (torsion < 90°) and trans (torsion > 90°) configurations. For the analysis of torsion angles, all hits with a torsion angle of exactly 0° or exactly 180° were excluded since those exact values occur when the torsion is crystallographically constrained. The resulting torsion distributions are depicted in the background of Figure 1 (blue and orange histograms).
Outlier detection using a modified Z-score test [11,51] was applied and variances from restraint target values for all nonoutlier data points were calculated and summarized in Table 2.
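The filtering steps described above can be sketched as follows. The modified Z-score is taken here in its common form, 0.6745 × (x − median) / MAD, with a cut-off of 3.5; both the cut-off and the function names are assumptions, since the exact settings are not stated in the text, and the torsion values are invented for illustration. The separation assumes that the absolute torsion value is compared with 90°.

```python
import statistics

def split_cis_trans(torsions_deg):
    """Separate torsions into cis (|t| < 90) and trans (|t| > 90) sets.

    Values of exactly 0 or 180 degrees are dropped, as they are presumed to be
    crystallographically constrained. Trans values are folded to |t| so that
    +178 and -178 are treated as neighbours.
    """
    cis, trans = [], []
    for t in torsions_deg:
        a = abs(t)
        if a == 0.0 or a == 180.0:
            continue
        if a < 90.0:
            cis.append(t)
        elif a > 90.0:
            trans.append(a)
    return cis, trans

def reject_outliers(values, threshold=3.5):
    """Keep values whose modified Z-score does not exceed the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0.0:
        return list(values)  # no spread, nothing can be scored
    return [v for v in values if abs(0.6745 * (v - med) / mad) <= threshold]

# Invented cis torsions (degrees) with one obvious outlier.
cis_torsions = [0.5, -1.2, 2.0, 1.1, -0.8, 35.0]
kept = reject_outliers(cis_torsions)
print(kept, statistics.pstdev(kept))
```

The spread of the retained values is what would enter a table of target value variances; whether a population or sample standard deviation is appropriate is a choice left open here.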
Example for creating a flipping restraint file
Edit the Grade restraint file for PAM, insert the pseudo-angle flipping restraints and, optionally, relax planarity as necessary to avoid over-restraining (0.03 Å as a planarity target SD for ~2.0 Å resolution data, in this example).

Example for application of flipping restraints for regularization
Table 3 provides the result of the regularization via flipping restraints for a cryo-EM model.

Alternate flipping procedure with 1-4 distance restraint
As an alternate approach, the use of a single 1-4 distance restraint (3.09 Å for cis, 3.92 Å for trans) was explored. Starting from incorrect trans configurations, the 1-4 distance restraint works primarily against the other bond restraints while achieving little towards flipping the torsion angle. While successive application of this single restraint and then standard Grade restraints can achieve correct results, single-step refinement with a 1-4 distance restraint combined with planarity restraints did not result in correct cis-trans configurations. The two pseudo-angle restraints accomplish this readily, but for the single 1-4 distance restraint we were unable to find a reliable single-step weighting scheme.
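The order of magnitude of the two 1-4 distances follows from simple planar geometry, as the sketch below shows. The bond lengths (1.50 and 1.33 Å) and bond angles (126° for cis, 125° for trans) are assumed, illustrative values and not the CSD-derived quantities of this study; the function name is hypothetical.

```python
import math

def one_four_distance(cis, angle_deg, bond_single=1.50, bond_double=1.33):
    """1-4 distance (angstrom) in a planar C-C=C-C fragment."""
    theta = math.radians(angle_deg)
    # Component of each single bond along the double-bond axis (pointing outward)
    along = -bond_single * math.cos(theta)
    # Component of each single bond perpendicular to the double-bond axis
    across = bond_single * math.sin(theta)
    dx = bond_double + 2.0 * along
    dy = 0.0 if cis else 2.0 * across  # cis: same side, trans: opposite sides
    return math.hypot(dx, dy)

print("cis  :", round(one_four_distance(True, 126.0), 2))
print("trans:", round(one_four_distance(False, 125.0), 2))
```

With these assumptions the distances come out close to the 3.09 Å and 3.92 Å targets quoted above. The difference between the isomers is under 1 Å, which is consistent with the observation that the distance term mostly competes with the bond and angle restraints instead of driving a flip.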
Another possibly preferable configuration-specific restraint would be a hard torsion angle restraint with a period of 1 and an empirical variance. Such torsion restraints are currently provided with a target of 180°, a degenerate periodicity of 2 and practically infinite variance, and are thus effectively not implemented. With proper implementation, unique torsion angles could be an alternative to the cis-trans-degenerate planarity restraints.
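The effect of the periodicity can be made explicit with a short sketch. It assumes the usual dictionary convention that a torsion restraint with period n has n equivalent targets spaced 360°/n apart; the function name `torsion_deviation` is hypothetical and no refinement program is being reproduced here.

```python
def torsion_deviation(value_deg, target_deg, period):
    """Smallest angular distance from value to any of the equivalent targets."""
    step = 360.0 / period
    diff = (value_deg - target_deg) % step
    return min(diff, step - diff)

# Period 2 with target 180: cis (0) and trans (180) are both "on target".
print(torsion_deviation(0.0, 180.0, 2), torsion_deviation(180.0, 180.0, 2))

# Period 1 with a cis target of 0: a trans torsion is 180 degrees off target.
print(torsion_deviation(180.0, 0.0, 1), torsion_deviation(1.01, 0.0, 1))
```

Only the period-1 form assigns a penalty to the wrong isomer; with period 2 the restraint carries no more isomer information than the planarity term, whatever variance is attached to it.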