Deriving Ligand Orientation in Weak Protein–Ligand Complexes by DEEP‐STD NMR Spectroscopy in the Absence of Protein Chemical‐Shift Assignment

Abstract Differential epitope mapping saturation transfer difference (DEEP‐STD) NMR spectroscopy is a recently developed powerful approach for elucidating the structure and pharmacophore of weak protein–ligand interactions, as it reports key information on the orientation of the ligand and the architecture of the binding pocket.1 The method relies on selective saturation of protein residues in the binding site and the generation of a differential epitope map by observing the ligand, which depicts the nature of the protein residues making contact with the ligand in the bound state. Selective saturation requires knowledge of the chemical‐shift assignment of the protein residues, which can be obtained either experimentally by NMR spectroscopy or predicted from 3D structures. Herein, we propose a simple experimental procedure to expand the DEEP‐STD NMR methodology to protein–ligand cases in which the spectral assignment of the protein is not available. This is achieved by experimentally identifying the chemical shifts of the residues present in binding hot‐spots on the surface of the receptor protein by using 2D NMR experiments combined with a paramagnetic probe.


NMR experiments
All the NMR experiments were conducted in 10 mM Tris-d11 D2O buffer pH 7.8 and 100 mM NaCl at 298 K. For the hot-spot mapping experiment of RgNanH-GH33 with TEMPOL, the protein was concentrated at 1.2 mM using Amicon filters with a 30 kDa cut-off. Three consecutive 2D 1 H-1 H TOCSY experiments were acquired in the absence and in the presence of 2 mM and 12 mM of TEMPOL, respectively. The 1 H-1 H TOCSY [3] NMR spectra were acquired on an 800.23 MHz Avance III Bruker spectrometer at 298 K with an SW of 12 ppm acquiring 32 scans per experiment with a TD of 2K data points in the direct dimension and 256 experiments in the indirect dimension. The residual water was suppressed by using the WATERGATE technique. [4,5] The recovery time for the experiment was 2 s while the mixing time was set typically at 0.1 s. 1D 1 H STD NMR [6] spectra were acquired with SW of 16 ppm using a TD of 32K data points and a recycling delay (D1) of 3 s with 64 scans. The Bruker library pulse stdiffesgp.3 was used for all the STD NMR experiments applying a train of 50 ms Gaussian shaped pulse (0.1 mW [40 dB]) at 40 ppm (off-resonance spectra) and 0.6, 0.74, 1.06, 1.15, 1.26, 6.6, 6.74, 7.04, 7.57, 8.56 ppm (on-resonance spectra), to get the averaged differential epitope (DEEP-STD NMR). For all the experiments, a saturation time of 0.75 s was used. The water signal was suppressed by the excitation sculpting technique [7] while the protein signals were removed by using a 40 ms T1ρ filter.

Calculations of DEEP-STD NMR factors
In DEEP-STD NMR, [8] two STD NMR experiments are performed on the same sample (exp1 and exp2), each differing only in the frequency of the saturating pulse. For example, exp1 may irradiate aromatic groups (7 ppm) and exp2 may irradiate aliphatic groups (1 ppm). For each proton, i, in the spectrum the DEEP STD NMR factor (ΔSTDi) can be calculated by taking the ratio between the STD intensities of that proton in exp1 (STDexp1,i) and in exp2 (STDexp2,i) and subtracting from it the average ratio between the two experiments: Averaged experimental DEEP-STD NMR processing protocol.
In this work, we slightly modified the previously published DEEP-STD processing protocol [8] by introducing the concept of "averaged DEEP-STD maps", which are obtained from a series of different saturating frequencies (i.e., the set of frequencies derived from the 2D 1 H, 1 H-TOCSY TEMPOL experiments). In this work, 25 individual DEEP-STD maps were obtained. They were: DEEP-STD maps 1-5:  After that they were averaged for each proton to give a unique profile as a combination of all the frequencies. (See Fig. 3a in the main text). A simple averaging of the DEEP-STD factors for each ligand proton was performed. The rationale behind averaging of the DEEP-STD factors is to consider the global (accumulated) saturation effect that any ligand proton is experiencing from different residue regions around it that are selectively saturated in different experiments when moving the saturation frequency. That is, when only a single frequency is used for, for example, aromatic irradiation, it might be that, under the experimental conditions, the aromatic region surrounding the ligand in the binding pocket is not well saturated, reducing the "quality" of the differential information that can be gained by the DEEP-STD NMR methodology. This can be improved by saturating at different frequencies around that spectral region (e.g., 6.60, 6.74, 7.04, 7.57, 8.56 ppm, at the aromatic spectral region, Fig. S3), and then calculating the average value. Additionally, in this way, we are also taking into consideration the dynamics of ligand ligands protons in the bound state which, in the timeframe of STD measurement, might visit slightly different sites and have slightly different orientations. Calculating the average leads to the best representation of the most populated binding orientation.

Averaged theoretical DEEP-STD data.
In order to evaluate the experimental results, CORCEMA-ST was used to calculate theoretical NMR-STD intensities following similar protocols already published. [9] Briefly the calculations were performed using the X-Ray structure of the RgNanH-GH33 protein in complex with 2,7-anhydro-sialic acid (PDB code 4x4a). The saturation regions for the protein (irrad option in the inputdata.m in CORCEMA-ST) considered for the calculation were 0.0-0.6 ppm; 0-0.75 ppm; 0-1.0 ppm for the aliphatic regions and 5.6-7.5 ppm; 5.5-8.0 ppm for aromatic using values predicted from shiftx2. The outputs of CORCEMA-ST for each saturation region were first were analyzed for their consistency with the experimental STD-NMR data and further used to calculate the averaged DEEP-STD factors in a similar fashion to the experimentally derived ones. (See main text Fig. 3b). The maps obtained in this case are fewer due to the fact that ranges were used that best matched the experimental intensities (Fig. S4). The maps shown in the figure show the best correspondence with the experimental data. After that they were averaged to give a unique profile as a combination of all the frequencies. (See main text Fig. 3b)

Molecular Dynamics simulations
The coordinates of RgNanH-GH33 were obtained from RCSB PDB protein code 4X4A and prepared using the Protein Preparation Wizard of Schrodinger's Maestro. [10] All non-protein atoms were removed. Protons were then added to the model, using PROPKA to predict the protonation state of polar sidechains at pH 7. [11] The hydrogen-bonding network was automatically optimized by allowing asparagine, glutamine and histidine sidechains to be flipped. The model was then minimized using the OPLS3 [12] force field and a heavy atom convergence threshold of 0.3 Å.
A 3D model of the TEMPOL structure and AMBER-compatible parameters were obtained. [13] For the classical MD simulations, a total of 10 TEMPOL molecules were randomly positioned around the protein. For MixMD, Packmol [14] was used to generate a solvent box of 50% w/w TEMPOL in TIP3P water. The system was then built with the leap module of AmberTools [15] using the AMBER ff14SB [16] forcefield for RgNanH-GH33. The protein (and 10 TEMPOL molecules for the classical simulations) was solvated in a truncated octahedral bounding box, set such that the protein was at least 10 Å from the box edge in all places. For the classical simulations, the solvent was TIP3P water, whereas, for the MixMD simulation, the solvent was the 50% w/w TEMPOL mixture in TIP3P water. In all cases, 16 Na + ions were added to neutralize the system.
Each system was minimized using the conjugate gradient algorithm, converging on a threshold of 10 -4 kcal mol -1 Å -1 , first with 20 kcal mol -1 Å -2 restraints on solute atoms, before repeating with no restraints. The system was slowly heated to 310 K over 500 ps (NVT), before equilibrating the pressure to 1 atm (NPT) over a further 500 ps. In both cases, 20 kcal mol -1 A -2 restraints were used on solute atoms. These restraints were then slowly released over 800 ps to produce the equilibrated system. In all cases, the SHAKE algorithm was used to restrain all bonds involving hydrogen, allowing for a time step of 2 fs. A Langevin thermostat was used with a collision frequency of 5 ps -1 and the barostat used an isotropic Berendsen algorithm with a relaxation time of 1 ps. In all cases, periodic boundary conditions were used, using the particle mesh Ewald to calculate electrostatics. The timestep of 1 fs was required to maintain a stable simulation in the presence of TEMPOL.
In all cases, production dynamics were run using the same parameters described for system equilibration. For the long classical MD simulation, the total simulation time was 1 μs. For the replicas, 16 independent 50 ns simulations were performed by initiating each simulation with different randomly generated velocities. For the MixMD, 12 independent 20 ns simulations were performed.