Non‐Stationary Complementary Non‐Uniform Sampling (NOSCO NUS) for Fast Acquisition of Serial 2D NMR Titration Data

Abstract NMR spectroscopy offers unique benefits for ligand binding studies on isotopically labelled target proteins. These benefits include atomic resolution, direct distinction of binding sites and modes, a lowest detectable affinity limit, and function independent setup. Yet, retracing protein signal assignments from apo to holo states to derive exact dissociation constants and chemical shift perturbation amplitudes (for ligand docking and structure‐based optimization) requires lengthy titration series of 2D heteronuclear correlation spectra at variable ligand concentration that may exceed the protein's lifetime and available spectrometer time. We present a novel method to overcome this critical limitation, based on non‐stationary complementary non‐uniform sampling (NOSCO NUS) combined with a robust particle swarm optimization algorithm. We illustrate its potential in two challenging studies with very distinct protein sizes and binding affinities, showing that NOSCO NUS can reduce measurement times by an order of magnitude to make such highly informative NMR titration studies more broadly feasible.


Experimental details
All NMR measurements for the SH3/p41 system were performed on a 700 MHz Agilent Di-rectDrive2 spectrometer equipped with a room-temperature HCN probe at 25°C. A 25 µM solution of 15 N α-Spectrin domain SH3 in 10 mM sodium citrate buffer (pH=3.5) was purchased from Giotto Biotech (Sesto Fiorentino, Italy). The ligand decapeptide p41 1,2 with amino acid sequence APSYSPPPPP was purchased from Lipopharm.pl (Poland). A stock solution of p41 peptide was obtained by dissolving 4.8 mg of the peptide in 200 µl of a citrate buffer (pH = 3.5). For the PCNA/p12 system, all 2D 1 H-15 N TROSY spectra were recorded on an 800 MHz Bruker AVANCE III spectrometer at 35°C, as previously described. 3 The titration series comprises the reference spectrum (no ligand) and 6 with different values of x: 0.1, 0.25, 0.5, 2, 5, 10. Each experiment required 22 hours NMR time.  13 C, 15 N] labeled PCNA (lower panel). The SH3 spectrum shows significantly higher spectral sparsity, which can be defined as a number of "significant spectral points" (points contributing to peaks) in a 15 N dimension. The comparison explains why severe undersampling works better for the SH3 spectrum than for the PCNA spectrum.

Standard NUS reconstruction
Alternatively to NOSCO processing, one can perform a separate compressed sensing processing (NUS reconstruction) of the data collected for each titration point. However, the sampling level that is sufficient for NOSCO (e.g. 12.5% for SH3 data), might be too low when performing regular CS processing. Figure S2 shows four peaks from a series of SH3 spectra reconstructed separately from the same data as used for NOSCO. While CS performs well in the sparse spectral region (peak W41Ne-He), it would require far more sampling points to reconstruct other peaks. Problems manifest themselves in corrupted lineshapes and peak positions, up to complete disappearance of peaks.
The situation is even worse in the case of PCNA system, where obtained TROSY spectrum is much less sparse. Figure S3 shows an attempt to reconstruct one of the spectra in a titration series from the amount of data used for NOSCO (∼ 17%). The reconstruction fails completely and all shifting peaks are missing.
In both cases the reconstruction has been performed using CS module of the mddnmr program (IRLS algorithm, 20 iterations). 4 Figure S2: Selected peaks in 2D 1 H− 15 N HSQC spectra of [U-15 N] labeled SH3 reconstructed separately from the complementary NUS data (32 points out of 256 point grid). Eight titration points (growing concentration of the ligand) correspond to colors: beige, blue, coral, lime green, purple, green, orange, dark blue. Only W41Ne-He peak belonging to the sparsest spectral region is well reconstructed using compressed sensing. Others reveal disturbed lineshapes, positions or are completely missing. Figure S3: Fully sampled TROSY spectrum of the first titration point for PCNA (a) and an attempt to reconstruct this spectrum using compressed sensing (b) from 21 NUS points out of 128-point grid (the same as used for NOSCO processing). Besides sparse region in the left part of the spectrum, the reconstruction failed completely and all shifting peaks are missing.

NOSCO spectra
Figures S4 and S5 and illustrate the performance of NOSCO processing in refocusing peaks that appear smeared by binding induced CSP in the co-processed 2D projection spectrum from a pseudo 3D titration series sampled with NOSCO NUS. For the well resolved SH3 spectrum, NOSCO refocused and reference signals are virtually identical. In contrast, the co-processed PCNA titration series spectrum contains stronger t 1 -noise artifacts resulting from non-stationarity 5 due to intrinsically higher spectral crowdedness and lower signal-tonoise ratio. While this leads to a generally poorer match between NOSCO refocused and reference peaks (which often show partial overlap), the derived K D values still agree precisely with those obtained by the conventional method. Similar excellent agreement is observed for the CSP amplitudes derived by both methods, confirming that NOSCO processing can also reliably quantify these critical structural parameters for advanced docking methods. 6,7 Figure S4: (Top) Superimposed 2D 1 H-15 N HSQC spectra of ligand-free SH3 (black, reference) and from co-processing of the NOSCO sampled titration series with p41 (red) where indicated signals appear smeared by binding induced CSP. (Bottom) Corresponding spectral regions containing a single smeared peak (red). Individual NOSCO processing achieves its refocusing (green) onto the corresponding reference signal (black) by adjusting the CSP amplitudes (∆ν H , ∆ν N ) and K D value. Figure S5: (Top) Superimposed 2D 1 H-15 N HSQC spectra of ligand-free PCNA (black, reference) and from coprocessing of the NOSCO sampled titration series with p12 (red) where indicated signals appear smeared by binding induced CSP. (Bottom) Corresponding spectral regions containing a single smeared peak (red). Individual NOSCO processing achieves its refocusing (green) onto the corresponding reference signal (black) by adjusting the CSP amplitudes (∆ν H , ∆ν N ) and K D value.

Particle swarm algorithm
Particle swarm optimization algorithm (PSO) 8 belongs to so-called evolutionary algorithms.
PSO efficiently searches for a global extreme and can deal with functions with many local extremes. A sufficient number of particles (points of the function domain) allows avoiding the fall into a local extreme. PSO is also suitable for non-smooth functions as it does not use the gradient of the sought-for function. The algorithm is initialized with picking up random points, or particles, within the function domain (the user-provided constraints). For each of these candidate solutions to the problem, the function value is calculated. At each iteration step, the positions of all particles are updated, with a stochastic and a deterministic component. The particle "velocities" are calculated as follows: where i is the number of the iteration, j is the index of the particle, x is the particle position in the multidimensional space of search, v is its velocity, w, c 1 and c 2 are hyperparameters of the algorithm ("inertia" and "acceleration" factors), r 1 and r 2 are random numbers between 0 and 1, xBest and gBest are the best particle position and the best group position. Thus, some kind of group behaviour also governs the displacement of every particle. In this work, we adjusted the MATLAB code provided by L.A. Kareem 9 and tuned the hyperparameters.
Here, particle velocities are reversed if particles attempt going out of the search space defined by the user. Moreover the algorithm performs several runs with independent initializations and selects the best result. This is desirable because of the semi-stochastic nature of the algorithm.
A brute force search for the set of parameters {∆ν N , ∆ν H , K D } that minimize Eq. 4 (in the main text) requires the pre-definition of search boundaries. Boundaries for ∆ν values can be estimated by looking at the size of "smeared" peaks in the complementary NUS spectrum (see Figure 1 in main text). Yet, an underestimation results if the CSP do not reach their plateau (saturation) value by the end of the titration, which is the case for both systems studied in this work, and more severely so for the SH3/p41 system. The problem can be alleviated by reasonable up-scaling of the estimations, suggesting CSP amplitudes of 0.5 ppm and 3 ppm in the 1 H and 15 N dimensions, respectively (in agreement with typical ∆ν values 6 ). NOSCO automatically sets boundary values for K D using Eq. 2 (in the main text). The lower boundary is given by the K D value that results in a 1% difference for the CSP values of the last two titration points in the series. The upper boundary is then set three orders of magnitude higher. Such a wide range of K D values might seem exaggerated, but helps to avoid incorrect results without an excessive increase in computation time.

Working principle of NOSCO: animation
As further SI material we include an animation illustrating how NOSCO works. This simulation was performed with hyperparameters for Particle Swarm Optimization different from those described in the main text.
The left-hand panel shows the 3-dimensional parameter space in which the PSO search is carried out. The entire sequence corresponds to a single PSO run. The red particle is the best particle after each iteration. The two grey particles are bound to the ν 1 ( 15 N) vs ν 2 ( 1 H) and ν 1 ( 15 N) vs K D limits and show the projections of the best particle onto these walls. Green lines mark the true parameter values obtained in this search.
The central panel is a surface plot showing how the complementary NUS signal is corrected by multiplication with a correction signal fed with the parameters from the best particle in each iteration.
The right-hand panel shows contour plots of NOSCO sampled, reference, and corrected peaks.
The sub-optimal results obtained in the animation are due to the alteration of PSO hyperparameters (reduced number of particles and convergence). Yet, this example helps to visualize the complexity of the problem, where many possible solutions (local minima) yield very similar results (as seen in this simulation) and only the corrected peak height is satisfactorily reproduced. At the end of the animation, each blue particle not positioned with the bulk of particles is also trapped in a local minimum.

Sampling multidimensional NMR signals
NMR frequency sampling in an indirect time dimension is a lengthy process governed by three fundamental relations: (i) the time increment (∆τ ) between sampling points inversely defines the spectral width (SW = 1/∆τ ) for unaliased frequency detection (Nyquist theorem 10 ), (ii) the longest sampling time (acquisition time AQ) inversely defines the spectral resolution (∆ν = 1/AQ), 11 and (iii) NMR frequencies from spin rotation require pairwise sampling of the real and imaginary parts for each increment. Conventionally, AQ is reached after uniform sampling of all n time increments (AQ = n · ∆τ ), leading to long 2D vs. 1D experiment times, T 2D /T 1D ∼ 2 · n = 2 · (SW/∆ν). Alternatively, AQ may be reached more quickly with fast sampling 12 and non-uniform sampling (NUS) 13 schemes. For typical 2D protein spectra, however, the acceleration factor is limited to between 2 (50% NUS) to 4 (25% NUS), depending on the number of peaks, 14 since each spectrum must have sufficient sensitivity and resolution to allow peak localization. Consequently, even NUS acquisition of a full 2D titration series may still take up to a week and, thus, exceed the protein's (or ligand's) lifetime or available spectrometer time. Alternative sampling and processing methods for serial experiments 15 have not yet been adapted to this application.

Undersampling and NOSCO
Of note, the complementary sampling of titration spectra does not have to cover the complete t 1 grid since NOSCO NUS can be combined with compressed sensing (CS) reconstruction. 16,17 In this case, NOSCO processing first corrects the measured signals for non-stationarity, then reconstructs the missing time-domain data points of a signal with an iterative soft thresholding algorithm 16,18 and, finally, implements the minimization procedure from Eq. (4). This undersampling mode is computationally more demanding and requires the spectrum to be compressible. 14 Figure S6 shows the results of such an approach, averaged over 5 runs using the same sampling schedule. In both cases, a total sampling level of only 12

Signal intensity decay
Signal intensity decay due to a transverse relaxation increasing with the ligand-to-protein ratio x is a commonly observed phenomenon in titration studies. The dependence of the net transverse relaxation rate, R 2 , on x is typically distinct for each residue and cannot be assumed to be linear, complicating its consideration in the correction algorithm. Yet, our results (presented in the article) show that NOSCO processing returns accurate values for K D and CSP that agree excellently with those obtained by the conventional approach, without any need for further adjustments. Signal intensity decay is particularly strong for the PCNA/p12 system studied here, as shown below for all selected peaks. Figure S7: Signal amplitude (in % of the maximum) vs. ligand-to-protein ratio x (log scale) for all selected 10 peaks of the PCNA/p12 system, as derived from the conventionally, fully sampled 2D titration spectra of PCNA. While the signal intensity can fall below 30% of the reference peak intensity, NOSCO still returns accurate results (see main text).

Separate NOSCO runs from Fig. 3 in the main text
In Figure 3 of the main text, we show the results from NOSCO processing of NOSCO-NUS sampled titration data. These results are the average over 20 NOSCO repeats, while error bars represent one standard deviation over these 20 NOSCO repeats. While the repeats evinced the accuracy and reproducibility of NOSCO processing, exact K D values were already obtained after a single run. In each repeat, the same optimized PSO parameters were used (see main text). Figure S8: Results of all 20 NOSCO processing runs (on the same NOSCO-NUS t 1 data matrix) for the SH3/p41 system; the averaged values over all 20 runs are shown in Figure  3A of the main text. Blue dots and red squares represent values from conventional fitting and NOSCO processing, respectively. The green band represents the 95% confidence interval for the average K D obtained by conventional fitting. The indicated K D values are the average (over all selected residues) from the pertaining NOSCO processing run. Figure S9: Results of all 20 NOSCO processing runs (on the same NOSCO-NUS t 1 data matrix) for the PCNA/p12 system; the averaged values over all 20 runs are shown in Figure  3A of the main text. Blue dots and red squares represent values from conventional fitting and NOSCO processing, respectively. The green band represents the 95% confidence interval for the average K D obtained by conventional fitting. The indicated K D values are the average (over all selected residues) from the pertaining NOSCO processing run.

NUS schedules generator
Here we show a short Matlab script to produce stratified NUS schedules. The output files are .txt files ready to be used for measurement in the NMR spectrometer. There are as many files as titration points to be studied (input by user). The NUS schedules are not completely random, but stratified, to ensure a more homogeneous sampling across each schedule.

Experimental setup
The data for NOSCO processing can be collected using default NUS data collection scheme available on spectrometers of all main vendors. The only difference is that sampling schedules in the consecutive spectra should be complementary i.e. generated with the code above.
Thus, after setting up NUS mode, but before running an experiment, the schedule generated by the spectrometer-driving software has to be replaced by the "complementary" one.

Data pre-processing
The data for NOSCO should be preprocessed using nmrPipe. 19 The data has to be converted into the nmrPipe format, processed by applying zero filling, apodization and phasing in both dimensions, as well as solvent suppression. Scripts are available from nmr.cent.uw.edu.pl → Downloads. The NOSCO program compensates the signal intensity changes between NUS points that result from the dilution of a sample. It requires manual setting of the regions of interest for each peak.