Bayesian optimization of gradient trajectory for parallel‐transmit pulse design

Spoke pulses improve excitation homogeneity in parallel‐transmit MRI. We propose an efficient global optimization algorithm, Bayesian optimization of gradient trajectory (BOGAT), for single‐slice and simultaneous multislice imaging.


INTRODUCTION
Ultrahigh-field (≥7 T) MRI provides exceptional signal-to-noise and spatial resolution, but it can be limited in practice by areas of signal dropout due to transmit-field B 1 + inhomogeneity.3][4] Many common neuroimaging protocols require slice-selective excitation, often with oblique slices to maximize brain coverage.The challenge of designing pTx pulses that are optimized for slice-selective excitation in a specific subject is still not fully solved.This optimization step must be completed quickly to avoid losing time while the subject waits in the scanner.Spokes pulses 5,6 are popular for slice-selective pTx excitation.They comprise a series of slice-selective RF subpulses separated by in-plane gradient blips as illustrated in Figure 1.Spoke pulses are normally designed for a specific slice in the patient under examination using ΔB 0 and per-channel B 1 + maps acquired from that subject.This design process is quick, because the RF optimization variables consist of only one complex weight per channel per RF subpulse.Several algorithms exist to optimize the RF subpulse coefficients, to improve excitation flip-angle homogeneity and minimize SAR, including the popular Tikhonov-regularized magnitude least squares (MLS) algorithm. 7he gradient blips in a spoke pulse add a positiondependent phase shift to the magnetization ahead of the next RF subpulse.The net effect of a gradient blip and RF subpulse combination therefore has more possible variations than the RF subpulse alone.This means that the overall spoke pulse has more chance to achieve the desired target magnetization pattern.In the original descriptions of spokes, it is envisaged that many RF subpulses will be used, but in practice T 2 and T 2 * relaxation effects and a requirement for robustness to ΔB 0 inhomogeneity mean that most sequences use two or three RF subpulses. 8t present, spoke pulses are normally optimized online per subject. 9This optimization strategy requires that the optimization completes while the subject lies still in the scanner, which is especially challenging when designing pulses for whole-brain slice-by-slice imaging.Therefore, it is common to select the gradient blips using fixed values or through heuristics such as the iterative Fourier transform approach (FTA) described in Figure 1C.Grid searches can be too slow for online optimization, even with enhancements such as sparsity enforcement. 10Local optimization approaches such as greedy joint optimization 8,11,12 are susceptible to getting stuck in local minima with suboptimal results in terms of flip-angle homogeneity or energy deposition.
Furthermore, many neuroimaging studies use simultaneous multislice (SMS) acceleration. 13,14This means that while the RF weights for each slice can be optimized individually and modulated according to the slice positions, all slices must share a single common set of gradient blips.The choice of RF coefficients for pTx SMS excitation has been investigated, 15,16 but to our knowledge, the problem of optimizing gradient blips for pTx SMS spokes has not yet been investigated.
As a corollary, the widely used k T -points approach 17 for nonselective pTx excitation can be thought of as a form of spoke pulse with infinite slice thickness.In other words, k T -points pulses comprise a series of nonselective RF pulses (without a slice-select G z gradient) separated by gradient blips that can now be placed anywhere in (k x , k y , k z ).It is common to use more subpulses (generally 4-10), as each subpulse is shorter in duration.k T -Points gradient design otherwise resembles the spoke pulse design.Note that for k T -points there are existing algorithms that jointly determine gradient and RF, 18,19 which typically take between 5 s and 90 s to complete a local optimization.
We explore a new approach to tackle these gradient design problems with Bayesian optimization.Bayesian optimization is an emerging method for global optimization, 20 capable of solving design problems in a wide range of fields including robotics, 21 reinforcement learning, 22 and drug discovery. 23Using a probabilistic surrogate model to guide the search, Bayesian optimization is highly efficient when the target function is slow to evaluate.
In this work, we investigate the optimization landscape for spoke pulse design in the human head and assess the implications for heuristic, local, and global optimization approaches.We introduce a new global optimization method, Bayesian optimization of gradient trajectory (BOGAT), for spoke pulse design.We evaluate BOGAT in phantoms and in vivo, designing single-slice spoke pulses, SMS spoke pulses, and k T -points pulses.

THEORY
The core algorithm for spoke pulse optimization is shown in Figure 1.To understand why this approach works, consider a set of points x covering an imaging slice for spokes (or a volume for k T -points).The complex transverse magnetization m(x) = m x (x) + im y (x) immediately following an excitation pulse can be written, as an example, using the small tip angle (STA) approximation 7,11 as follows: where i = √ −1, γ is the gyromagnetic ratio; m 0 is the equilibrium magnetization; S r is the transmit sensitivity of rth coil at position x; N c is the number of transmit coils; B 1 (t, r) is the rotating frame B 1 + field phasor for the rth coil at time dt ′ is the location in transmit k-space where G is the in-plane xy gradient; ΔB 0 (x) is the static field offset at position x; and T is the total length of the RF pulse.
For a spoke pulse, each RF subpulse is defined by a repeated RF pulse shape, multiplied by a complex coefficient b(n, r) per subpulse n and per channel r.We also discretize time as t = τΔt, where τ is the time index within each subpulse, giving where B 1,sub denotes the unit B 1 + field of the subpulses.This equation can be rewritten in the matrix form following Eq.( 5) in Grissom et al 24 to give where m is a complex vector whose elements m p are the complex transverse magnetization immediately following the pulse at each of the N p discretised positions x p , and A is an N p × (N s × N c ) flattened system matrix defined as where the indexes for channel r and subpulse n are flattened as in a Kronecker product into the second dimension.The pulse design vector b has dimensions (N s × N c ) × 1 and elements If the spoke positions  are chosen, the optimal RF coefficient vector can be determined with, for example, a regularized MLS 7 solver that minimizes Where λ is a Tikhonov regularization parameter that penalizes high energy pulse designs.Hence, the A matrix depends directly on the spoke positions , and the optimum pulse coefficients b depend implicitly on .
Because the MLS problem is a nonconvex optimization problem, to ensure the consistency of the b values in the following step, the RF optimizations should keep the same initialization conditions.For example, one can set initial values to a CP mode excitation, and in the case of a variable exchange 7 MLS optimizer, set the initial target phases to 0 • for the DC spoke.
Next, we choose to optimize for the spoke positions  in an outer loop, 8 updating the RF pulse design at each step: For SMS applications, we use a weighted sum of the cost function among all slices, so that b = argmin and where the weight w l can be set to adjust the contribution of each slice to the total cost.For example, they can be determined by the average cost of one or more test samples, 2 ) , to avoid the total cost being dominated by some difficult slices.Different Tikhonov regularization factors λ l are allowed for each slice.
Note that the MLS RF optimizer used to illustrate this derivation is merely one possible RF optimization method.In general, there are many existing RF optimizers using various forms of cost functions, and we can write where y is the generic RF optimizer cost function.Then the optimization with respect to the spoke positions  becomes where y b() = y( b; ), the RF cost function value at the optimum b.Regardless of the exact form of y, the formulation of our additional outer optimization problem for the spoke positions  is the same.

Bayesian optimization
The cost function y b for  is nonconvex and slow to evaluate, so Bayesian optimization 25,26 is a suitable approach to search for the global minimum.
Illustration of using Bayesian optimization to optimize k T -spoke position.(A) The true cost function (which could only be computed by running the RF optimizer thousands of times for every step in k T ) and how it is approximated by a Gaussian process (GP) fitted to the even cost function values that have been evaluated so far.(B) The "expected improvement" acquisition function that predicts the best point to sample next.Note how its values are higher where either there is a known potential minimum (about 0, promotes "exploitation") and where the uncertainties are higher (promotes "exploration").The k T -space location is treated as a 1-dimensional (1D) variable here for the purpose of illustration, and in reality, for spokes and kT-point pulses it is a 2D and 3D variable, respectively.CI, confidence interval.
Without explicit knowledge about y b, Bayesian optimization uses a Gaussian process (GP) 27 to model the cost function of spoke positions.A Gaussian process is a statistical model that describes a probability distribution over possible functions that fit a set of observed points, where any finite samples of the model are jointly Gaussian distributed.It is characterized by a mean function and a kernel that describes the expected correlation between samples.As new observations are acquired (i.e., as gradient k-space  i points are sampled and the corresponding cost function values are evaluated y b[ i ]), the GP model is updated by an efficient analytical step.This provides an evolving best estimate (mean function) of the cost function y b() and its uncertainty (uncertainty function) at each step in the optimization.The GP acts as a surrogate to the underlying cost function y b(), which can be evaluated rapidly without time-consuming RF solver steps. 22ayesian optimization is robust to noise in the computed cost function samples, which arises in this work due to small inconsistencies in the result of the RF optimization step due to the RF optimizer stopping criteria, because these get absorbed into the uncertainty function of the Gaussian process.
Instead of sampling  randomly, a key feature of Bayesian optimization is that it uses an "acquisition function" a to help automatically determine the most efficient point where the cost function should be evaluated next 28 : In other words, this finds the most informative gradient blips on which to perform RF optimization in the next iteration.
The acquisition function is updated every time a new observation is made.It is designed to have large values either where an area in the k T -space is unvisited (promotes "exploration") or around the known potential minima (promotes "exploitation").This behavior helps Bayesian optimization achieve rapid convergence.Although Bayesian optimization is not mathematically guaranteed to find the true global minimum, in many cases it often does. 29e used the "expected improvement plus" acquisition function in MATLAB (version 2022b; MathWorks, MA, USA), which builds on the widely used "expected improvement" acquisition function by applying additional heuristics to balance exploration and exploitation and hence avoid overexploiting local minima. 29o further accelerate convergence, Bayesian optimization can benefit from initial seeds, which are the  sample points that are always evaluated at the start, if some prior information about the typical cost function surface is known.
Figure 2 illustrates the Bayesian optimization method in a 1-dimensional example.A more detailed overview of the key mathematical concepts is included in the Supporting Information.

BOGAT
The BOGAT approach uses an outer loop to perform Bayesian optimization to search for the optimal spoke positions.An example BOGAT algorithm with an MLS RF optimizer is illustrated in Figure 1D.The last spoke is fixed at the center of k T -space.The preceding spoke is sampled in the k x -k y plane to perform Bayesian optimization for a given number of iterations.The RF coefficients are optimized at each position to calculate the MLS cost.The position with the lowest predicted cost is determined to be the best spoke position.If more than two spokes are used, we chose to optimize each preceding spoke iteratively, and the MLS target phase is conserved between each additional spoke to ensure the solution stability.This is a greedy approach with each iteration only considering the best spoke location for that step.Alternatively, all spoke positions could be simultaneously optimized to find a globally optimal solution, but the efficiency of Bayesian optimization, as with most sampling-based methods, generally decreases with the dimensionality of the search space. 30n comparison, the vendor's default approach (described in Section 3.4) is a heuristic algorithm.

Pulse design
Pulse designs were performed with a modified version of the PTx Pulse Design toolbox (Siemens, Erlangen, Germany) in MATLAB (version 2022b).RF optimization was performed using Tikhonov regularized MLS with the least-squares QR-decomposition solver in the PTx Pulse Design toolbox unless otherwise specified.All calculations except the grid search were performed on a desktop computer (Intel Core i9-10 900X, 64 GB RAM).

Simulation studies
All simulation studies and prediction results were based on a database of in vivo adjustment data sets acquired using an 8-transmit (Tx)/32-receive (Rx) head coil (Nova Medical, USA) on a 7T Terra MRI system (Siemens).These data were collected in previous scans under the Cambridge Human Biology Research Ethics Committee ethics approval (HBREC.2020.27).
The ΔB 0 and B 1 + maps were acquired with dual-echo gradient-echo and magnetization-prepared turbo-FLASH 31 (TFL) sequences, respectively, and they were processed, masked, and exported as MATLAB files in the vendor's PTx Pulse Design framework.Flip-angle (FA) RMS errors (RMSEs) were computed by numerical solution of the Bloch equations ("Bloch simulation").

Grid search
A grid search was performed in which we varied the k T -space position of the first spoke in a two-spoke pulse.The k x -k y plane was sampled at 51 × 51 locations between ±0.15 cm −1 (±3/FOV).The final spoke was fixed at k x = k y = 0.00 cm −1 .RF optimizations were performed using regularized MLS with ΔB 0 and per-channel B 1 + maps as described previously.Transverse (N = 5) and transverse oblique (N = 9) slices of the whole brain were divided into 16-mm slabs, totaling 85 slabs.Each pulse was optimized over all slices in that slab.Two target FAs (30 • and 45 • ) and two fixed Tikhonov regularization factors (0.03 and 0.10) were used.The maximum number of iterations for the RF solver was set to 200, which was large enough to allow convergence at all points.The resulting plots (Figure 3) reveal the typical features in a two-spoke optimization landscape and are discussed in further detail subsequently.
A similar search was performed for the first spoke in a three-spoke pulse, where the second spoke was fixed at the minimum determined in this two-spoke search, and the final spoke was fixed at the k-space origin.
The grid searches were computed on 32-core research computing servers (Intel Xeon Gold 6142, 192GB RAM) at the Cambridge Service for Data Driven Discovery (CSD3).

Fourier transform approach
The vendor's default approach to determine spokes positions uses what we call the Fourier transform approach (FTA).As shown in Figure 1C, this searches for spoke positions by first fixing the final spoke at k x = k y = 0.00 cm −1 , optimizing its RF pulse coefficients, computing the resulting predicted transverse magnetization using the subject's ΔB 0 and per-channel B 1 + maps.This result is subtracted from the target magnetization to give a residual, which is then Fourier-transformed to give a k T -space residual.The preceding spoke is placed at the k T -space point with largest absolute residual.

BOGAT
Bayesian optimization of spoke positions was implemented using MATLAB's "bayesopt" function (a code snippet is included in the Supporting Information) as explained in Section 2 and summarized in Figure 1D.The k x -k y search space ranged between ±0.15 cm −1 (approximately ±3/FOV) in each dimension.Based on the grid search results (shown in Supporting Information Figure S1), initial seeds were chosen to lie on two rings: The first ring has radius 0.05 cm −1 and is sampled with 4 points; the second ring has radius 0.09 cm −1 and is sampled with 5 points.All designs used a maximum of 30 Bayesian optimization iterations unless otherwise specified.

Comparison with grid search
To evaluate the performance of BOGAT to estimate the cost function, one 30 • bipolar two-spoke pulse was designed for each slab used in the grid search described previously (total of 85 slabs, and transverse and transverse oblique slices) with the BOGAT method.RF coefficients were optimized with regularized MLS with the Tikhonov factor fixed at 0.03.Sinc subpulses with time-bandwidth product of 4.0 and duration of 3.0 ms were used.The optimal k T -space location for the first (movable) spoke determined by the BOGAT method in 30 iterations was compared with the grid search results.Additionally, we compared the BOGAT optimization performance against local optimizations with one or more random starting points based on our grid search data set.This was computed from the grid search data by assuming that each random "starting point" falling within the catchment basin (Figure 3C,E) of a local minimum will converge to the minimum value for that catchment basin.This mimicked an idealized convergence for a gradient-descent local optimizer.
The efficiency of sequential and simultaneous sampling for three-spoke designs was compared.For the sequential sampling case, one further spoke was added to this BOGAT-optimized two-spoke pulse, where all other design parameters were kept constant.For the simultaneous sampling case, the Bayesian optimization algorithm was run with four variables (k x and k y for both first and second spokes) for 100 iterations.The RF optimizer was run as described previously.
To illustrate the advantage of BOGAT in terms of the trade-off between excitation quality and pulse energy, two-spoke pulses were designed with whole-brain transverse field maps from the same field map database described previously.We tested two-spoke 45 • excitation for single-band, multiband (MB) 2, MB3, and MB4-each with a total of 36 sets of slice groups in 9 subjects, with 8 Tikhonov factors (0.001, 0.01, 0.03, 0.05, 0.08, 0.1, 0.2, and 0.5).We compared the vendor's default FTA method with spoke positions determined only by the central slice in each slab.The test samples used to determine the slice weightings in BOGAT were chosen to be the same as the BOGAT seed samples.The slice weightings are intended to force the optimizer to balance its effort between the slices, instead of, for example, spending almost all effort on the harder slice leading, which might lead to an unexpectedly poor solution for the easier slice.

Prospective phantom validation
The ΔB 0 and per-channel B 1 + maps were acquired with dual-echo gradient echo and saturation-prepared TFL using the aforementioned 8 Tx/32 Rx head coil (Nova Medical) on a 7T Terra MRI scanner (Siemens Healthineers).A two-spoke and a three-spoke 90 • monopolar pulse were designed for a single slice with BOGAT and regularized MLS with a fixed 0.10 Tikhonov factor.Each sinc subpulse had a duration of 3.0 ms and time-bandwidth product of 4.0.For comparison, designs were repeated with FTA spoke positioning, with identical RF MLS optimization parameters.
The FA maps of these pulses were measured with a saturation prepared TFL sequence with readout FA = 5 • , TR = 5000 ms, TE = 1.85 ms, 1.6 × 1.6 mm in-plane resolution, and the linear correction recommended by Tomi-Tricot et al. 32 For SMS validation, field maps were acquired as previously for three slices 40 mm apart in the agar phantom.We compared BOGAT with the vendor's default FTA method with spoke positions determined only by the central slice.For RF optimization, the Tikhonov factor was determined by the L-curve method scaled down one-half times. 33MS multiband FA maps were measured in three separate scans.Each time magnetization was prepared with the multiband pulse, but readout was for a single slice each scan.The pulse energy was calculated for the modulated and summed multiband pulse with optimized phase offset. 34

Prospective in vivo validation
Four volunteers gave written consent under local ethics approval (HBREC.2020.27)and were scanned with the same 8 Tx/32 Rx head coil (Nova) and 7T Terra MRI scanner (Siemens Healthineers).The ΔB 0 and per-channel B 1 + maps were acquired for a transverse slice across the brain as described previously.Two-spoke 90 • excitation pulses were designed with both BOGAT and FTA each with four Tikhonov factors (0.01, 0.03, 0.05, and 0.1), and the resulting FA maps were measured with the TFL sequence described previously.
In 1 subject, we additionally implemented BOGAT with a Jacobian matrix large tip angle (LTA) optimizer. 35wo-spoke and three-spoke 135 • pulses were designed.The LTA optimizer was set only to update the RF coefficients and not alter the spoke positions.Twenty BOGAT iterations were used to reduce computation time.This was compared with FTA designs in which spoke positions were set based on the same STA calculation as previously and then fed to the LTA optimizer for RF optimization at those fixed spoke positions.FA maps were measured with the same TFL sequence.
We also applied BOGAT for spin-echo EPI in a volunteer.The methods are described in Zhang et al. 36 except that BOGAT was used for pulse design.The volunteer gave informed consent and was recruited under ethics CPHS #2020-07-13 437.The scan was performed on the Impulse Version MAGNETOM Terra 7T MRI scanner 37 (Siemens Healthineers) at the University of California in Berkeley using a 16 Tx/96 Rx head coil 38 (MR CoilTech, Glasgow, UK).The excitation pulses had the same pulse design parameters as previously, and the refocusing pulse had a longer duration of 5.12 ms for each subpulse.To avoid eddy current effects, we did not increase the gradient and slew-rate limits for the pulses compared with our 7T Terra scans.Other sequence parameters were as follows: TR = 4400 ms, TE = 60 ms, 15 slices, 1.25-mm 3 isotropic voxels, bandwidth = 1860 Hz/px, GRAPPA = 3, and partial Fourier = 6/8.

k T -Point pulse design validation
Finally, we also tested whether the BOGAT approach can usefully be generalized to nonselective excitation.To demonstrate this, we retrospectively designed 30 • k T -point pulses for whole-brain imaging for 1 subject in the same database.We compared the whole-brain FA RMSE performance for fixed 7kT and 5kT, fixed 4kT with 1 BOGAT point, fixed 3kT with 2 BOGAT points, and fixed 1kT with 4 BOGAT points.The fixed 7kT points were placed at k = ±0.05cm −1 symmetrically in all three directions; and in the latter cases, the positions with the lowest RF coefficient norms in the optimized 7kT pulse were removed.The RF optimization was performed with the same MLS optimizer as mentioned previously, with the Tikhonov factor fixed at 0.01.

Grid search
Figure 3 shows the FA RMSE, pulse energy, and cost function landscapes plotted as a function of spoke position in k T -space for a two-spoke 30 • pulse.The second spoke is always fixed at the origin (k x = k y = 0.00 cm −1 ).The landscape is clearly nonconvex with an average of 18 local minima in each slab.Supporting Information Figure S2 shows the number of local minima from each grid search result as a histogram.The cost function is generally smooth but has discontinuities that correspond to regions where the RF MLS optimization settles on a different set of subpulse coefficients.This is illustrated in a specific example in Supporting Information Figure S3.Note how adding more spokes gives a smoother cost function surface, albeit one that still contains many local minima (see Supporting Information Figure S4).Small changes in the target FA or the Tikhonov regularization factor do not alter the general structure of the cost function landscape, comparing columns C-E in Figure 3.With the same RF optimization distribution of minima depends on the prescribed slab position.

In vivo prediction using field-map database
Figure 4 shows that for each slab position, Bayesian optimization for two-spoke pulses approximates the cost function closely (exploration) and in most cases converges to the global minimum (exploitation) within 30 iterations.The whole landscape is better estimated when the function is smooth.
Figure 5 shows how BOGAT converges for two-spoke pulses.Figure 5A shows the error descent curve of the BOGAT iterations with respect to the global minimum determined by the grid search, for all 85 slab positions over 14 subjects calculated in the grid search.Figure 5B is the histogram of the relative cost when BOGAT was terminated at the 30th iteration.
Figure 5C compares the cost distribution of BOGAT results with all local minima in these slabs, where we see that BOGAT on average finds a better minimum than a gradient descent-based local optimization algorithm starting from a random point.Ninety-three percent of the BOGAT calculations reached a cost less than 1.1 times the global minimum within 30 iterations, whereas only 44% of local minima achieved this.The closest whole number of random starting points required for a local optimization to have a similar convergence performance as BOGAT is 7, with 91% of cases reaching a cost less than 1.1-times the global minimum.
Supporting Information Figure S5 shows the convergence rate comparison of a three-spoke design between simultaneous and sequential sampling.Sequential (greedy) optimization with 60 total iterations outperformed simultaneous optimization with 100 iterations, showing the advantage of Bayesian optimization for low-dimensional optimization spaces.
Figure 6 shows the trade-off between FA homogeneity and RF power deposition for spoke pulses, in the form of an L-curve of FA RMSE against pulse energy.This is plotted for single-band, MB2, MB3, and MB4 pulse designs with a target FA of 45 • .Compared with FTA, BOGAT shifted Phantom validation.(A) Single-slice two-spoke and three-spoke flip-angle map with Fourier transform approach (FTA) and Bayesian optimization of gradient trajectory (BOGAT) at a fixed Tikhonov regularization factor 0.1.BOGAT achieves more homogeneous excitation pattern in the agar phantom at a lower pulse energy.(B) Flip-angle maps and pulse energies for multiband factor 3 pulse designs.Automatic Tikhonov regularization factors determined with L-curve were used.Note the markedly lower pulse energy (55% reduction) when designed with BOGAT.All other magnitude least squares (MLS) optimization parameters for RF coefficients were kept constant in all parallel-transmit (pTx) methods.RMSE, RMS error; SMS, simultaneous multislice.the L-curve to the left, providing lower RMSE at equal pulse energy, or lower pulse energy at equal magnetization quality when identical RF optimization algorithm and parameters were used.Of the 36 slice groups tested for each MB factor, BOGAT showed a mean energy reduction of 8.9% (single-band), 5.9% (MB2), 9.4% (MB3), and 7.0% (MB4) (p < 0.001 in all cases), while reducing FA RMSE by 11.4% (single-band), 35.6% (MB2), 19.7% (MB3), and 19.4% (MB4) (p < 0.001 in all cases).

Prospective phantom validation
Phantom validation results are shown in Figure 7.For single-slice pulse designs with the same MLS Tikhonov factors, BOGAT achieves more homogeneous FA patterns than FTA.For two-spoke designs, the BOGAT RMSE was 12.9 • , which is a 29% decrease compared with 18.1 • for FTA.For three-spoke designs, BOGAT achieved 8.6 • RMSE, which is a 56% decrease compared with 19.4 • for FTA. Figure 7B shows SMS-MB3 pulse designs that were run with automatic Tikhonov factor selection using the L-curve approach to favor similar RMSE and encourage BOGAT to lower the pulse energy where possible.In this case, BOGAT reduced MB pulse energy by 55% in phantoms.The mean total optimization workflow time for a two-spoke BOGAT MB3 pulse increased to 22.8 s compared to 13.7 s with FTA.As a reference, the grid searches with Bloch simulations shown in Figure 3 with In vivo validation.The flip-angle maps (two-spoke, target = 90 • ) measured at the same transverse slice in 1 subject, with pulses designed with Fourier transform approach (FTA; top row) and with Bayesian optimization of gradient trajectory (BOGAT; lower row).BOGAT improves RMS error (RMSE) and reduces the pulse energy (except in the case where Tikhonov factor λ = 0.01 in the left-most column).

F I G U R E 9
T 2 spin-echo EPI images with a 16 Tx/96 Rx head coil.Raw images with Fourier transform approach (FTA; A) and Bayesian optimization of gradient trajectory (BOGAT; B) pulse design.(C) The percentage signal difference between BOGAT and FTA images.BOGAT images recover more signal in the frontal lobe and temporal lobes, while reducing the pulse energy.SAR, specific absorption rate.51 × 51 grid points took an average of 55.2 min for each slab on a 32-core computer.Hence, BOGAT achieved a global optimum spoke location more than 50 times faster than these grid searches.

Prospective in vivo validation
Prospective in vivo validation also showed improved FA homogeneity and/or reduced pulse energy at different Tikhonov factors; the measured FA maps for the STA For the LTA designs, the BOGAT two-spoke pulse achieved better FA uniformity with a lower pulse energy than the FTA three-spoke design (Supporting Information For five fixed and even fixed kT-points, the gray lines were drawn to guide the eyes.From left to right, the whole-brain mean FA NRMSE values are 10.2% (5 fixed), 9.8% (7 fixed), 9.9% (4 fixed +1 BOGAT), 9.7% (3 fixed +2 BOGAT), and 9.4% (1 fixed +4 BOGAT).
Figure S7).The BOGAT extra computation time was 75 and 159 s for two-spoke and three-spoke LTA designs, respectively.
For the spin-echo single-shot EPI acquisition acquired on the IMPULSE 7T scanner, Figure 9 shows how BOGAT improved the image homogeneity in frontal lobe and temporal lobe, while also reducing the energy deposition (1.86 J for BOGAT vs. 2.16 J for FTA for a 90 • -180 • pair).

DISCUSSION
We have implemented Bayesian optimization for MRI pulse design and demonstrated that our BOGAT method can rapidly converge toward the global optimum spoke positions for two-spoke pulses, which translates to improved FA homogeneity and reduced pulse energy without the need to alter the RF optimizer.Our inspiration for BOGAT came from the cost function landscape of the spoke positions in the k T -space.In Supporting Information Figure S8, we demonstrate that the global cost function minimum was within ±3/FOV of the center of k-space in all subjects, and most significant local minima are also in this region of k-space.This is expected, as for a small number of spokes, each additional spoke corrects for some slow-varying inhomogeneity across the whole FOV due to the B 1 + spatial distribution for each channel and interference patterns between channels.It justifies our choice of the BOGAT search range ± 3/FOV.Supporting Information Figure S1 shows that, based on a weighted average of the cost function, a typical minimum in the two-spoke landscape occurs on a ring around the center of the k T space.We have used this observation to initialize the seed points to accelerate BOGAT optimization.
The discontinuities in the cost grid correspond to the boundaries where the RF MLS algorithm converges to different patterns, as demonstrated in Supporting Information Figure S3.We note that our grid results show fewer discontinued patches compared with the previous works, 40 probably because we used slab optimization of multiple slices, resulting in more stability in the solution.This is, in our opinion, a closer representation of the realistic use case, as slice-by-slice optimization over a big imaging volume would take too long for online optimization in an in vivo protocol.
Our chosen Gaussian process before the Bayesian optimization method can technically only represent smooth target functions, 27 yet we observe that this approach is effective at finding the global minimum even when that is in a discontinuous area (Figure 4C).Biasing the acquisition function toward "exploration" can produce better results when the landscape is expected to contain more discontinuities, such as when optimizing single slices, in midbrain areas.
Unlike many interleaved algorithms, BOGAT treats the gradient optimization as a separate problem from the RF optimization.Therefore, while it is generally deemed not possible to seek the global optimality of the RF solution or the whole pulse, 40,41 we are able to efficiently look for the optimal gradient trajectory under a particular RF algorithm and its parameters.On the other hand, interleaved gradient and RF optimization algorithms effectively increase the dimensionality of the problem, and hence have a greater tendency to hit local minima. 42ecause BOGAT uses an extra outer loop to optimize for the gradient positions, it may, at first glance, be deemed a very time-consuming algorithm.However, it should be noted that depending on the MR systems, a large proportion of the pTx optimization workflow time may consist of field-map conversions, calculation environment setups, visualizations, constraint checks, and output waveform conversions.Therefore, when the inner loop RF optimization is implemented efficiently, BOGAT does not significantly increase the overall workflow time.As shown in the results section, BOGAT computation times of about 10 s are suitable for online subject-specific optimizations of spoke pulses.This computation time is like a method previously suggested by Dupas et al., who recommended using 15 random initial points for parallel local optimizations for a robust solution.We could in principle use the extra parallel computing capacity to simultaneously calculate pulses for multiple slices or slabs using BOGAT.
BOGAT can be applied for single band and multiband pulse design; the benefit appears to be greater for multiband pulse designs than single-band designs (Figure 6).
We developed BOGAT based on field maps acquired with the Nova 8 Tx/32 Rx coil in Cambridge.We tested its wider applicability for human pTx neuroimaging as shown in Figure 9.The results came from a 16 Tx/96 Rx 7T head coil (MR Coiltech, UK), and in this case BOGAT recovered more signal in the frontal and temporal lobe areas, while reducing the total pulse energy by 14%.
We have successfully demonstrated that BOGAT, being an optimization outer loop, is compatible with STA MLS and a Jacobian matrix LTA solver for spoke pulse designs, and for k T -point pulse design.However, detailed performance analysis and tuning for the LTA and k T -point cases remain to be addressed in future studies.Likewise, it could be valuable to apply BOGAT with SAR-constrained RF optimizers, which would avoid the arbitrary Tikhonov regularization factors needed in our current implementation. 16,39urthermore, we have chosen the Bayesian optimization implementation with an emphasis on the computation efficiency, which is required for an online pulse optimizer.With ongoing rapid advances in Bayesian optimization algorithms, there may be better choices for the Gaussian process kernels (e.g., to better represent discontinuous functions 43 ) and acquisition functions, 20 which we have not exhaustively evaluated in this study.

CONCLUSIONS
We have developed a Bayesian optimization algorithm "BOGAT" for fast, global optimization of spoke k T -space positions.We demonstrated it in conjunction with Tikhonov-regularized magnitude least squares and a Jacobian matrix large-FA RF optimization.BOGAT improves FA homogeneity compared with the vendor-provided FTA method for both single-slice and multiband excitations.Alternatively, BOGAT can be used to decrease pulse energy while maintaining excitation fidelity.Computation times are feasible for online per-subject optimization.

ACKNOWLEDGMENTS
A) RF and gradient form of a two-spoke pulse with sinc subpulses.(B) The transmit k-space (k t -space) trajectory of the two-spoke pulse shown in (A).(C) The automatic iterative Fourier transform-based approach (FTA) as implemented in the vendor-provided parallel-transmit (pTx) framework, compared with our Bayesian optimization of gradient trajectory (BOGAT) algorithm for calculating the spokes k T -space trajectory when paired with a magnitude least squares RF optimizer (D).MLS, magnitude least squares.
Examples of grid search results for the flip angle (FA) RMS error (RMSE; A), pulse energy (B), and magnitude least squares (MLS) cost (C), with respect to the k x -k y spoke position of the first spoke in a two-spoke parallel-transmit (pTx) pulse.The second spoke was fixed at the kT space origin (k x = k y = 0.00 cm −1 ).Three slab positions (superior, middle, inferior) in 1 subject are shown.The pulse was designed for FA = 30 • with Tikhonov regularization factor 0.1 (A-C), FA 30 • with Tikhonov factor 0.03 (D), and FA 45 • with Tikhonov factor 0.1 (E).In (C)-(E), the white lines mark the watersheds of the cost function (i.e., they divide k x -k y space into sections where a local optimization algorithm will converge to a single local minimum).

designs for 1
subject are plotted in Figure 8.The largest improvement is with the highest Tikhonov factor tested (0.10), where BOGAT achieved 30% reduction in RMSE (10.3 • vs. 14.7 • for FTA, target 90 • ) and 29% reduction in pulse energy (0.30 J vs. 0.42 J for FTA) across all subjects.The mean extra computation time was 7.8 s.More measured maps for all subjects are shown in Supporting Information Figure S6.

10
Retrospective kT-point pulse design for 1 subject (target flip angle = 30 • ).(A) The kT-point locations in the k T -space for pulses with and without Bayesian optimization of gradient trajectory (BOGAT) points (fixed points in yellow, BOGAT points in blue).(B) The simulated flip-angle (FA) maps for the center sagittal slice.(C) The mean normalized FA RMS error (RMSE) for all voxels in the brain mask.
M.Z. is supported by the Medical Research Council (MR N013433-1) and the Cambridge Trust.C.T.R. acknowledges research support from Siemens.This research was supported by the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014).The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.The Cambridge 7T MRI facility was cofunded by the University of Cambridge and the Medical Research Council (MR/M008983/1).This work was performed using resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service (www.csd3.cam.ac.uk), provided by Dell EMC and Intel using Tier 2 funding from the Engineering and Physical Sciences Research Council (capital grant EP/T022159/1), and DiRAC funding from the Science and Technology Facilities Council (www.dirac.ac.uk).We thank Saskia Frisby and Marta Correia for contributing field maps from their pTx scans to the database of scans.We thank Sydney Williams, Samantha Ma, Belinda Ding, Alexander Beckett, and David Feinberg for facilitating a remote test of the BOGAT pulses on the IMPULSE 7T Terra scanner at UC Berkeley.BOGAT is the subject of a UK patent application.
R E F E R E N C E S