Accelerated motion correction with deep generative diffusion models

The aim of this work is to develop a method to solve the ill‐posed inverse problem of accelerated image reconstruction while correcting forward model imperfections in the context of subject motion during MRI examinations.


Introduction
MRI is a highly effective medical imaging modality which owes much of its utility to having superior soft tissue contrast without any ionizing radiation.Unfortunately, MRI is notoriously slow when compared to other imaging methods.This limitation leads to increased operating costs and even decreased image quality due to a variety of factors.A common way to reduce scan time is to simply acquire less data and thus subsample k-space.This process, however, makes the task of recovering the desired image an ill-posed inverse problem.To better handle this task, many techniques have been developed such as parallel imaging (38; 46; 14), handcrafted image regularization (32; 15; 51), dictionary learning (39), subspace constraints (49), and more recently deep learning (16; 1; 3; 5; 21; 30; 9).Although highly subsampling measurements reduces the likelihood of motion occurring during the scan, MRI is still susceptible to subject motion due to physical constraints during a given scan such as the repetition time (TR) needed between excitations.The resulting artifacts can often render the image non-diagnostic, and may ultimately require the corrupted scan to be reqacquired (55).
The severity of motion artifacts in the final image is related to a variety of factors such as acquisition parameters (sampling trajectory, echo train ordering, signal preparation) and the degree of motion.See Fig. 1 for an example.These artifacts have tangible costs in clinical settings, especially when scanning pediatric patients where motion artifacts are extremely common (45).Many approaches to address motion corruption have been proposed.These methods can be separated into two broad categories: prospective, and retrospective.
Prospective methods are categorized as those which can be used at scan time to modify the acquisition in response to patient motion.To measure motion during the scan a variety of approaches have been proposed that either leverage additional pulse sequence actions like motion navigators or external measurement devices such as respiratory bellows, PILOT tones, nuclear magnetic resonance probes, and optical tracking (6; 12; 29; 35; 33).These measurements can be used to correct motion by binning data into different motion states to later inform the reconstruction process, or even discarding corrupted measurements and guiding reacquisition of corrupted data.Due to the overhead created by additional measurement equipment and reacquisition of data, these methods may still increase operating costs and even scan time, as well as require modifications to the sequence or addition of peripheral hardware.
Retrospective methods assume no control of the imaging procedure and correct for motion artifacts after measurement data have been collected.This means techniques that require sequence modification are not possible.They also typically assume no access to direct measurements of the true motion states which occurred during the exam.Retrospective techniques are more widely applicable but also face a more difficult task than prospective methods.
In light of recent advancements in the area of deep learning, perhaps the most straightforward approach to retrospectively correct motion is to directly map motion corrupt images to clean images.
One such approach successfully trained a conditional generative adversarial network (GAN) to translate motion corrupt images to clean images (22).This technique falls into the class of end-toend deep learning methods.However, as previously stated, artifact appearance is heavily dependent on the chosen forward operator, i.e. the sampling trajectory (Fig. 1).Due to this, a network's performance at test time is highly dependent on how motion is synthesized at training time to create training pairs (27).
Motion can be described as an unknown perturbation to the assumed forward model that gives rise to artifacts at reconstruction time.This has led previous works to jointly solve an optimization problem for the target image and the unknown motion that occurred at scan time (11; 28; 40; 10; 18).These methods have primarily been applied to the low acceleration regime.To build upon joint optimization, supervised learning has been used as one step in a larger iterative algorithm that jointly solves for the image and the motion parameters (17).Although this method shows notable improvements over prior methods, it is still likely susceptible to distribution shifts in the forward operator (changes to acquisition and sampling parameters), as to train the end-to-end network component it is necessary to pre-select the manifestation of the motion artifacts to learn the proper inversion.It also still relies on a linear reconstruction backbone for solving the accelerated reconstruction task which is not as powerful as recent deep learning based reconstruction techniques.
In contrast to these techniques, in this work we propose a retrospective motion correction technique that builds off of recent advances in deep generative models (47; 20).In particular, we formulate the reconstruction under the lens of posterior sampling (21; 9; 30).We extend the framework to joint posterior sampling over the image and the rigid motion parameters.Our goal was to develop a method which is (1) effective at correcting in-plane, rigid motion from subsampled data while (2) being agnostic to choices in the forward model which can greatly affect the manifestation of the motion artifacts observed.

Accelerated MRI Reconstruction
The goal of accelerated image reconstruction in MRI is to recover an image x ∈ C N from undersampled Fourier measurements (k-space) y ∈ C M .We can denote the measurement (forward) process in MRI as where y (l) is k-space of the l th coil, N K ∈ C M ×N denotes the non-uniform Fourier transform operator (2D or 3D) evaluated at coordinates K ∈ R d×M , d = 2, 3 for 2D and 3D imaging respectively, S l ∈ C N ×N is the l th coil sensitivity map, and η l ∼ N (0, σ 2 I) is additive noise.We can consolidate the forward operator for all coils into one operator A = N K S.
Viewed from the perspective of regularized inverse problems, reconstruction can then be formulated as solving the optimization problem where R(x) can be a handcrafted image regularization term such as L1-wavelet sparsity (32), or low-rank structure (15; 51).Reconstruction can also be solved with deep networks by learning a mapping (f θ ) from measurement to image space using training data(1; 16) x * = f θ (y). [3] More recently, there has been a push to use deep generative models to learn useful statistical priors for regularization (50; 31; 21; 9; 30).In these techniques, reconstruction takes on a Bayesian formulation where the goal is to solve the inverse problem with a variety of estimators such as maximum a posteriori (MAP) estimation, minimum mean square error (MMSE) estimation, or posterior sampling which have perceptual benefits over many other formulations (4).

Diffusion Generative Models for Inverse Problems
Recent work in the generative model space has been focused on diffusion processes (47; 20; 48; 48; 23).
For the remainder of this section we will adopt the notation introduced in (23).Diffusion generative models can be understood through viewing two complementary stochastic differential equations (SDE).The first SDE is called the forward process.In the forward process, noise is gradually added to the data distribution of interest: Here s ′ (t) s(t) is often referred to as the drift coefficient, while s(t) 2σ ′ (t)σ(t) is commonly called the diffusion coefficient, and ω defines a Brownian motion process.This process can be reversed via a complementary SDE or ordinary differential equation (ODE) (2; 48; 23).We will focus on the reverse ODE which is given by: x − s 2 (t)σ ′ (t)σ(t)∇ x log p (x; σ(t)) dt [5] where x = x s(t) .When run from time t = T to t = 0 the procedure results in sampling from the original clean data distribution p data (x).Here we note that s(t) and σ(t) are analytically defined in the forward equation (Eq.4), so the only portion of the reverse ODE which needs to be learned is the score (∇ x log p(x; σ(t))) at each time t.The score can be approximated by training a neural network (D θ (x, σ(t))) via denoising score matching (53).For clarity in future equations, we note here that D θ (x; σ(t)) is not a direct approximation to the score function but rather is trained to predict the denoised signal at each noise level leading to the following relation with the score function at each time point during the reverse process: With access to an approximation of the true score function, the reverse ODE can be solved using ODE solvers like Euler's method (1st order).To solve inverse problems, we can instead use the following reverse ODE: Following this ODE, we will be sampling from the posterior distribution p(x|y).The key issue with this approach is that we only analytically know the form of the likelihood at time t = 0 (e.g.p(y|x; σ(0)) = N (Ax, σ 2 I)).Prior works like Diffusion Posterior Sampling (DPS) (8) have approximated the likelihood at intermediate times steps with p(y|x(t); σ(t)) ≈ p(y|x; σ(0)), [8] where ] is an estimate of the denoised image at time t = 0 and is given by Tweedie's formula (13) to be This is leads to an inference procedure for solving inverse problems as shown in Alg. 1.

Measurement Formation in the Presence of Motion
We consider motion which is rigid, in-plane, and occurs between readout lines.The assumption that motion does not occur during the readout period is not too restrictive since the readout duration is typically much shorter than the time between readouts.This assumption means that issues such as spin-history effects are not considered.Under these assumptions we can characterize the effects of rigid body motion (rotation, translation) on k-space measurements using simple Fourier theory.In particular, rotation in image space leads to the same rotation in k-space, while translation in image space causes linear phase shifts in k-space.Both of these effects can be captured in a modified forward operator: where x ∈ C N is the motion-free image, S ∈ C NcN ×NcN contains the N c sensitivity maps, R θ i is a rotation matrix for the i th motion state, P ϕ i is a diagonal matrix implementing a linear phase shift describing the horizontal and vertical translations during the i th motion state, K i are the coordinates for the intended k-space trajectory during the i th motion state, and N R θ i K i is the Non-uniform Fast Fourier Transform (NUFFT) of Sx at the coordinates R θ i K i .We note here that although y i and x are linearly related, ϕ i , θ i and y i are not.
For ease of notation we combine all motion states for the rotation angles, translation distances and intended sampling trajectories into the variables θ, ϕ, and K respectively: To further simplify this expression we combine all unknown motion parameters (θ, ϕ) into κ and get the following expression where A κ includes all linear operators in Eq. 12.We note here that the motion operators (R θ ,P ϕ ) are the same for all coils.
For much of the experimentation in this paper we not only assume constant motion states during a single read out but also fixed motion states for each TR.This is not required for our method to work but it fits with the observation that time between TRs is much longer, in general, than time between readouts within a TR.This is, for example, the case in many fast spin-echo (FSE) imaging sequences.We wish to note that although we explicitly consider rigid body motion in our forward model formulation, non-rigid motion can also be modeled as modifications to the forward model.
However, non-rigid motion requires parameterization of a deformation field which greatly increases problem complexity.

Accelerated Motion Correction with Diffusion Generative Models
As stated above, prior works have shown promising results when using deep generative diffusion models to solve ill-posed inverse problems like sub-sampled image reconstruction (21; 30; 9).In most prior work, however, the forward model is assumed to be fixed and known throughout the reconstruction procedure.In our work, however, we assume that our forward operator A κ belongs to a restricted class of operators with unknown parameters κ which must be learned during inference.
Another way of viewing this problem is as posterior sampling from the joint distribution p(x, κ|y).

Comparison Methods
We display the results for five different methods: 1. Linear Reconstruction Lower Bound (Linear-LB): Conjugate gradient (CG) algorithmbased reconstruction of motion corrupted data with no motion correction (assumes zero motion occurred during scan).We use this as a lower bound for the performance of methods like NAMER (17).
2. Linear Reconstruction Upper Bound (Linear-UB): CG-based reconstruction of motion corrupted data with access to the true motion states.We use this as an upper bound for the performance of methods like NAMER (17) as in the best case, NAMER is a CG reconstruction with access to ground truth measurements of the motion states.

Simulated Motion
To test the robustness of our method at a variety of acceleration levels and sampling patterns, we simulated motion on 100 T2 brain images from the fastMRI dataset (24) for four different sampling patterns at three different accelerations each.Specifically, we use Cartesian and PROPELLER Prior to simulating motion corruption over the raw k-space data we first resized the fully sampled k-space to be 384 × 384.Next we calculated sensitivity maps using ESPIRiT (52).We then applied motion to k-space measurements by drawing random motion states and passing the coil images through the motion-corrupt forward operator.Finally, we added a small amount of noise to the sampled k-space data.We note here that many parts of the preprocessing here constitute an inverse crime (42).However, all competing methods used the same data so our method should not have gained an unfair advantage in this respect.

Prospective Motion
As proof of principle on prospective data we acquired a single T2 brain scan on a healthy volunteer with institutional review board approval and informed consent.The data were collected on a Siemens Vida 3 Tesla MRI scanner at our institution, and we emphasize that the scanner hardware and imaging protocol differed from the fastMRI training data.We first collected a scan at R = 2 where we asked the participant to not move during the scan and consider this our motion free scan.Finally, we collected a scan at R = 3 where we asked the participant to rotate and translate their head (approximately) in-plane during the scan.Scan parameters were: ETL=16, R=3, slice thickness = 4mm, FOV=220mm×220mm, resolution=0.42mm×0.42mm.We applied the LINEAR-LB, PS, and MI-PS methods to compare their reconstructions on prospective data.For MI-PS, instead of estimating one motion state for each TR we estimated separate motion states for each phase encode.

Inference
We selected the following parameters to define the drift and diffusion coefficients in the forward process: s(t) = 1, σ(t) = t with σ max = 5 and σ min = 0.002.For running the reverse ODE we also selected a total number of inference time steps N = 300 and a time step schedule of where ρ = 7.Similar to (8) we select the likelihood weighting . As motion was simulated, we did not wish to unfairly assume a prior over motion states.Therefore we set d Pκ = 0. Finally we used a fixed step size (ξ) for updating motion estimates.
For Cartesian sampling patterns ξ = 1 while for PROPELLER ξ = 0.3 was used.Finally, we found it best to initialize the motion estimates to zero (κ = 0).These details lead to the final update procedure shown in Alg. 3.

Quantitative Evaluation
We evaluate the retrospective results using normalized root mean squared error (NRMSE) and structural similarity index measure (SSIM) on a validation set of 100 images.As there is an ambiguity between two data-consistent reconstructions with a fixed motion offset, we first align the reconstruction to the motion-free ground-truth before evaluating NRMSE and SSIM.

Results
Quantitative NRMSE and SSIM metrics are shown for each simulated motion case in Table 1.
Example reconstructions for simulated motion using Cartesian and PROPELLER trajectories can be seen in in Fig. 3 and 4 respectively.Reconstruction without accounting for motion leads to large error, necessitating the use of motion correction.However, the LINEAR-UB reconstruction, which has access to ground-truth motion parameters performs poorly due to residual aliasing artifacts.
While the E2E method is able to remove aliasing, residual motion artifacts remain, likely because of the differences in motion at test time.Posterior sampling without motion correction is not able to compensate for motion.Finally, MI-PS is able to handle the acceleration as well as the motion through joint posterior sampling.
Using a PROPELLER based acquisition leads to lower error for LIENAR-UB and LINEAR-LB, likely due to the presentation of incoherent artifacts from subsampling and from motion.Interestingly, PROPELLER reconstructions are somewhat worse quantitatively for posterior sampling compared to Cartesian acquisition.This may be due to the heavier subsampling of high frequency k-space.
In Fig. 5 we show the stochastic dynamics of MI-PS over the reconstruction iterations.The image is initialized to all-noise and the motion states are initialized to zero.As the iterations progress, the image and motion jointly converge to a stable solution.The motion states show close alignment with the ground-truth motion states except for a fixed offset as previously discussed.
Finally, we demonstrate a subset of the reconstruction techniques on the prospectively accelerated scan in Fig. 6.We display the final image reconstructions alongside the estimated rigid motion parameters from MI-PS.The motion-free scan was collected independently at a low acceleration and serves as a qualitative baseline.LINEAR-LB demonstrates parallel imaging reconstruction alone, indicating the impact of the motion during the scan.Posterior sampling alone is not able to remove these motion artifacts and thus qualitatively matches LINEAR-LB.Finally, MI-PS is able to remove the motion artifacts by estimating the motion states.Notably, the motion estimates for outer k-space are lower in magnitude compared to the low-frequency k-space points, likely due to poorer estimation due to lower signal-to-noise (SNR) ratio.

Discussion
Deep diffusion generative models have shown to be an extremely powerful advancement for solving inverse problems due to their ability to decouple the measurement model from the prior, which can be well-modeled with deep neural networks.In particular, this means that in contrast to end-toend inverse problem solvers, diffusion priors can be used in a more modular fashion for a variety of imaging problems which vary based on their measurement forward operator (e.g., accelerated reconstruction, super resolution, etc.).In many inverse problem settings, however, the true form of the forward operator used to collect measurements may be unknown as is the case in motion corrupt MRI scans.Several prior methods have approached the problem of motion correction by incorporating the parameterization of motion in the forward operator and jointly optimizing over both the image and motion variables (11; 18; 17).
In a similar fashion, our proposed approach solves this problem by treating the unknown parameterization of the motion as a complementary random variable which must be jointly sampled alongside the clean MR image of interest from motion corrupted measurements (i.e., x, κ ∼ p(x, κ|y)).
This is similar to other recently proposed techniques for solving blind inverse problems (7; 34).This treatment of the motion correction problem enables us to decouple the training of the prior from the motion correction task which allows our method to be transferable between differing sampling trajectories.This is extremely important as the manifestation of motion artifacts in the final image is highly dependent on the sampling trajectory used to collect measurements (27).It is this property of motion artifacts that makes it difficult to generalize end-to-end methods to arbitrary motion corruption.
We see that our method outperforms both end-to-end methods and the best-case linear reconstructions for prior joint optimization techniques (18; 17).This can be owed to the powerful image prior provided by the diffusion generative model which discourages motion estimates that give rise to motion corrupt images at reconstruction time.Although we did not use a trained prior to regularize our motion estimates we implicitly used the prior of enforcing all motion states in the same TR to be the same.We showed on prospectively acquired data that we can drop this assumption and still obtain good reconstruction results.However, it is clear from Fig. 6 that motion estimates for outer k-space differ from those in low-frequency k-space, despite the T2-weighted echo train ordering.This indicates that a more informative prior, e.g. by explicitly grouping the phase encodes in a single ETL, may improve reconstruction quality.
Recently, adaptive end-to-end methods have been proposed under the lens of hybrid networks.
In these settings, motion parameters are estimated and used to choose the weights of the neural network that was trained for those weights (44).Similarly, unrolled methods that explicitly solve for motion using principled optimization have also been proposed (36; 54).These methods may provide a solution to out-of-distribution error, though they still must be re-trained for different sampling trajectories.
We assumed a flat prior on the motion states, which is quite naive.For example, we do not account for the likely causal transition between one motion state and the next.A stronger prior on the motion, for example through a Gaussian process or other Markov chain, could be a flexible way to model such time-dependencies.However, this may not be suitable for rapid and sporadic motion.For these reason, we chose to keep the prior simple.The effects of the design choice should be explored further to determine the potential benefits of richer motion priors.
Our prospective scan result sheds light on the power of our framework when the training set does not match the test set -both in protocol settings as well as scanner hardware.In essence, it is not necessary to retrain the model for every specific MRI protocol.However, we note that we only scanned a single slice, whereas nearly all MRI protocols will collect multi-slice data.While in principle this could be handled by our framework, we did not apply it to multi-slice data.Our framework was also formulated for 2D rigid motion only.Therefore, we are not able to model out-of-plane motion or nonrigid motion.Our framework could be extended to 3D rigid motion estimation by adding additional rotation and translation parameters for each time step.Nonrigid motion could also be handled through proper parameterization of the nonrigid space, e.g.thorough spline interpolation or optical flow.
Posterior sampling is a promising alternative to image reconstruction from subsampled measurements compared to other point estimators.One reason is for the ability to quantify uncertainty in the result.Similar to conventional posterior sampling, joint posterior sampling over image and motion cold also be used to learn uncertainty in the motion parameters.In this work we did not explicitly explore uncertainty in the motion, though it could be interesting for future work.

Conclusion
We proposed a method to correct motion artifacts from highly accelerated MRI by parameterizing motion as inconsistencies in the forward operator which can be jointly estimated as random variables alongside a clean reconstructed MR image.To solve the joint estimation problem we leveraged advancements in deep generative diffusion models to perform posterior sampling.We displayed our proposed technique's ability to correct 2D rigid body motion on both simulated and prospectively corrupted scan data.(Bottom) motion state estimates for each phase encode line using our method.

Figure 1 :
Figure 1: (Left) Motion-free image, (Center) simulated motion-corrupt image resulting from Cartesian fast spin-echo (R = 1), (Right) simulated motion-corrupt image resulting from PROPELLER fast spin-echo (R = 1).Both motion-corrupt images were fully sampled and the same motion states were used for each TR.

3 . 4 . 1 5.
End-to-End Deep Learning (E2E): A GAN-based correction method was trained on image pairs of corrupted and clean images.We use a U-Net(41) and ResNet-18(19) model for the generator and the discriminator networks, respectively.To have better visual quality in the generated images, the training loss includes L1, adversarial, and perceptual components through an Imagenet pre-trained VGG model(43).During training the generator is given motion corrupted images, from a given sampling pattern and acceleration level, as inputs and trained to generate images as close as possible to the clean images.The network is trained with a learning rate of 0.0001 for 10 epochs.The input images are normalized prior to passing through the network.Posterior Sampling (PS): Diffusion based posterior sampling reconstruction of motion corrupted data with no motion correction (i.e., assumes zeros motion occurred during scan).The inference procedure for PS is found in Alg.Motion Informed Posterior Sampling (MI-PS): Diffusion based posterior sampling reconstruction of both image and motion states using Alg. 2.

( 37 )R = 3 , 4 , 5 .
based sampling patterns each at echo train lengths (ETLs) of 8 and 16 for accelerations of All sampling patterns assumed a fully sampled readout of 384 points.Example trajectories for each sampling pattern and ETL are shown at R = 4 in Fig.2.For each TR we simulate a single independent motion state triplet (rotation, x-translation, y-translation).This means that, for example, the case of Cartesian (or PROPELLER) sampling at R = 4 with an ET L = 8 resulted in 12 TRs and thus 12 motion states to estimate along with the corrected image.The motion states for each TR were sampled independently from a uniform distribution (κ ∼ U(−2, 2)).See Fig.2for an example of simulated motion states for a given (R, ET L) pair.

used a batch size of 15
for training the diffusion model and training was done across three A100 GPUs.The E2E method was trained using simulated motion corrupt samples from a Cartesian trajectory with R = 3, ET L = 16.Training for this model was done on a single A40 GPU.

Figure 3 :
Figure 3: Example reconstruction of simulated motion for a Cartesian sampling pattern with R = 4, ET L = 16.From left to right: motion-free fully sampled ground-truth; LINEAR-UB; LINEAR-LB; end-to-end GAN; posterior sampling without motion correction, proposed motion-informed posterior sampling.

Figure 4 :
Figure 4: Example reconstruction of simulated motion for a PROPELLER sampling pattern with R = 5, ET L = 8.From left to right: motion-free fully sampled ground-truth; LINEAR-UB; LINEAR-LB; end-to-end GAN; posterior sampling without motion correction, proposed motioninformed posterior sampling.

Figure 5 :
Figure 5: Progression of image (x) and motion (κ) estimates during inference.Iterations are shown from left to right.Motion estimates and error curves are shown for 2D rigid motion parameters.

Figure 6 :
Figure 6: (Top) Reconstruction results for prospectively accelerated scan at R = 3 for a subset of the tested methods.The motion-free scan was performed at R=2 with linear reconstruction.LINEAR-LB, posterior sampling, and MI-PS were reconstructed from motion-corrupt scan at R=3.

Table 1 :
Simulated Motion Results