Real‐time cardiovascular MR with spatio‐temporal artifact suppression using deep learning–proof of concept in congenital heart disease

Purpose Real‐time assessment of ventricular volumes requires high acceleration factors. Residual convolutional neural networks (CNN) have shown potential for removing artifacts caused by data undersampling. In this study, we investigated the ability of CNNs to reconstruct highly accelerated radial real‐time data in patients with congenital heart disease (CHD). Methods A 3D (2D plus time) CNN architecture was developed and trained using synthetic training data created from previously acquired breath hold cine images from 250 CHD patients. The trained CNN was then used to reconstruct actual real‐time, tiny golden angle (tGA) radial SSFP data (13 × undersampled) acquired in 10 new patients with CHD. The same real‐time data was also reconstructed with compressed sensing (CS) to compare image quality and reconstruction time. Ventricular volume measurements made using both the CNN and CS reconstructed images were compared to reference standard breath hold data. Results It was feasible to train a CNN to remove artifact from highly undersampled radial real‐time data. The overall reconstruction time with the CNN (including creation of aliased images) was shown to be >5 × faster than the CS reconstruction. In addition, the image quality and accuracy of biventricular volumes measured from the CNN reconstructed images were superior to the CS reconstructions. Conclusion This article has demonstrated the potential for the use of a CNN for reconstruction of real‐time radial data within the clinical setting. Clinical measures of ventricular volumes using real‐time data with CNN reconstruction are not statistically significantly different from gold‐standard, cardiac‐gated, breath‐hold techniques.


Magnetic Resonance in Medicine 1 | INTRODUCTION
Cardiovascular magnetic resonance (CMR) is the reference standard method of measuring ventricular volumes in patients with congenital heart disease (CHD). Conventionally, breath-hold (BH), multi-slice, short-axis, cardiac-gated, balanced steady-state free precession (bSSFP) cine imaging is used to acquire this data. 1 Unfortunately, many patients find breath-holding difficult and a rapid free-breathing alternative is desirable.
Real-time imaging offers an alternative to BH imaging, but relies on high levels of data undersampling to ensure adequate spatial and temporal resolution. This means that sophisticated reconstruction techniques are necessary to produce artifact-free images. One example is compressed sensing (CS), which is increasingly used to reconstruct real-time CMR images. [2][3][4] However, CS has drawbacks, including (1) computationally intensive and time-consuming reconstruction, and (2) challenging optimization of reconstruction parameters, which can result in unnatural looking images. Therefore, alternative reconstruction approaches may be useful.
The removal of aliases in images from undersampled MR data can be thought of as a de-noising problem. This is particularly true if the sampling pattern produces incoherent artifacts, as is the case with tiny golden angle (tGA) radial spokes. 5 Recently, convolutional neural networks (CNN) have been shown to be well suited to de-noising and artifact removal problems. [6][7][8] Consequently, it may be possible to use CNNs to remove aliasing from undersampled real-time MR data. This approach is dependent on large amounts of high quality training data, and one potential source is previously acquired, retrospectively cardiac-gated BH-bSSFP cine imaging.
In this study, "synthetic" training data was created from a library of BH-bSSFP cine images and used to train a CNN to map between aliased and artifact-free images (deep artifact suppression). The trained CNN was then used to remove aliases from actual undersampled real-time MR data acquired in prospectively scanned patients with CHD. The aims of this study were to (1) investigate the effect of different radial sampling patterns on the effectiveness of CNN-based reconstruction, (2) acquire actual real-time data in patients with CHD using a tGA radial sampling pattern, and (3) compare reconstruction speed, image quality, and accuracy of ventricular volumes calculated from CNN reconstruction compared to a current state-of-the-art CS algorithm.

| Deep artifact suppression algorithm: general concept
The deep artifact suppression algorithm was based on a modified residual U-Net architecture. 9,10 A residual U-Net is a multi-scale CNN where images are sequentially downsampled and then up-sampled to produce an updated and clean version of the artifact-contaminated input image. This particular structure is especially suitable for tomographic problems (such as MR imaging), where the normal operator (projection followed by back-projection) is a type of convolution. 10 The effectiveness of this approach has been shown in several studies for 2D tomographic problems 11,12 and in 2.5D, 6 where the temporal information is processed as 2D channels. In our study, the input was a 3D data set (2D image through time). A 3D structure was used to enforce temporal consistency between frames and prevented the flickering artifact present if a 2D slice-by-slice approach is used (see Supporting Information Video S1).
The residual U-Net was trained with synthetic training data consisting of paired artifact-free "ground truth" magnitude images and corresponding undersampled, artifact-contaminated images. The trained network was then tested with synthetic test data to quantify performance. Finally, the same trained network was used to reconstruct actual real-time data acquired in patients with CHD. These steps are fully explained below. The use of retrospectively collected training and test data, as well as collection of prospective real-time data, was approved by the local research ethics committee, and written consent was obtained from all subjects and/or guardians (National Research Ethics Service Committee reference: 06/Q0508/124).

| Preparation of synthetic training data
The synthetic training data was created from 2D short-axis stacks of Cartesian, retrospectively cardiac-gated, breath-hold, multi-slice, bSSFP cine images (BH-bSSFP). The imaging parameters were: matrix = ~240 × 204, FOV = ~320 × 265 mm, phase partial-Fourier 6/8, spatial resolution = ~1.3 × 1.3 mm, slice thickness = ~10 mm, temporal resolution = ~32 ms, reconstructed cardiac phases = 40. Training data was collected from 250 previously scanned children and adults with pediatric heart disease or CHD (mean age: 22.3 ± 12.6 y, male: 140). No data from patients with single ventricles or images with breathing or arrhythmia artifacts were included in the synthetic training data. Full details of patient diagnoses are supplied in Supporting Information Text S1. On average, each patient had ~9 2D slices, resulting in 2276 paired 3D (2D plus time) data sets for training.
The first step in creating training data was to convert the retrospectively cardiac-gated data into synthetic real-time data that mirrored the acquisition parameters of the real-time sequence (as stated below). To provide similar spatial resolution, images were linearly resampled onto a 192 × 192 matrix, reducing the spatial resolution from 1.3 mm to 1.7 mm. This was followed by linear resampling in time onto points that were 36.4 ms apart (temporal resolution of the real-time sequence). The total number of temporal frames equaled the floor of the R-R interval divided by 36.4 ms. The resulting data formed the ground truth images necessary for training.
To create undersampled, artifact-contaminated training data, the synthetic real-time data was Fourier transformed and gridded onto a 13× undersampled radial trajectory (see below). The undersampled data was then regridded and inverse Fourier transformed back into image space. Both the ground truth and artifact-contaminated images were cropped to a 128 × 128 matrix to constrain the learning problem to the anatomy of interest (heart). The ground truth and artifact-contaminated images were also linearly interpolated through time to give 20 frames, because all inputs to the network had to be the same dimensions. Finally, each 3D data set was also normalized to have signal intensities in the range [0, 1]. All processing required for creation of the synthetic training data was performed in MATLAB (The MathWorks, Natick, MA). A flow diagram of the process is included in Supporting Information Figure S1.
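The frame-count rule, the temporal interpolation to 20 frames, and the intensity normalization described above can be illustrated with a minimal Python sketch (the original processing was done in MATLAB). The function names are hypothetical, the data is reduced to a 1D time series for brevity, and the choice to include both end points in the interpolation grid is an assumption.

```python
import math

FRAME_MS = 36.4  # temporal resolution of the real-time sequence (from the text)


def n_realtime_frames(rr_ms):
    """Number of synthetic real-time frames: floor(R-R interval / 36.4 ms)."""
    return math.floor(rr_ms / FRAME_MS)


def resample_linear(series, n_out):
    """Linearly resample a 1D sequence onto n_out equally spaced points.

    Assumption: the first and last input samples are kept as end points.
    """
    n_in = len(series)
    if n_out == 1 or n_in == 1:
        return [float(series[0])] * n_out
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)  # fractional index into the input
        i = min(int(pos), n_in - 2)         # left neighbour
        w = pos - i                         # weight of the right neighbour
        out.append((1 - w) * series[i] + w * series[i + 1])
    return out


def normalise(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For a hypothetical R-R interval of 800 ms, `n_realtime_frames(800)` gives 21 frames, which `resample_linear(..., 20)` would then bring to the fixed network input length of 20.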

| Assessing the effect of sampling strategy
Radial sampling was chosen for this study, because it is known to produce less pronounced undersampling artifacts compared to Cartesian acquisitions. 13 We further hypothesized that a continuously rotating tiny golden angle (~23.63°) approach may be optimal for a residual U-Net reconstruction because of the noise-like characteristics of the aliases. Although tiny golden angle spacing has been shown to reduce eddy current effects and optimize image quality, 13 the effect of specific sampling patterns on deep artifact suppression had not been investigated. Therefore, we compared the continuously rotating tiny golden angle sampling (tGA rot ) to 3 other sampling patterns (see Figure 1): (1) a tiny golden angle scheme with no rotation between frames (tGA no_rot ), (2) a regular angle scheme with regular rotation between frames (REG rot ), and (3) a regular angle scheme with no rotation between frames (REG no_rot ). Figure 2 shows the sampling patterns and their corresponding artifacts.
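As an illustration of how the 4 sampling patterns differ, the following Python sketch generates the spoke angles for a given frame. It is not the sequence code used in the study. The pattern names and the function `spoke_angles` are hypothetical, and the tiny golden angle is assumed to be 180°/(τ + 6) with τ the golden ratio, which matches the stated value of ~23.63°. Spoke counts (14 per frame, 182 for full sampling) are taken from the text.

```python
import math

N_SPOKES = 14   # spokes per frame (from the text)
N_FULL = 182    # spokes for a fully sampled k-space (from the text)
TAU = (1 + math.sqrt(5)) / 2
TINY_GOLDEN_DEG = 180.0 / (TAU + 6)  # assumed definition, ~23.63 degrees


def spoke_angles(pattern, frame):
    """Spoke angles in degrees (modulo 180) for one frame of a pattern."""
    if pattern == "REG_no_rot":
        # Equal angular spacing, identical in every frame.
        step = 180.0 / N_SPOKES
        return [i * step for i in range(N_SPOKES)]
    if pattern == "REG_rot":
        # Equal spacing, rotated by 180/182 degrees per frame so that
        # 13 consecutive frames cover all 182 spoke positions.
        step = 180.0 / N_SPOKES
        offset = frame * 180.0 / N_FULL
        return [(i * step + offset) % 180.0 for i in range(N_SPOKES)]
    if pattern == "tGA_no_rot":
        # The same 14 tiny-golden-angle spokes in every frame.
        return [(i * TINY_GOLDEN_DEG) % 180.0 for i in range(N_SPOKES)]
    if pattern == "tGA_rot":
        # The angle keeps incrementing across frames (continuous rotation).
        first = frame * N_SPOKES
        return [((first + i) * TINY_GOLDEN_DEG) % 180.0
                for i in range(N_SPOKES)]
    raise ValueError("unknown pattern: " + pattern)
```

Calling `spoke_angles("tGA_rot", f)` for successive values of `f` gives a different set of angles in every frame, whereas the two non-rotating patterns return the same set each time.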

| Residual U-Net architecture and training
Four residual U-Nets were trained, one for each of the radial sampling patterns. The synthetic training data for each U-Net consisted of paired artifact-free ground truth magnitude images and the corresponding undersampled, artifact-contaminated images (different for each sampling pattern). The architecture of the residual U-Net relied on a multi-scale decomposition of the input image and a skip connection at each scale (see Figure 3 for the chosen architecture). Each convolutional layer had a filter size of 3 × 3 × 3 and was equipped with a rectified linear unit as nonlinearity, except the last layer that produced the residual update. The filters were equally weighted in spatial and temporal domain, and hence, no directions were favored in the training process. The output of the network was the input artifact-contaminated image with the residual update, and the result was projected to positive numbers by a rectified linear unit to enforce non-negativity.
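The final step of the network (adding the residual update to the aliased input and enforcing non-negativity) can be sketched in a few lines of plain Python. This is only an illustration on flattened pixel lists, not the TensorFlow implementation. The function names are hypothetical, and the squared-error loss is written as a plain sum, which is an assumption about the exact form of the training loss.

```python
def relu(x):
    """Rectified linear unit for a single value."""
    return x if x > 0.0 else 0.0


def apply_residual(aliased, residual):
    """Network output: aliased input plus residual update, made non-negative."""
    return [relu(a + r) for a, r in zip(aliased, residual)]


def l2_loss(prediction, target):
    """Sum of squared differences between prediction and ground truth."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target))
```

Here `residual` stands in for the output of the last convolutional layer, which in the real network is produced by the multi-scale U-Net.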
Implementation and training of the residual U-Net was done in Python with TensorFlow. 14 We minimized the ℓ2-loss of the reconstructed volume to the desired ground truth. The training was done for 350 epochs with the adaptive moment estimation algorithm (ADAM), 15 with an initial step size of 10⁻³ and batches of 8 samples. Training each network took 12 h on a Titan XP GPU with 12 Gb memory.

FIGURE 1 The 4 radial sampling patterns tested in this study. Fully sampled k-space requires 182 uniformly spaced radial spokes. An acceleration factor of 13 was used, resulting in 14 radial spokes acquired per frame, regardless of sampling pattern. REG no_rot , regular angle scheme (equal angular spacing of ~12.9° between each profile) with no rotation between frames. REG rot , regular angle scheme with regular rotation (of ~0.99°) between frames so that 13 consecutive frames constitute a fully sampled k-space. tGA no_rot , tiny golden angle scheme with no rotation between frames. tGA rot , continuously rotating tiny golden angle sampling

| Evaluation of synthetic test data
The performance of the 4 separately trained networks (1 for each sampling pattern) was tested using a synthetic test data set. This data set was created from BH-bSSFP data, previously acquired in 25 children and adults with pediatric heart disease or CHD (mean age: 24.3 ± 13.3 y, range: 8-64 y, male: 13) that were not used to create the synthetic training data (full list of diagnoses in Supporting Information Text S1). Preparation of the synthetic test data was the same as for the training data. The artifact-contaminated magnitude data for each of the sampling patterns were input into their correspondingly trained networks. The output data were compared quantitatively and qualitatively to the ground truth images. Quantitative assessment was carried out using root-mean-square error (RMSE) and Structural Similarity Index (SSIM). The results of these analyses were averaged over the cardiac phases and slices for each patient. Qualitative assessment was performed using all cardiac phases from the middle slice of each patient. The output images for each sampling pattern were compared to the corresponding ground truth images by a CMR specialist (V.M., with 15 y experience) and evaluated for: (1) absence or deformation of papillary muscles or trabeculations, (2) poor motion fidelity, and (3) blurring of the endocardial border.
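The two quantitative metrics can be sketched as follows. This is a simplified illustration in plain Python on flattened pixel lists, not the evaluation code used in the study. In particular, `global_ssim` computes SSIM over a single window covering the whole image, whereas SSIM is usually averaged over local windows, and the window settings used in the study are not stated. The constants 0.01 and 0.03 are the conventional SSIM defaults, and `data_range=1.0` assumes images normalized to [0, 1].

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equally sized pixel lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM over all pixels (simplified form)."""
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator
```

Lower RMSE and higher SSIM both indicate a reconstruction closer to the ground truth, with SSIM equal to 1 for identical images.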

| Assessment of robustness
The chosen protocol parameters and the level of noise in the artifact-contaminated images may affect the performance of the final trained network. The robustness of the resultant tGA rot network was assessed by modifying the synthetic real-time test data, in terms of (1) SNR, (2) acceleration factor, and (3) image cropping.
The effect of SNR was assessed by artificially adding Gaussian noise to the artifact-contaminated tGA rot synthetic test images, varying the resultant SNR between 20 dB and 10 dB. To assess the effect of acceleration factor, the synthetic real-time data was undersampled with acceleration factors from 10× to 16×. The robustness to the position of image cropping was assessed by shifting the cropped region by 0, ±4, ±8, or ±12 pixels in the x and/or y direction (giving 48 shifted regions for each subject). Resultant images were reconstructed using the previously trained network (tGA rot with 13× acceleration) and were compared to the ground truth images using RMSE and SSIM.
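Two of these perturbations are easy to make concrete. The sketch below enumerates the crop shifts and scales Gaussian noise to a target SNR. It is illustrative only: the function names are hypothetical, and SNR is assumed to be defined as 10·log10(mean signal power / noise variance), which the text does not specify.

```python
import math
import random


def crop_shifts(max_shift=12, step=4):
    """All (dx, dy) crop offsets in pixels, excluding the unshifted position."""
    offsets = range(-max_shift, max_shift + 1, step)
    return [(dx, dy) for dx in offsets for dy in offsets if (dx, dy) != (0, 0)]


def noise_sigma_for_snr(signal, snr_db):
    """Noise standard deviation that yields the target SNR (assumed definition)."""
    power = sum(s * s for s in signal) / len(signal)
    return math.sqrt(power / (10 ** (snr_db / 10)))


def add_noise(signal, snr_db, seed=0):
    """Add zero-mean Gaussian noise scaled to the target SNR in dB."""
    rng = random.Random(seed)
    sigma = noise_sigma_for_snr(signal, snr_db)
    return [s + rng.gauss(0.0, sigma) for s in signal]
```

With the default arguments, `crop_shifts()` returns the 7 × 7 grid of offsets minus the origin, which is the 48 shifted regions per subject mentioned above.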

| Actual in vivo study
Actual real-time data was prospectively acquired in 10 children and adults with pediatric heart disease or CHD referred to our center for clinical CMR (mean age: 33.6 ± 16.8 y, range: 16-64 y, male: 3) (full list of diagnoses in Supporting Information Text S1). Exclusion criteria were (1) inability to breath-hold, (2) a single ventricle, and (3) arrhythmia. All patients were imaged on a 1.5T MR scanner (Avanto, Siemens Medical Solutions, Erlangen, Germany) with VCG gating. Real-time undersampled radial data was acquired, as described below. In addition, reference standard assessment of ventricular volumes was performed using standard BH-bSSFP cine imaging in the short axis (acquisition parameters same as parameters of the training data; matrix = ~240 × 204, FOV = ~320 × 265 mm, phase partial-Fourier = 6/8, spatial resolution = ~1.3 × 1.3 mm, slice thickness = ~10 mm, bandwidth = ~940 Hz/pixel, flip angle = ~70°, TE/TR = ~1.2/2.4 ms, temporal resolution = ~32 ms, reconstructed cardiac phases = 40). Ten to 15 contiguous slices were acquired in the short axis to ensure coverage of the whole ventricular volume, with 1 slice acquired per breath hold (~6 s).
In addition, actual real-time data was prospectively acquired in 1 healthy volunteer to assess the effect of rapid or large respiratory motion on the performance of the final trained network. All imaging parameters were the same as above. Real-time data were acquired during breath-holding, normal breathing, and rapid, deep breathing.

| Real-time sequence
The new real-time sequence was based on an in-house single-shot multi-slice 2D radial bSSFP technique. Imaging parameters were: matrix = 192 × 192, FOV = 320 × 320 mm, spatial resolution = 1.67 × 1.67 mm, slice thickness = 10 mm, bandwidth = 1240 Hz/pixel, flip angle = 67°, TE/TR = 1.4/2.8 ms. Consecutive spokes were separated by the tiny golden angle, and 14 spokes were used to reconstruct each frame (no temporal overlap between frames). This represented an undersampling factor of 13×, based on 182 spokes being required to fully sample k-space. 16 The resultant temporal resolution was 36.4 ms. Acquisition of each slice required 2 R-R intervals; the first R-R interval being used to reach the steady state and the second for data acquisition. Ten to 15 contiguous slices were acquired in the short axis and after acquisition the imaging plane was moved to the next contiguous slice. Image acquisition was performed during free-breathing, using two spine coils and one body-matrix coil (giving a total of 12 coil elements).
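Some of the sequence figures above follow from simple arithmetic, sketched here in Python. The constants are taken from the text; the function names and the example R-R interval of 0.75 s are hypothetical.

```python
FULL_SPOKES = 182       # spokes for full sampling (from the text)
SPOKES_PER_FRAME = 14   # spokes per reconstructed frame (from the text)
FOV_MM = 320.0
MATRIX = 192


def acceleration():
    """Undersampling factor relative to a fully sampled radial k-space."""
    return FULL_SPOKES / SPOKES_PER_FRAME


def pixel_size_mm():
    """In-plane pixel size for the stated FOV and matrix."""
    return FOV_MM / MATRIX


def stack_scan_time_s(n_slices, rr_s):
    """Two R-R intervals per slice: one to reach steady state, one to acquire."""
    return 2 * n_slices * rr_s
```

For example, `stack_scan_time_s(12, 0.75)` gives 18 s for a 12-slice stack at a hypothetical heart rate of 80 beats per minute.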

| Reconstruction of real-time data
Creation of artifact-contaminated images from the undersampled radial data was performed online in the Siemens reconstruction environment (ICE VB17, two AMD Opteron processors, clock speed: 3.2 GHz, cache: 6 MB). The k-space data was gridded and Fourier-transformed into image space, followed by a weighted sum over the coil elements. The resultant artifact-contaminated images were processed in MATLAB similarly to the test data: cropped to a 128 × 128 matrix, followed by temporal interpolation to 20 frames. The resultant images subsequently had the artifact suppressed using the previously trained tGA rot residual U-Net.
The undersampled radial data was also reconstructed using the CS algorithm, golden angle radial acquisition with temporal total variation sparsity regularization (GRASP). 17 To provide an unbiased comparison in terms of image quality and reconstruction time, we used the Berkeley Advanced Reconstruction Toolbox (BART) 18 as it supports parallel computation using GPUs. This toolbox uses the ADMM algorithm for temporal total variation constraints. 19 The number of iterations was set to 50 as a good trade-off between reconstruction speed and reconstruction accuracy (determined from preliminary experiments described in Supporting Information Figure S2). Coil sensitivity information was estimated from the k-space centre. 20 The regularization level was selected empirically (to 0.025) based on the first 5 patients to minimize artifacts without producing too much temporal blurring. The GRASP algorithm was performed on the same computer as the residual U-Net (Titan XP GPU with 12 Gb memory).
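The temporal total variation penalty that regularizes this kind of reconstruction can be illustrated on pixel time series. The sketch below shows only the form of the penalty term, not the BART implementation or its solver. The function names are hypothetical, and how the toolbox scales the regularization weight internally is not assumed here.

```python
LAMBDA = 0.025  # regularization level reported in the text


def temporal_tv(series):
    """Sum of absolute differences between consecutive time frames."""
    return sum(abs(b - a) for a, b in zip(series, series[1:]))


def tv_penalty(pixel_series, lam=LAMBDA):
    """Weighted temporal total variation summed over all pixels."""
    return lam * sum(temporal_tv(s) for s in pixel_series)
```

A pixel whose intensity is constant through time contributes nothing to the penalty, which is why a large weight suppresses flickering aliases but can also blur genuine motion.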

| Analysis of in vivo data
Image quality and ventricular volumes were compared between the gold-standard BH-bSSFP images and the real-time radial data reconstructed with the GRASP algorithm and the deep artifact suppression residual U-Net.
Quantitative edge sharpness was calculated by measuring the maximum gradient of the normalized pixel intensities across the border of the septum, as previously described. 21 To reduce noise, which results in artificially high gradients (representing sharp edges), the pixel intensities were fit to a tenth order polynomial, before differentiation. Edge sharpness was calculated in 6 positions across the septum for all cardiac phases, and the average value was used for comparison.

| Ventricular function
Quantification of left and right ventricular volumes was calculated for each technique. The end-diastolic and end-systolic phases for each ventricle were chosen through visual inspection of the mid-ventricular cine. At end diastole and systole, all ventricular sections were manually segmented by a CMR specialist (V.M.). The endocardial border was traced using the OsiriX open source DICOM viewing platform (OsiriX v.9.0, OsiriX foundation, Switzerland). 22 The papillary muscles and trabeculae were included in the blood-pool.
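Ventricular volumes are conventionally obtained from such segmentations by summing the traced areas multiplied by the slice thickness. A minimal Python sketch follows, assuming contiguous 10 mm slices with no gap. The function names and the example values are hypothetical, and the exact calculation performed by the analysis software is not described in the text.

```python
def volume_ml(areas_mm2, slice_thickness_mm=10.0):
    """Sum of area x thickness over contiguous slices, converted mm^3 to mL."""
    return sum(areas_mm2) * slice_thickness_mm / 1000.0


def stroke_volume(edv_ml, esv_ml):
    """Stroke volume from end-diastolic and end-systolic volumes."""
    return edv_ml - esv_ml


def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml
```

For instance, hypothetical volumes of EDV = 100 mL and ESV = 40 mL give a stroke volume of 60 mL and an ejection fraction of 60%.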

| Statistics
Statistical analyses were performed by using STATA software (STATA SE, v.14.2). Qualitative image scores for synthetic test data are expressed as percentage of cases where the artifact was visible. Comparisons of continuous variables were performed using one-way repeated measures analysis of variance (ANOVA) with post hoc testing using Bonferroni correction for significant results. Comparison of ratio data (image scoring of synthetic test data) was performed with the chi-squared test and comparison of ordinal data (image scoring of clinical data) was performed using the Wilcoxon signed-rank test. For assessment of agreement of ventricular volumes and function, the BH-bSSFP data was used as the reference standard for Bland-Altman analysis. A P-value of <0.05 indicated a significant difference.

| RESULTS
Figure 4 shows image quality of the synthetic test data for the different sampling patterns (corresponding movie can be seen in Supporting Information Video S2). Table 1 shows the RMSE and SSIM results for the different radial sampling patterns. The tGA rot had significantly lower RMSE (better reconstruction accuracy) than all of the other sampling patterns (P < 0.0001) and significantly higher SSIM (better reconstruction accuracy) than all the other sampling patterns (P < 0.0001). Qualitative measures of the images from the different sampling patterns are also shown in Table 1. The tGA rot had significantly lower proportions of absent/deformed papillary/trabeculations, poor motion fidelity, and blurred endocardial borders compared to tGA no_rot and REG no_rot (P < 0.0001). In addition, tGA rot had significantly lower proportions of absent/deformed papillary/trabeculations (P = 0.018) and poor motion fidelity (P = 0.045) than REG rot .

FIGURE 4 Example images from 2 "synthetic real-time" test data sets reconstructed from the 4 different sampling patterns, using correspondingly trained residual U-Nets. Pt1 shows total loss of some papillary muscles (indicated with arrow) compared to the truth data, with all trajectories except tGA rot . Pt2 shows blurring across the ventricle, particularly seen with the non-rotating trajectories (see Supporting Information Video S2 for corresponding movie)

| Assessment of robustness
For the tGA rot network, as SNR decreased from 20 dB to 10 dB, the SSIM decreased by 6.8% (0.862 at 20 dB, 0.794 at 10 dB, and 0.868 with no additional noise), and the RMSE increased by 1.3% (0.043 at 20 dB, 0.056 at 10 dB, and 0.042 with no additional noise) (see Supporting Information Figure S3). As acceleration factor increased from 10× to 16×, SSIM decreased by 6% (0.89 at 10× and 0.83 at 16×) and RMSE increased by 0.9% (0.041 at 10× and 0.050 at 16×) (see Supporting Information Figure S4). The 13× undersampled data appeared to perform better than this trend, probably because the network was trained on 13× undersampled data. Altering the position of image cropping had little effect on SSIM and RMSE. The lowest SSIM was 0.858 at a shift of +12 pixels in x and y, and was 1.04% less than with no shift applied. The highest RMSE was 4.53 × 10⁻², also at a shift of +12 pixels in x and y, and was 0.34% greater than with no shift applied. See Supporting Information Figure S5 for all results.

| Prospective in vivo study
Real-time data was successfully acquired and reconstructed (GRASP and residual U-Net) in all 10 patients. On average, there were ~12 slices per patient, resulting in 122 3D data sets. Reference standard BH-bSSFP data was also successfully acquired in all patients. Total acquisition time for BH-bSSFP stack was 279 ± 65 s, and for the real-time radial bSSFP, it was 18 ± 3 s.

| Reconstruction time
Overall reconstruction time for all slices (~12 slices, from raw data file until completion of reconstruction) was ~5× faster for deep artifact suppression compared to compressed sensing (~22.0 s vs. ~111.4 s).
The compressed sensing reconstruction took ~3.4 s to read the raw k-space data file (containing all phases, for all slices, with 12 coil elements), and the GRASP algorithm (including calculation of trajectories and coil sensitivity information) took ~8.9 s per slice.
For the deep artifact suppression, creation of artifact-contaminated DICOM images took ~14.8 s for all slices (~1.3 s per slice, including reading of the raw k-space data file, gridding of data, coil combination, and saving to DICOM). Deep artifact suppression of these images took ~7.2 s for all slices, including ~5.4 s to read all DICOM files (all phases, for all slices), before initialization of variables for each slice and running the residual U-Net (~0.15 s per slice).
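The way these component timings add up can be checked with a short Python sketch. The default values are the approximate timings reported above; the function names are hypothetical.

```python
def grasp_total_s(n_slices, read_s=3.4, per_slice_s=8.9):
    """GRASP: one raw-data read plus a per-slice reconstruction."""
    return read_s + per_slice_s * n_slices


def unet_total_s(alias_images_s=14.8, suppression_s=7.2):
    """U-Net: creation of aliased DICOM images plus artifact suppression."""
    return alias_images_s + suppression_s


def speedup(slow_s, fast_s):
    """How many times faster the fast method is than the slow one."""
    return slow_s / fast_s
```

`unet_total_s()` reproduces the reported total of ~22.0 s, and `speedup(111.4, 22.0)` gives the ratio between the two reported overall reconstruction times.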

| Image quality
Representative images are shown in Figure 5 and the corresponding movie can be seen in Supporting Information Video S3. The residual U-Net reconstruction of the real-time data had better image quality than the same data reconstructed using GRASP. In particular, there is still significant residual aliasing seen in the GRASP reconstruction, as well as greater temporal blurring (as seen in the x-t plots of Figure 5). There are some subtle differences between the real-time reconstructions and the BH-bSSFP images (i.e., reduced pericardial fat). These are probably because of minor slice position variations between the free-breathing and breath-holding acquisitions. The differences between the U-Net and GRASP reconstructions are reflected in the qualitative image scoring (Table 2), with the residual U-Net reconstruction having similar myocardial delineation, motion fidelity, and artifact scores to the BH-bSSFP images and superior scores to the GRASP reconstructed data (P < 0.05). Quantitative edge sharpness (Table 2) was highest in the BH-bSSFP images, followed by the residual U-Net reconstruction and then the GRASP reconstruction, with all comparisons being significant (P < 0.007).

Figure 6 shows images from the single volunteer under different respiratory conditions and the corresponding movie can be seen in Supporting Information Video S4. It can be seen that the U-Net reconstruction appears robust to increasing respiratory motion. This is in contrast to the GRASP reconstruction that is compromised during deep rapid breathing, particularly at end diastole.

Ventricular volumes are shown in Table 3 (with all Bland-Altman graphs for the LV in Supporting Information Figure S6 and for the RV in Supporting Information Figure S7). The limits of agreement for both the residual U-Net and GRASP reconstructions were similar. However, there were several significant biases with the GRASP reconstructed data including left ventricular ESV (P = 0.04) and EF (P < 0.001) and right ventricular EDV (P < 0.001). The residual U-Net reconstructed data was only associated with a significant bias for right ventricular EDV (P = 0.004), which was still less than the GRASP data (P = 0.036). The Bland-Altman graphs for LV ESV, LV EF, and RV EDV are shown in Figure 7.
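The bias and limits of agreement in a Bland-Altman analysis can be computed as in the following Python sketch. It uses the conventional definition (mean difference ± 1.96 standard deviations of the differences) and is not the STATA procedure used in the study. The function name is hypothetical, and any values passed to it here are for illustration only.

```python
import statistics


def bland_altman(test, reference):
    """Return (bias, lower limit, upper limit) of agreement.

    bias is the mean of the paired differences (test - reference); the
    limits are bias -/+ 1.96 sample standard deviations of the differences.
    """
    diffs = [t - r for t, r in zip(test, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A bias that differs significantly from zero indicates systematic over- or underestimation relative to the reference standard, while the width of the limits reflects the scatter of individual measurements.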

| DISCUSSION
The main findings of this study were that (1) it was possible to train a 3D residual U-Net to remove artifacts (deep artifact suppression) in synthetic radially undersampled data created from previously acquired Cartesian BH-bSSFP images, (2) the ability to perform deep artifact suppression on synthetic test data was determined by the simulated undersampling pattern, (3) deep artifact suppression of actual radial real-time data acquired in new patients was successful with a residual U-Net trained using synthetic data, (4) total reconstruction times were significantly shorter (>5×) using a residual U-Net compared to a conventional CS reconstruction, and (5) image quality and measurement of ventricular volumes were superior using the residual U-Net compared to CS reconstructions of the same raw data.

| Deep artifact suppression and synthetic training data
Machine learning, especially deep learning, relies on large amounts of training data. Most clinical CMR services have large amounts of images stored from previously scanned patients. However, there are 2 problems with using this type of data to train deep learning algorithms capable of reconstructing real-time data. The first is that the CMR data is usually stored as magnitude images rather than complex raw data. Simple magnitude data is not suitable for training the iterative deep learning methods that have previously been used for CMR reconstruction. 7,8,[24][25][26] Therefore, in this study, we chose to use a residual U-Net to remove aliasing from artifact-contaminated gridded images. This approach has previously been shown to be successful in removing aliasing in retrospectively undersampled dynamic Cartesian CMR data. 6 We have built on this previous work by using a much larger training data set and implementing non-Cartesian undersampling. The second problem with using previously acquired data is that it does not consist of paired ground truth and undersampled real-time data. Therefore, retrospectively cardiac-gated, Cartesian, breath-hold bSSFP images were used to create both synthetic training and test data. The advantage of using synthetic data is that the ground truth is known. The disadvantage of using synthetic data is that errors during acquisition (such as gradient delays or trajectory errors) and respiratory motion (because of the free-breathing acquisition for real-time imaging) cannot be corrected for. Nevertheless, we were able to show that this synthetic data could be used to train a residual U-Net to successfully perform deep artifact suppression. Interestingly, the reconstruction accuracy of the synthetic test data was highly dependent on the radial undersampling pattern used.
Non-rotating trajectories, where the same spokes were used in all frames, resulted in significant quantitative errors (as measured by RMSE and SSIM) and qualitative errors. Specifically, the images reconstructed for non-rotating trajectories were associated with missing or deformed papillary muscles and/or trabeculations in over 88% of synthetic test cases. Furthermore, the non-rotating trajectories also produced images with the worst motion fidelity and blurring of the endocardial border. In contrast, the continuously rotating strategies performed better, probably because of aliases having a less coherent structure through time. In particular, tGA rot produced aliases that appeared noise-like and we believe this is the reason for the superior results seen with this sampling pattern. A full description of the association between the aliasing patterns and reconstruction accuracy is outside the scope of this article. Nevertheless, the superior quality of tGA rot sampling for the synthetic test data confirmed that it was the right choice for the actual in vivo part of this study.

| In vivo study
We demonstrated clinical translation of the residual U-Net by successfully performing deep artifact suppression on actual real-time images acquired in new patients. Importantly, because a tGA rot pattern was chosen, it was possible to reconstruct the same raw data using a compressed sensing algorithm (i.e., the GRASP technique). One of the hypothetical advantages of a deep learning approach over CS is quicker reconstruction times. In this study, we found that residual U-Net reconstruction was >5× quicker than the GRASP reconstruction, when both were run on the same GPU. It should be noted that the reconstruction times for GRASP are almost linearly dependent on the number of iterations. Therefore, reconstruction time could be easily shortened by simply reducing the number of iterations. However, this would reduce image quality of the GRASP reconstruction, which is already lower than that of the residual U-Net reconstruction. In addition, the residual U-Net reconstruction was performed on cropped images; however, we would not expect this to have a significant effect on the evaluation time. Consequently, we believe that by accelerating reconstruction, deep learning approaches could play an important role in clinical translation of heavily undersampled real-time techniques.

TABLE 3 Ventricular volumes from prospective in vivo data comparing the BH-bSSFP sequence, as well as the real-time radial sequence, with GRASP and residual U-Net reconstructions

In this study, we also found that the residual U-Net reconstructed images provided more accurate quantification of ventricular volumes compared to GRASP reconstructed images. The reason for this was probably better image quality, as reflected by better qualitative scores for endocardial border sharpness, temporal fidelity of wall motion, and residual artifacts. We believe that the superior image quality of the U-Net reconstruction was the result of using a high quality, large volume training data set. Conversely, the poorer CS reconstructions could be because of sub-optimal choice of the regularization parameter. However, the presence of both residual artifacts and poor edge sharpness and/or temporal fidelity suggests that changing regularization would be of little benefit. In fact, the bias and limits of agreement of the GRASP reconstructed data were better than those reported in previous studies 4 using the same technique. This suggests that the GRASP implementation was adequately optimized. Therefore, we believe that the residual U-Net reconstruction is superior and may allow more accurate measurement of ventricular volumes.
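As a reminder of how the bias and limits of agreement in a Bland-Altman analysis are obtained, a minimal sketch follows. The volumes are invented purely for illustration and are not data from this study. The ±2 SD limits follow the convention given in the supporting figure captions, and the function name is hypothetical.

```python
import statistics


def bland_altman(reference, test, k=2.0):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   = mean of the paired differences (test - reference)
    limits = bias -/+ k * sample standard deviation of the differences
    """
    if len(reference) != len(test):
        raise ValueError("reference and test must have the same length")
    diffs = [t - r for r, t in zip(reference, test)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - k * sd, bias + k * sd


# Made-up end-diastolic volumes in mL (illustrative only).
reference_ml = [100, 120, 90, 150]   # e.g. a breath-hold reference
realtime_ml = [102, 118, 93, 151]    # e.g. a real-time reconstruction

bias, lower, upper = bland_altman(reference_ml, realtime_ml)
```

A reconstruction with a bias closer to zero and narrower limits agrees more closely with the reference, which is the sense in which the comparisons above are made.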

| Study limitations
The main limitation for this study is that the training data was acquired during a breath-hold, while the real-time data was acquired during free-breathing. This means that the network cannot "learn" anything about respiratory motion and may not be able to correct for this additional motion. However, we did perform an experiment in a single volunteer that demonstrated that the U-Net reconstruction was relatively robust to even large respiratory perturbations. This suggests that the network learns the structure of the noise to be removed, rather than the expected motion of the underlying structures. Nevertheless, further work is necessary to fully understand the effect of motion not apparent in the training data.
In this study, we also did not investigate the full diversity of CHD. Specifically, no patients with single ventricles were included in the training data set or the in vivo study. Even though this first study shows that the trained network generalizes well, the full generalization capability of the network to varying motion and anatomies should be investigated in future work.
Another drawback of our implementation of a deep artifact suppression network was that all the training and subsequent data needed to be of the same size. However, when acquiring real-time data over 1 cardiac cycle, the number of frames is dependent on the patient's heart rate. This meant that it was necessary to resample the aliased images through time, altering the temporal aliasing pattern of the acquired data. However, as both the training and actual real-time data were processed in the same way, we do not believe that it is a significant limitation.
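The temporal resampling step can be sketched as follows. The interpolation scheme is not specified in this section, so linear interpolation between neighboring frames is an assumption, as are the function name and the frame counts used in the example.

```python
def resample_frames(frames, n_out):
    """Linearly interpolate a time series of frames to n_out frames.

    frames: list of frames, each a flat list of pixel values.
    The first and last output frames coincide with the first and last
    input frames; intermediate frames are weighted averages of the two
    nearest input frames.
    """
    n_in = len(frames)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two input and two output frames")
    resampled = []
    for j in range(n_out):
        pos = j * (n_in - 1) / (n_out - 1)   # fractional input index
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo                         # weight of the later frame
        resampled.append([(1 - w) * a + w * b
                          for a, b in zip(frames[lo], frames[hi])])
    return resampled


# Three hypothetical 2-pixel frames stretched to five frames.
frames = [[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]]
stretched = resample_frames(frames, 5)
```

Because each output frame mixes two input frames, the aliasing pattern of the resampled series differs from that of the acquired series, which is why the same processing has to be applied to both the training and the real-time data.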
A further limitation of our approach was that the training and actual input data consisted of coil-combined magnitude images, rather than raw multi-coil complex data. The main benefit of this approach was that previously acquired data that was easily retrievable from a conventional clinical image archive could be used for training. However, the absence of phase data in our approach may prevent optimum image restoration, in particular when dealing with signal cancellation because of aliasing. Although this was not noticeable in the in vivo data, this issue warrants further investigation in future studies. In addition, we did not investigate the effect of dark band artifacts in either the training data or actual input data on image quality. This study was performed at 1.5T, and dark band artifacts were not a significant problem. Nevertheless, this issue must be investigated if our approach is to be considered for higher field strengths.
Finally, in this study, we chose to minimize the L2-loss and did not fully explore other loss functions such as L1-loss or SSIM-loss. The L1-loss function has been shown to provide superior training or improved results in image restoration, 27 super resolution, 28 and MR image reconstruction. 29 Therefore, we performed preliminary tests comparing the L1-loss and L2-loss using the tGA rot trajectory. Assessment of image quality from the trained networks using the synthetic test data, described above, demonstrated <0.01% difference in RMSE and <0.5% difference in SSIM (further information in Supporting Information Figure S8). This is similar to previous deep learning studies, which have shown limited differences in RMSE and SSIM values using L1-loss or SSIM-loss functions, although with improved visual quality compared to L2-loss. 6,30 Nevertheless, further investigation of different loss functions and other optimizations of learning would be desirable in the future.
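The difference between the two loss functions can be shown with a minimal sketch. Averaging over pixels (rather than summing) and the function names are assumptions, and the pixel values are invented for illustration.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error between two equally sized pixel lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred, target):
    """Mean squared error between two equally sized pixel lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def rmse(pred, target):
    """Root-mean-square error, i.e. the square root of the L2-loss."""
    return math.sqrt(l2_loss(pred, target))


target = [0.0, 0.0, 0.0, 0.0]
uniform = [0.5, 0.5, 0.5, 0.5]   # small error in every pixel
outlier = [2.0, 0.0, 0.0, 0.0]   # one large error, rest perfect

# Both predictions have the same L1-loss, but the L2-loss of the
# prediction with a single large error is four times higher.
l1_pair = (l1_loss(uniform, target), l1_loss(outlier, target))
l2_pair = (l2_loss(uniform, target), l2_loss(outlier, target))
```

The L2-loss penalizes isolated large errors more heavily than the L1-loss, which is one reason the two can yield visually different results even when summary metrics such as RMSE and SSIM are similar.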

| CONCLUSION
This article demonstrates the potential of using a residual U-Net for deep artifact suppression of real-time radial data within a clinical setting. Once the networks have been trained, the reconstruction times are very short, making these techniques particularly appealing within a busy clinical workflow. We have shown that ventricular volumes measured from images reconstructed using a residual U-Net are not statistically significantly different from those obtained with the reference standard, cardiac-gated, BH techniques. Therefore, we believe that this technique may help with full adoption of real-time CMR in clinical practice. This would be particularly useful in children and sick patients who are unable to hold their breath.

SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article.

FIGURE S1 Flow diagram showing the steps taken to convert the retrospectively cardiac-gated, BH-bSSFP data to synthetic real-time data used to train the residual U-Net
[...] shows mean difference, dashed lines show +/-2 standard deviations
FIGURE S7 Bland-Altman plot comparing right ventricular volumes acquired with BH-bSSFP to real-time data reconstructed with GRASP and residual U-Net. Solid red line shows mean difference, dashed lines show ±2 SD
FIGURE S8 Images showing differences related to training with L1 and L2 cost functions, and table showing RMSE and SSIM compared to the ground truth
VIDEO S1 Movie showing residual flickering artifact when a 2D U-Net is applied to dynamic data. Left: images with residual aliasing used as input to U-Net. Right: output from 2D U-Net. The flickering artifact is not seen with a 3D U-Net (2D plus time), as used in this study
VIDEO S2 Example movies from 2 "synthetic real-time" data sets reconstructed from the 4 different sampling patterns using correspondingly trained residual U-Nets. Pt1 shows total loss of some papillary muscles compared to the truth data, with all trajectories except tGA rot. Pt2 shows blurring across the ventricle, particularly seen with the non-rotating trajectories
VIDEO S3 Example movies from 2 prospective patients from the BH-bSSFP sequence and the real-time radial sequence reconstructed with GRASP and the residual U-Net
VIDEO S4 Example movies from a single volunteer acquired during different respiratory maneuvers: (left) breath-hold, (middle) normal breathing, (right) deep rapid breathing. Top row is the GRASP reconstruction, and the bottom row is the U-Net reconstruction
TEXT S1 Diagnosis of patients used for training of the residual U-Net, creation of the synthetic test data, and for the prospective in vivo study