Deep‐learning based super‐resolution for 3D isotropic coronary MR angiography in less than a minute

To develop and evaluate a novel and generalizable super‐resolution (SR) deep‐learning framework for motion‐compensated isotropic 3D coronary MR angiography (CMRA), which allows free‐breathing acquisitions in less than a minute.


| INTRODUCTION
Three-dimensional (3D) whole-heart coronary MR angiography (CMRA) has shown significant potential for diagnosis and characterization of coronary artery disease (CAD) without radiation exposure or the need for intravenous contrast. 1-4 The acquisition of 3D whole-heart CMRA remains lengthy, however, as high-resolution (HR) data need to be acquired for accurate quantification of luminal stenosis.
Conventional free-breathing diaphragmatic-navigated CMRA 5,6 suffers from long and unpredictable scan times, as only a small portion of the respiratory cycle is used for imaging, leading to reduced scan efficiency. Alternatively, 1D self-navigation 7,8 or image-based navigators 9-11 enable 100% respiratory scan efficiency by retrospectively correcting translational respiratory motion. These sequences achieve a reduction of ~50% in scan time, to around ~13-25 min for a 1.2 mm³ isotropic 3D CMRA, in contrast to a scan time of ~25-50 min with diaphragmatic navigators. 12 Further scan time reduction was achieved by accelerating the acquisition with parallel imaging, 13-15 compressed sensing (CS), 16-21 or low-rank 22-25 techniques. These methods were able to provide 3D isotropic sub-millimeter resolution scans within ~5-10 min scan time. In the reconstruction, sparsity-promoting and non-linear reconstructions have been used to recover an aliasing-free and high-quality image operating at an image 17,18,26 or patch level. 22,24,25 Techniques exploiting redundancies on various scales and low-rank matrix structures have been shown to outperform conventional CS image reconstruction by providing better image detail and edge recovery, which is essential for the depiction of the small coronary arteries. 22,27 Incorporation of motion correction into the reconstruction further increases information exchange and provides motion-compensated CMRA. 27,28 Despite these promising advances, the reconstruction times of iterative methods are long. 27 Careful fine-tuning between sparsity-promoting regularization and data consistency is required; especially for high undersampling factors, residual aliasing may remain in the image (under-regularization) or staircasing and blurring artifacts can occur (over-regularization).
Hence, achievable image quality will depend on scan-specific parametrization and image acceleration is limited by the fixed sparsity-promoting reconstruction.
Recently, deep-learning-based MR image undersampling reconstruction methods have been proposed to address some of the aforementioned shortcomings. 29-33 A deep neural network trained on a cohort of data samples provides a data-adaptive scheme that can reconstruct images within a few seconds during inference. These methods primarily differ in terms of their intended task: image denoising performing image-to-image regression, 34-36 with combinations of physics-based learning such as plug-and-play priors 37-39 ; direct mapping from acquired k-space to the reconstructed image 40 ; physics-based k-space learning 41 ; or unrolled optimizations. 42-49 Very few of these developments have been applied to 3D CMRA reconstruction. 44 Furthermore, the obtained spatial resolution in CMRA is still limited compared to computed tomography coronary angiography (CTCA), and scan time remains relatively long.
An alternative approach for accelerating the image acquisition while simultaneously increasing spatial resolution is deep-learning-based super-resolution (SR). 31 Images are acquired at low resolution (LR; with or without undersampling) and retrospectively reconstructed to the HR target in a single-image SR setting. 50 SR reconstructions have been studied for neuro, 51-53 cardiac CINE, 54,55 whole-heart, 56-60 fetal, 61 diffusion, 62 or diffusion tensor MRI. 63 These previously proposed methods either operate on residual learning to super-resolve pixels on an LR input image that was upsampled to target resolution 51-53,59 or super-resolve from the LR input directly. 54,57 These methods rely on information sharing along 2D/3D spatial 51,53,57 or temporal dimensions 54,55 at the image 53,54,60 or patch level. 51,52,57
The objective of this work is to enable free-breathing 3D CMRA in less than a minute of scan time. Therefore, we propose a single-image SR framework combined with non-rigid respiratory motion compensation. In contrast to previous methods, we propose and investigate a deep-learning generative adversarial SR network (SRGAN) that (a) generalizes to different input resolutions and (b) can operate at an image or patch level. The SRGAN applies a trainable discriminator and a perceptual loss network. We analyze the proposed SRGAN method for a 16-fold increase in spatial resolution to obtain HR CMRA (0.9 mm³ or 1.2 mm³) from LR acquisitions (0.9 × 3.6 × 3.6 mm³ or 1.2 × 4.8 × 4.8 mm³). The supervised SR-CMRA was trained on 47 patients, and testing was done on data from 15 patients (five cross-validations with three patients each from the retrospective cohort of 50 patients). The proposed SR-CMRA provides a 16-fold increase in spatial resolution with image quality comparable to HR-CMRA while reducing the predictable scan time to <1 min.

| METHODS
The proposed SR-CMRA framework is depicted in Figure 1. The SRGAN follows the concept of residual learning and consists of two cascaded Enhanced Deep Residual Network for SR (EDSR) 64 generators, a trainable patch discriminator 65 and a pre-trained VGG-16 66 perceptual loss network. The input to the SRGAN network is a coil-combined and non-rigid motion-corrected 27 LR-CMRA image (0.9 × 3.6 × 3.6 mm³ or 1.2 × 4.8 × 4.8 mm³), which is super-resolved by a factor of four along each phase-encoding direction (phase and slice dimensions) to the HR and motion-corrected image output (0.9 mm³ or 1.2 mm³).
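The factor-of-four upsampling per phase-encoding direction arises from cascading two 2× stages, and 4× in each of the two directions gives the 16-fold voxel increase. A toy sketch of this arithmetic, with nearest-neighbour interpolation standing in for the learned EDSR stages (illustrative only, not the trained network):

```python
def upsample2x(img):
    """Nearest-neighbour 2x upsampling along both phase-encoding
    directions; a toy stand-in for one learned EDSR stage."""
    out = []
    for row in img:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out

lr = [[1.0, 2.0],
      [3.0, 4.0]]
# two cascaded 2x stages -> 4x per phase-encoding direction
hr = upsample2x(upsample2x(lr))
# 4x in each of the two directions = 16x more voxels per slice
```

The cascade structure also makes the upsampling factor adaptable by adding or removing stages, a property the discussion returns to later.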

3D CMRA
The acquisition and motion-compensated reconstruction process of the 3D CMRA images is shown in Figure 2. Electrocardiogram (ECG)-triggered 3D whole-heart Cartesian balanced steady-state free precession (bSSFP) CMRA was acquired under free-breathing in coronal orientation with a variable-density spiral-like Cartesian (VD-CASPR) sampling. 22,67 The ky-kz phase-encoding plane is sampled along spiral-like arms with a golden-angle increment between consecutive spiral-like branches. For accelerated acquisitions, a fully sampled k-space center with a size of 20% of the ky and kz phase-encoding extents was acquired to allow coil sensitivity map estimation. The acquisition of each spiral-like arm was triggered to mid-diastole and preceded by T2-preparation, spectral-selective fat suppression pulses and 2D image navigators (iNAVs), which allow beat-to-beat 2D translational respiratory motion estimation (in the superior-inferior and right-left directions) and enable 100% respiratory scan efficiency.
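The golden-angle rotation between consecutive spiral-like arms can be sketched as follows. This is an illustrative stand-in only: the golden angle value and the arm parametrization (`samples_per_arm`, `twist_deg`) are assumptions, not the exact VD-CASPR trajectory of the paper.

```python
import math

GOLDEN_ANGLE = 137.50776405  # degrees; one common golden-angle choice (assumed)

def caspr_arm(arm_index, samples_per_arm=16, twist_deg=90.0):
    """Illustrative spiral-like arm in the ky-kz phase-encoding plane.

    Consecutive arms are rotated by the golden angle; each arm winds
    outward from the k-space centre with a gentle angular twist.
    Returns (ky, kz) coordinates normalized to [-1, 1].
    """
    base = math.radians(arm_index * GOLDEN_ANGLE)
    twist = math.radians(twist_deg)
    arm = []
    for s in range(samples_per_arm):
        r = (s + 1) / samples_per_arm   # radius grows along the arm
        phi = base + twist * r          # spiral-like angular offset
        arm.append((r * math.cos(phi), r * math.sin(phi)))
    return arm

# Golden-angle rotation spreads consecutive arms quasi-uniformly over the plane:
arm_angles = [(i * GOLDEN_ANGLE) % 360.0 for i in range(8)]
```

Because the golden angle is irrational with respect to 360°, any contiguous subset of arms covers the phase-encoding plane near-uniformly, which is what makes the sampling robust to retrospective respiratory binning.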
In the reconstruction, the acquired data are binned into five respiratory gates following the superior-inferior respiratory trace extracted from the 2D iNAVs and corrected to the center of each bin (intra-bin motion correction) to allow for 3D bin-to-bin non-rigid motion estimation by an elastic spline registration. The motion fields are then used in a motion-compensated reconstruction. 27 For the retrospective cohort (used in training and testing), a 3D patch-based low-rank reconstruction (PROST), 22 which exploits redundant information on a local (within a patch) and nonlocal (similar patches within a spatial neighborhood) scale in the respiratory motion-aligned data, 23 is used. For the prospective cohort (used only in testing), a non-rigid motion-compensated iterative SENSE reconstruction 27,68 was performed in-line on the scanner. The reconstructed coil-combined and respiratory motion-corrected complex-valued images are taken as input for the SR reconstruction.
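The binning and intra-bin correction step can be sketched as below. Equal-count binning along the SI trace is a common choice but an assumption here; the paper does not specify the bin-boundary strategy.

```python
def bin_respiratory(si_trace, n_bins=5):
    """Illustrative equal-count respiratory binning of acquired shots
    along the superior-inferior (SI) iNAV trace (one value per shot).

    Returns the bin membership lists and, per shot, the SI shift that
    moves it to the centre of its bin (intra-bin motion correction).
    """
    order = sorted(range(len(si_trace)), key=lambda i: si_trace[i])
    per_bin = max(1, len(si_trace) // n_bins)
    bins = [[] for _ in range(n_bins)]
    for rank, idx in enumerate(order):
        bins[min(rank // per_bin, n_bins - 1)].append(idx)
    corrections = {}
    for members in bins:
        if not members:
            continue
        centre = sum(si_trace[i] for i in members) / len(members)
        for i in members:
            corrections[i] = centre - si_trace[i]  # shift shot to bin centre
    return bins, corrections
```

After this step, each bin is internally motion-consistent, so the remaining bin-to-bin differences can be handled by the 3D non-rigid registration described above.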
The generator consists of two cascaded EDSR networks. 64 Each stage performs a two-fold upsampling in each phase-encoding direction using eight consecutive residual blocks with 3 × 3 convolution filters of stride 1 and 64 kernels, followed by ReLU activation functions. Residual connections strengthen the residual learning capability of the generator. The trainable discriminator network 65 is built as a patch-based convolutional neural network to capture local discrepancies between the super-resolved output and the HR target during training. A dyadic kernel increase and alternating strides of 1 and 2 are chosen to extract and focus local features. A VGG-16 network 66 captures perceptual differences. The discriminator and the VGG-16 network are both pre-trained and fine-tuned on a cohort of CTCA patients to provide domain-aware knowledge for retrieving adversarial and perceptual information. In total, the generator (~1.1 million parameters) and discriminator (~5.2 million parameters) networks comprise ~6.3 million trainable parameters.
The input to the network is the magnitude fully sampled LR-CMRA (0.9 × 3.6 × 3.6 mm³ or 1.2 × 4.8 × 4.8 mm³), whereas the output is the corresponding isotropic SR-CMRA image (0.9 mm³ or 1.2 mm³). The input can be either an axial 2D patch with a size in the range of 10%-90% of the full image slice or an axial 2D image slice, that is, a single 2D patch of 100% size. For training, an equal split between patch and image input was chosen, and for patches the location and size were randomly selected in the range of 10%-90%. The loss function consisted of the mean absolute error (MAE) and structural similarity index (SSIM) between the output of each EDSR stage and its corresponding HR image, an adversarial loss (discriminator), and a perceptual loss (VGG-16), with respective weightings of 1, 0.3, 0.1, and 0.01 between the loss terms.
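The weighted loss combination can be sketched as follows. Two simplifications are assumptions: SSIM enters as a 1 − SSIM penalty (the paper states the weighting but not the exact form), and the SSIM here is a single global window rather than the standard local-window SSIM.

```python
def mae(a, b):
    """Mean absolute error over flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def global_ssim(a, b, c1=1e-4, c2=9e-4):
    """Single-window SSIM over flattened images; a simplification of
    the standard local-window SSIM, for illustration only."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    va = sum((x - mu_a) ** 2 for x in a) / n
    vb = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    return (((2 * mu_a * mu_b + c1) * (2 * cov + c2))
            / ((mu_a ** 2 + mu_b ** 2 + c1) * (va + vb + c2)))

def generator_loss(sr, hr, adversarial, perceptual, w=(1.0, 0.3, 0.1, 0.01)):
    """Weighted sum of MAE, an SSIM term (as 1 - SSIM, an assumption),
    adversarial and perceptual losses with the 1 / 0.3 / 0.1 / 0.01
    weighting stated in the text."""
    return (w[0] * mae(sr, hr)
            + w[1] * (1.0 - global_ssim(sr, hr))
            + w[2] * adversarial
            + w[3] * perceptual)
```

A perfect reconstruction with zero adversarial and perceptual terms yields zero loss, since MAE vanishes and SSIM reaches one.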
The discriminator and perceptual loss network were pre-trained on a retrospective cohort of 50 CTCA patients using 600,000 axial pairs of HR (0.6 mm³) and downsampled LR (0.6 × 2.4 × 2.4 mm³) images with the proposed framework for 50 epochs using an ADAM optimizer (batch size = 16, learning rate 10⁻⁴). In the training of the proposed SR-CMRA, the VGG-16 perceptual network was kept fixed, but the discriminator was trained adversarially.
The proposed SRGAN network is trained in a supervised manner on 460,000 axial pairs of HR-CMRA (0.9 mm³ and 1.2 mm³) and corresponding downsampled LR-CMRA images. Testing was done on two cohorts: a retrospective cohort and a prospective cohort. For the retrospective cohort, 15 patients not seen in training (five-fold cross-validation with 3 unique left-out patients per run from the retrospective cohort of 50 patients) with retrospectively downsampled LR images (0.9 × 3.6 × 3.6 mm³ or 1.2 × 4.8 × 4.8 mm³) from the HR target (0.9 mm³ or 1.2 mm³) were used, that is, LR and HR images come from the same acquisition. For the prospective cohort, 16 patients (not seen in training) were prospectively acquired with a fully sampled LR-CMRA of 1.2 × 4.8 × 4.8 mm³ and a separate undersampled HR-CMRA of 1.2 mm³ isotropic resolution for comparison. If not stated otherwise, inference was performed at the image level. For patch-level testing, patches are non-overlapping.

FIGURE 1 Proposed generative adversarial SR framework with cascaded EDSR generator (A), trainable discriminator (B), and perceptual loss network (C). Non-rigid motion-compensated 3D CMRA data (Figure 2) are acquired to form the LR input (0.9 × 3.6 × 3.6 mm³ or 1.2 × 4.8 × 4.8 mm³).

| In vivo data acquisition
In vivo acquisitions were performed on a 1.5 T scanner (MAGNETOM Aera, Siemens Healthcare AG, Erlangen, Germany) with a dedicated 32-channel spine coil and an 18-channel body coil for CMRA, and on a dual-source CT (Somatom Force, Siemens Healthcare, Forchheim, Germany) for CTCA. Subjects gave written informed consent before undergoing scans, and the study was approved by the National Research Ethics Committee. Subjects underwent ECG-triggered free-breathing whole-heart CMRA using the described acquisition approach. Relevant scan parameters included: 3D bSSFP sequence, field-of-view = 320 × 320 × 125 mm³, flip angle (FA) = 90°, T2-preparation duration = 40 ms, SPIR fat saturation FA = 130°, subject-dependent mid-diastolic trigger delay and acquisition window (range, 90-130 ms), and specific absorption rate of ~0.9 W/kg.

| Prospective cohort
Sixteen patients (7 women, age: 49 ± 17 years) with suspected CAD underwent 3D CMRA with two acquisitions: (1) HR-CMRA with 1.2 mm³ isotropic resolution in ~7 min scan time (2.5× undersampling) and (2) fully sampled LR-CMRA with 1.2 × 4.8 × 4.8 mm³ resolution in ~50 s scan time.

FIGURE 2 Whole-heart 3D CMRA acquisition and reconstruction. A, ECG-triggered 3D whole-heart Cartesian bSSFP CMRA data are acquired under free-breathing in coronal orientation with VD-CASPR undersampling. Each spiral-like arm was triggered to mid-diastole and preceded by T2-preparation, spectral-selective fat suppression pulses, and 2D image navigators, which allow beat-to-beat 2D translational respiratory motion compensation. B, In the reconstruction, the acquired data are binned into five respiratory bins following the 2D iNAV to allow for 3D bin-to-bin non-rigid motion estimation. C, A 3D patch-based low-rank and motion-compensated reconstruction is performed to provide the 3D CMRA data used in the proposed SRGAN framework.

| Evaluation
The proposed SR-CMRA was compared against the HR-CMRA target and a bicubic interpolation (BI) in image space from the LR image. Images of the retrospective and prospective cohorts (LR-, HR-, BI-, and SR-CMRA) were reformatted along the right coronary artery (RCA) and left anterior descending artery (LAD) to measure visible vessel length and vessel sharpness (for the first 4 cm and the full length of the coronary arteries) using Soap-Bubble. 69,70 Vessel sharpness and length were assessed separately for the RCA and LAD from a user-defined coronary path. In the retrospective cohort, the normalized mean absolute error (NMAE), normalized mean squared error (NMSE), and SSIM between LR-CMRA/BI-CMRA/SR-CMRA and HR-CMRA were calculated. Statistical tests were performed using a paired Welch's t test with Bonferroni correction (significance level α = 0.05) under the null hypothesis of equal means for unequal variances.
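The error metrics can be sketched as below. The normalization conventions are assumptions, since the paper does not state them: NMAE is normalized by the reference intensity range and NMSE by the reference energy, both common choices.

```python
def nmae(x, ref):
    """Normalized mean absolute error; normalization by the intensity
    range of the reference is one common convention (assumed here)."""
    rng = max(ref) - min(ref)
    return sum(abs(a - b) for a, b in zip(x, ref)) / (len(x) * rng)

def nmse(x, ref):
    """Normalized mean squared error relative to the reference energy
    (another common convention, also an assumption)."""
    num = sum((a - b) ** 2 for a, b in zip(x, ref))
    den = sum(b ** 2 for b in ref)
    return num / den
```

Both metrics are zero for a perfect reconstruction and dimensionless, which is what allows pooling them across patients with different intensity scales.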

| Ablation study
Two aspects of the proposed SRGAN were investigated in an ablation study: the generator architecture and the use of a pre-trained discriminator. In a first comparison, the cascade of EDSR networks of the proposed SRGAN is replaced by a cascade of UNets (four encoder/decoder stages with two 3 × 3 convolutional filters each, dyadic kernel increase per stage starting from 8, stride of 2 for the last convolution per encoder stage and transposed convolutions in the decoder) with a similar number of trainable parameters. In a second comparison, adversarial training of the SRGAN (with cascaded EDSR generator) is performed without CTCA pre-training (ie, with randomly initialized discriminator weights).

| Generalization study
The proposed SRGAN (jointly trained on patches and images to target 0.9 mm³ and 1.2 mm³ resolutions) is investigated for its generalization to different imaging resolutions and input sizes (image or patch). To investigate generalization, the SRGAN was additionally trained and tested individually for (1) each imaging resolution (from 0.9 × 3.6 × 3.6 mm³ to 0.9 mm³ and from 1.2 × 4.8 × 4.8 mm³ to 1.2 mm³) with the proposed joint image/patch-level processing and (2) independently for image- or patch-level input at a target resolution of 1.2 mm³. In the following, we abbreviate training/testing scenarios for super-resolving images from an LR input of 0.9 × 3.6 × 3.6 mm³ to an HR target of 0.9 mm³ by the target resolution, that is, 0.9 mm³. Hence, in total four additional cross-validation trainings (two for imaging resolution and two for patch or image input) were conducted, with two testing scenarios for each.

In addition, the proposed SRGAN (jointly trained on patches and images to target 0.9 mm³ and 1.2 mm³ resolutions) was tested with downsampling ratios different from the 4 × 4 training condition.

| RESULTS
Two patients with suspected CAD of the retrospective testing cohort are depicted in Figure 3 and Supporting Information Figure S1, which is available online. The LR input of resolution 1.2 × 4.8 × 4.8 mm³, HR acquisition, BI image, and SR-CMRA output for a target resolution of 1.2 mm³ are shown in coronal orientation, together with a magnified image of the RCA and a coronary reformat. Qualitative improvement in vessel sharpness and delineation can be appreciated for both the RCA and left coronary artery (LCA). The LAD can be recovered in SR-CMRA images with delineation comparable to the HR acquisitions. RCA and LAD delineation are impaired in the LR input and cannot be retrieved by BI. Quantitative scores of NMAE, NMSE, and SSIM indicate statistically significant (P < .001) improvement of SR-CMRA over LR and BI. Difference images to the HR target for the retrospective patients of Figure 3 and Supporting Information Figure S1 are shown in Supporting Information Figure S2. The lowest residual error is observed for SR-CMRA in comparison to LR and BI.
Coronal and reformatted coronary images of the LR acquisition, HR acquisition, BI upsampling, and proposed SR-CMRA reconstruction in two patients with suspected CAD of the prospective testing cohort are shown in Figures 4 and 5. Similar qualitative improvements were achieved as in the retrospective cohort. The RCA and LAD in Figure 4 were barely visible in the LR image and in the BI upsampling, with reduced delineation. The proposed SR-CMRA approach recovered the coronary arteries and showed qualitative results similar to the separate HR acquisition. Upsampling-related staircasing for BI, as observed in Figure 5, was not encountered for the SR-CMRA reconstruction. In Figure 4, the RCA and LCA showed improved vessel delineation and sharpness in SR-CMRA, comparable to the HR acquisition; however, SR-CMRA was acquired eight times faster, in a ~50 s scan.
Axial, coronal, and sagittal views of the LR acquisition, HR acquisition, BI upsampling, and proposed SR-CMRA in a patient of the prospective cohort with a mild (25-49%) coronary stenosis in the LAD and a moderate (50-69%) stenosis in the RCA are shown in Figure 6. Only 1 of the 16 prospective cases had confirmed coronary stenosis. The proposed SR-CMRA reconstruction recovers the stenosis, which is not detectable in the BI upsampling. The SR-CMRA image shows perceivable quality similar to that of the HR acquisition.
The proposed framework generalized across input resolutions and input sizes to perform the 16-fold upsampling, as depicted in Figure 7. The SRGAN network was trained and tested for 0.9 × 3.6 × 3.6 mm³ and 1.2 × 4.8 × 4.8 mm³ inputs and was also able to achieve sub-millimeter resolution (0.9 mm³). No marked quality degradations related to the different imaging resolutions and input sizes (patch/image) were observed between the SR-CMRA reconstruction and the HR target for the proposed training scenario. In Supporting Information Figure S3, the impact of training the network on one imaging resolution (target resolution of 0.9 mm³ or 1.2 mm³) and testing on the same or the other imaging resolution is shown. Training on the lower target resolution (1.2 mm³) slightly impairs the recovery of finer details and edges at the higher target resolution (0.9 mm³). On the other hand, training on the higher target resolution (0.9 mm³) slightly sharpened the overall image when tested on the lower target resolution (1.2 mm³), although not statistically significantly according to NMAE (P = .78), as also depicted in the difference images. Training and testing at the same resolution showed overall good performance, in accordance with the proposed joint training on 0.9 mm³ and 1.2 mm³ target resolutions. Obtained NMAE/NMSE/SSIM scores for these cases differed on average by 0.08 ± 0.04/0.002 ± 0.002/0.06 ± 0.03. None of these differences was statistically significant (P = .32/P = .82/P = .63).
A stronger influence was observed for training/testing at the patch/image level, as shown in Figure 8. Overall good qualitative agreement to the HR target was obtained in all cases. SR-CMRA images obtained from patch-wise processing tend to be smoother than those from image-wise processing, which might be attributed to the sliding window during testing in patch-wise processing. An increased noise floor when training on patches and testing on images, as also depicted in the difference images, was noted, with a statistically significant difference in NMAE, NMSE, and SSIM (P < .001) in comparison to testing on patches. The other way round, training on images and testing on patches showed only a marginally increased noise floor, which was statistically significantly different only in NMAE (P < .001) in comparison to testing on images (NMSE: P = .16, SSIM: P = .07). Training and testing on the same respective input showed a statistically significant difference in NMSE (P < .001) and NMAE (P < .01), relating to the observed smoother visual impression for patch-level processing (SSIM: P = .23). Recovery of details was equally well preserved in both scenarios. Comparing the individual training cases to the proposed joint patch/image-level training, statistically significant differences were observed in NMAE, NMSE, and SSIM (P < .01), which were due to the aforementioned effects, that is, smoother images for patch-wise and slightly elevated noise for image-wise processing. No relevant performance change was observed when patch/image selection contributed 40%-60%/60%-40% of the samples in the training database. Training database compositions in the range of approximately 25%-30%/75%-70% shifted the performance toward the individual training cases of the respective majority (patch-wise or image-wise). In total, the joint patch/image-level training improved on average in terms of NMAE/NMSE/SSIM over the individual training/testing cases by 80% ± 13%/64% ± 18%/38% ± 21%, indicating that the proposed network benefitted from learning the SR both on a local (patches) and on a global (images) scale, as also qualitatively illustrated in the difference images with respect to the individual training/test cases in Figure 8B. Furthermore, when the proposed jointly trained network was tested with image-wise or patch-wise inputs, no statistically significant difference in NMAE, NMSE, and SSIM was observed between these inputs, as also qualitatively shown in Figure 7.

FIGURE 7 Impact of different imaging resolutions and patch-/image-based SR reconstruction: coronal LR input, BI, HR acquisition, and SR reconstruction in a patient with suspected CAD of the retrospective cohort. The patient was scanned twice, with resolutions of 0.9 mm³ and 1.2 mm³. Image resolutions are stated below the panels. Patch- and image-wise SR is compared for a network trained jointly on both patch/image input. The proposed network generalizes across resolutions and image-/patch-level processing.
The ablation study for the different generator networks is depicted in Supporting Information Figure S4. The UNet cascade tends to smooth images more than the EDSR cascade and loses high-frequency edge information. This is also reflected in a reduced quantitative performance in NMAE/NMSE/SSIM by 30%/98%/14%, which was, however, not statistically significant (NMAE: P = .09, SSIM: P = .27) except for NMSE (P < .001). Pre-training of the discriminator with CTCA helped to stabilize training and improved loss convergence, resulting in a statistically significant (P < .001) performance improvement in NMAE/NMSE/SSIM of 34%/107%/26% over a purely randomly initialized discriminator (ie, without pre-training), as shown in Supporting Information Figure S5. The difference image indicates a generally reduced contrast in images of a randomly initialized discriminator.
Testing the proposed SRGAN with varying downsampling ratios showed good performance as illustrated for two patients of the retrospective cohort in Supporting Information Figure S6. The local (patch-wise) and global (image-wise) training for the 4 × 4 downsampling was able to provide consistent performance during inference of different downsampling factors, even beyond 4 × 4 downsampling but with decreasing performance. This was also reflected in the quantitative analysis as shown in Supporting Information Figure S7.
A quantitative evaluation in the 15 test patients of the retrospective testing cohort is given in Figure 9 in terms of NMAE, NMSE, and SSIM between LR-, BI-, and SR-CMRA and HR-CMRA for both imaging resolutions (0.9 mm³ and 1.2 mm³) and joint patch/image-level training. Statistically significant improvement (P < .001) of SR-CMRA over LR- and BI-CMRA was obtained in all scores. Measured vessel sharpness and length in coronary reformatted images indicate high agreement between SR- and HR-CMRA, with no statistically significant difference. Nearly no difference was observed for vessel sharpness in the first 4 cm of both the LAD and RCA. Slightly reduced vessel sharpness in SR-CMRA compared to HR-CMRA was obtained for the full length, but this was not statistically significant. Overall, statistically significant improvement (P < .001) of SR-CMRA over LR- and BI-CMRA was obtained for vessel sharpness and length in both the LAD and RCA.

FIGURE 8 A, Impact of training and testing only on patch- or image-level input. The SR reconstructed images are shown in comparison to the HR target for a resolution of 1.2 mm³ in a patient of the retrospective cohort. Difference images (20-times scaled) along rows and columns show the respective variations due to the training/testing scenarios. B, Difference of the individual training/testing scenarios (20-times scaled) to the proposed joint training (patches and images) of the SRGAN. Difference images to the proposed SRGAN are depicted.
Similar conclusions can be drawn in the 16 patients of the prospective cohort, as shown in Figure 10. Vessel sharpness and length were equally well recovered for the LAD and RCA in SR-CMRA in comparison to HR-CMRA and showed statistically significant improvement (P < .001) over LR- and BI-CMRA. Measured sharpness was in general better in the RCA than in the LAD. The distribution and relation of data samples for vessel sharpness and length in the LAD and RCA of the retrospective and prospective cohorts are detailed in Supporting Information Figure S8.
The proposed network required ~186 h for training, but inference took only ~3 s to super-resolve the images from the LR input. A slower convergence (on average ~2× longer) and increased validation loss (on average by +8%) was found if trained from scratch (without pre-training the discriminator on CTCA).

| DISCUSSION
In this work, we propose an SR framework for free-breathing motion-corrected 3D CMRA which allows a 16-fold upsampling from an LR input, enabling acquisition of 1.2 mm³ 3D CMRA in ~50 s and reconstruction in ~3 s (SRGAN only).
The performance, feasibility, and generalization of the proposed framework were assessed in two cohorts: retrospectively downsampled HR images in 15 patients with suspected CAD and prospectively acquired LR images in 16 patients with suspected CAD. The retrospective cohort enabled quantitative analysis of the SR in relation to the HR target for varying imaging resolutions (0.9 mm³ and 1.2 mm³) and for patch/image-level processing. The feasibility of shortening the acquisition time to less than 1 min was examined on prospectively acquired fully sampled 3D LR-CMRA. The proposed SR-CMRA showed high agreement with HR images acquired in ~7 min. No noticeable reduction for SR-CMRA in comparison to HR-CMRA was encountered in quantitative measurements or in perceivable quality. Previous works with CS, 17-21 low-rank 23-25 or deep-learning-based undersampled reconstruction 44 achieved acquisition accelerations of up to ~9× (depending on the undersampling trajectory), resulting in a scan time of ~5-10 min for resolutions of ~0.9-1.2 mm³. For the same retrospective CMRA cohort, previous works on low-rank 23 and deep-learning-based 44 reconstruction achieved pseudo-random undersampling of up to 5× and 9×, respectively. The proposed LR acquisition in this work achieved an effective 16× acceleration (fully sampled LR imaging) that translates into a scan time of ~50 s.

FIGURE 9 Quantitative evaluation in the LAD and RCA of LR-CMRA (0.9 × 3.6 × 3.6 mm³ and 1.2 × 4.8 × 4.8 mm³), BI (0.9 mm³ and 1.2 mm³), proposed SR reconstruction (0.9 mm³ and 1.2 mm³), and HR acquisition (0.9 mm³ and 1.2 mm³) in 15 patients of the retrospective cohort. NMAE, NMSE, and SSIM between the images and the HR target are calculated. Vessel sharpness and length are calculated in coronary reformats for the LAD and RCA. Averages and SDs over the whole test cohort are depicted. Statistical differences (P < .05) are indicated by asterisks.
It is conceivable to further accelerate the proposed acquisition or to modify the sampling by combining LR and pseudorandom undersampling. This trajectory would, however, require the combination or integration of SRGAN with a reconstruction technique (CS, low-rank, plug-and-play prior, or unrolled networks) and will be investigated in future studies.
Previous SR deep-learning works focused on retrospective analysis of 4 × 4-fold downsampled CMRA with an image-based generative adversarial network (GAN), 57 indicating the feasibility of SR for breath-held CMRA. In the work of Steeden et al, 60 a 2 × 2-fold upsampling was investigated to provide a four times faster diaphragmatic-navigated CMRA acquisition in ~3 min, although without 100% respiratory scan efficiency. Chaudhari et al 71 investigated the use of 3D convolutional neural networks operating on patches for up to eight-fold slice upsampling in musculoskeletal MRI. Chen et al 51,72 proposed an efficient 3D generative network operating on patches for a 2 × 2-fold residual upsampling by multi-level dense connections, which was investigated for brain MRI. The proposed SRGAN uses a cascade of EDSR generators for a 4 × 4-fold upsampling, which showed superior performance relative to other state-of-the-art single-image SR networks, 73,74 while at the same time being computationally and memory efficient. Single-image deep-learning SR does not require acquisition of multiple slices/volumes in orthogonal directions, and the upscaling is trained into the network in a supervised manner (on LR and HR pairs). In contrast to previous works, the SRGAN learns the residual upsampling on a local (patch) and global (image) scale, as well as from different imaging resolutions. The SRGAN operates on a large database of motion-corrected 3D HR-CMRA, which is challenging to acquire. The proposed cascaded generator architecture allows flexible adaptation in future studies to higher or lower upsampling factors by adding or removing EDSR blocks from the cascade. Pre-training on CTCA helps to initialize the discriminator for the operation domain (coronary angiography).
The proposed framework generalized across imaging resolutions to reconstruct target resolutions of 1.2 mm³ and sub-millimeter 0.9 mm³. For prospective imaging, this study focused only on the acquisition of a non-accelerated LR-CMRA with 1.2 × 4.8 × 4.8 mm³ resolution to reconstruct an HR-CMRA with 1.2 mm³ resolution. A sub-millimeter 3D CMRA can be of interest to more accurately assess CAD risk by potentially better resolving the distal segments of the coronary arteries. It is conceivable for the proposed framework to acquire a sub-millimeter 3D CMRA in a ~1 min scan with 2.5× undersampling. We preliminarily explored the possibility of sub-millimeter 0.9 mm³ SR-CMRA in the retrospective cohort; however, this will need to be further investigated in future studies. We conclude that the network was able to generalize across imaging resolutions, benefitting from learning edge delineation at higher target resolutions without overemphasizing one imaging resolution. Furthermore, we investigated the flexibility of the proposed framework toward operating at an image or patch level and found good performance if training is performed jointly on both sets. In comparison to training on a single set (patch-wise or image-wise), the joint training benefits from both sets, and the observed image degradations (smoothing, increased noise) were mitigated, suggesting that the network has learned both a local and a global SR upsampling. When tested on patch-wise or image-wise input, no statistically significant difference was observed. This property would hence enable, in future studies, the integration of the proposed SR framework into either an image-based or patch-based undersampling reconstruction setting. It would furthermore allow future investigation of optimal input sizes.
From the ablation study, we found that convolutional neural network-based generators are able to address the task of single-image SR. Good performance was found for the cascaded EDSR generator, with less residual smoothing in the SR result than for the UNet cascade. Comparisons to other architecture developments are of interest for future studies. [75][76][77] Pre-training the discriminator on CTCA, as proposed here, provided a domain-adaptive initialization with increased performance for the operation domain (coronary angiography). It balanced the strong generator during adversarial training better than a randomly initialized discriminator.
The proposed SRGAN operated on respiratory motion-corrected and cardiac-triggered input images, and we did not observe any motion impact on the SR-CMRA images. If input images are strongly corrupted by noise, this may propagate through the network and impair the SR reconstruction quality; hence, a good prior motion-corrected reconstruction can be crucial. However, we only observed this effect for simulated additive noise and did not see it in any of the retrospective or prospective cases. We also did not experience any non-realistic inpainting artifacts from the proposed SRGAN architecture. Data consistency during inference could be improved if SRGAN is combined or integrated with the reconstruction, and this will be investigated in future studies.
We acknowledge further limitations of this study. The SRGAN framework only operates along the two phase-encoding directions, as the frequency-encoding direction is already sampled at HR. It is conceivable that sharing 3D spatial information can be beneficial over 2D processing; this will, however, increase computational demand and complexity and requires further architectural investigation to allow for full-scale image processing. Future studies will also focus on the analysis of different generator architectures for 3D SR-CMRA. The selected architecture only allows for integer-wise downsampling factors. Future studies will investigate extending the architecture to arbitrary and varying image resolutions and including them during training. Only coil-combined magnitude images were processed; future studies will concentrate on the use of multi-coil and complex-valued data, which would also enable better integration into existing undersampling reconstruction frameworks.
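The 2D processing scheme described above can be sketched as slicing the 3D volume along the (already high-resolution) frequency-encoding axis and applying a 2D SR operator to each phase-encode plane. This is a minimal sketch assuming the trained generator can be called as a function on a single 2D plane; `nn_upsample` is a hypothetical nearest-neighbour placeholder, not the SRGAN.

```python
import numpy as np

def apply_sr_per_plane(volume: np.ndarray, sr_2d, factor: int = 4) -> np.ndarray:
    """Apply a 2D SR operator plane-by-plane along the readout axis.

    `volume` is (n_readout, ny_lr, nz_lr): the frequency-encoding axis
    is already at high resolution, so only the two phase-encoding axes
    are upsampled.
    """
    n_ro, ny, nz = volume.shape
    out = np.empty((n_ro, ny * factor, nz * factor), dtype=volume.dtype)
    for i in range(n_ro):
        out[i] = sr_2d(volume[i])  # each phase-encode plane is processed in 2D
    return out

def nn_upsample(plane: np.ndarray, factor: int = 4) -> np.ndarray:
    """Placeholder 2D upsampler standing in for the trained generator."""
    return np.repeat(np.repeat(plane, factor, axis=0), factor, axis=1)
```
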

| CONCLUSIONS
In conclusion, a deep-learning SR framework was proposed for CMRA that enables the acquisition of motion-compensated 1.2 mm 3 3D CMRA within 50 s under free-breathing. The proposed approach was evaluated on a cohort of retrospectively downsampled and prospectively acquired LR images in patients with suspected cardiovascular disease, enabling a 16-fold increase in spatial resolution. The proposed SRGAN showed good generalization toward different imaging resolutions and patch or image input. SR-CMRA showed qualitative and quantitative image quality comparable to conventional HR-CMRA (~7 min), thus promising to enable 3D CMRA acquisition in a scan of less than a minute. Studies in a larger cohort of patients are now warranted.

SUPPORTING INFORMATION
Additional Supporting Information may be found online in the Supporting Information section.

Figure S1) in retrospective cohort

FIGURE S3 Impact of training and testing independently on different imaging resolutions. The super-resolution (SR) reconstructed images are shown for SR from 0.9 × 3.6 × 3.6 mm 3 to 0.9 mm 3 and 1.2 × 4.8 × 4.8 mm 3 to 1.2 mm 3 with the proposed joint image/patch-level processing in a patient of the retrospective cohort. Low-resolution and high-resolution (HR) images of 0.9 mm 3 and 1.2 mm 3 resolution are shown in Figure 7. HR images are acquired in two separate scans and hence difference images (20-times scaled) are only inspected along rows, showing the respective variations due to the training/testing scenarios

FIGURE S4 Ablation study for changing generator networks. The proposed cascade of EDSR generators is replaced by a cascade of UNet generators with a similar number of trainable parameters. Coronal images of a patient from the retrospective cohort are shown. The difference image (20-times scaled) between the EDSR (proposed) and UNet generator is depicted. Quantitative evaluation in 15 patients of the retrospective cohort is performed. Normalized mean absolute error (NMAE), normalized mean squared error (NMSE) and structural similarity index (SSIM) between images and high-resolution target are calculated. Average and standard deviations over the whole test cohort are depicted. Statistical differences (P < .05) are indicated by *

FIGURE S5 Ablation study for pre-training the discriminator. The discriminator was pre-trained on a cohort of CTCA patients (proposed) and subsequently used to train the proposed SRGAN, in comparison to not performing pre-training, that is, randomly initialized discriminator weights. Coronal images of a patient from the retrospective cohort are shown. The difference image (20-times scaled) between the CTCA pre-trained discriminator (proposed) and without (w/o) pre-training is depicted. Quantitative evaluation in 15 patients of the retrospective cohort is performed. Normalized mean absolute error (NMAE), normalized mean squared error (NMSE) and structural similarity index (SSIM) between images and high-resolution target are calculated. Average and standard deviations over the whole test cohort are depicted. Statistical differences (P < .05) are indicated by *

FIGURE S6 Testing of various downsampling factors (2 × 2, 3 × 3, 3 × 4, 4 × 4, 4 × 5, 5 × 5) on the proposed SRGAN (jointly trained on patches and images to target 0.9 mm 3 and 1.2 mm 3 resolutions). Two patients of the retrospective cohort are depicted. The corresponding imaging resolutions are stated near the images of patient A

FIGURE S7 Quantitative analysis for testing of various downsampling factors (2 × 2, 3 × 3, 3 × 4, 4 × 4, 4 × 5, 5 × 5) on the proposed SRGAN (jointly trained on patches and images to target 0.9 mm 3 and 1.2 mm 3 resolutions). Normalized mean absolute error (NMAE), normalized mean squared error (NMSE) and structural similarity index (SSIM) between images and high-resolution target are calculated. Mean and standard deviation over the retrospective cohort are shown

FIGURE S8 Violin plots of Figure 9 (retrospective cohort) and of Figure 10