Network slimming for compressed-sensing cardiac cine MRI

We slim the U-net for compressed-sensing cardiac cine magnetic resonance imaging. Despite the excellent performance of the U-net, its heavy structure and long training time restrict its applications in environments with limited computational resources. We slimmed the U-net by replacing the multiple convolutions with a single convolution, replacing the transpose convolution with upsampling, and adopting fewer layers, without performance degradation. The number of trainable weights and the training time of the slimmed network were reduced by 87.9% and 48.1%, respectively. The proposed network showed improved performance, with a 1.45% reduction in the normalised mean square error.

Introduction: Neural networks (NNs) have been successfully applied to various tasks in medical imaging [1]. The U-net [2] is a convolutional NN that was originally developed for biomedical image segmentation. It won the IEEE International Symposium on Biomedical Imaging (ISBI) Cell Tracking Challenge in 2015. Several applications have been developed using the U-net, for example, segmentation [3] and reconstruction [4]. Despite its excellent performance, the heavy structure of the U-net often restricts its use in environments with limited computational resources. In this study, we slim the U-net for compressed-sensing (CS) cardiac cine magnetic resonance imaging (MRI) [5] without performance degradation. The training time and the number of trainable parameters are used as the figures of merit for slimming.
Data: Cardiac cine data of eight healthy volunteers were measured using the balanced steady-state free precession (SSFP) sequence in a 3.0 T Siemens system with repetition and echo times of 3.88 and 1.94 ms, respectively. A five-channel phased array coil with a sensitivity encoding (SENSE) factor of 2 was used. The k-space data without undersampling were reconstructed to prepare the 'ground truth (GT) image' using inverse Fourier transform and SENSE reconstruction. Undersampled data were generated by undersampling the channel data along the phase encoding (PE) axis with acceleration factors (AFs) of 2, 3, 4, and 8. The missing data were first interpolated from the data at adjacent cardiac frames and then reconstructed to prepare the 'initial reconstructed (IR) image'. The measured data for each volunteer had 12 slices and 16-24 cardiac frames. Each reconstructed image has a size of 256 × 256 in the transverse plane. The difference image between the GT and IR is normalised and shifted to prepare a label, and the IR is normalised to be input into the NN for training. Figure 1 depicts the schematic for training the NN.
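The data preparation described above can be sketched in NumPy as follows. This is only an illustration under several assumptions that the text does not confirm: a single coil instead of the five-channel SENSE acquisition, a regular sampling pattern shifted by one PE line per frame, nearest-frame copying as the interpolation, and a hypothetical scale-and-shift of the label into roughly [0, 1]. All function names are illustrative.

```python
import numpy as np

def undersample_mask(n_frames, n_pe, af):
    """Boolean mask (frames, PE): keep every af-th PE line, shifted by one line per frame."""
    mask = np.zeros((n_frames, n_pe), dtype=bool)
    for t in range(n_frames):
        mask[t, (t % af)::af] = True
    return mask

def fill_from_adjacent_frames(kspace, mask):
    """Fill each missing PE line with the same line from the nearest frame
    (cyclic in time) in which that line was acquired."""
    n_frames = kspace.shape[0]
    filled = np.where(mask[:, :, None], kspace, 0)
    for t in range(n_frames):
        for pe in np.flatnonzero(~mask[t]):
            acquired = np.flatnonzero(mask[:, pe])
            if acquired.size == 0:
                continue  # line never sampled in any frame: leave it as zero
            dist = np.abs(acquired - t)
            dist = np.minimum(dist, n_frames - dist)  # cyclic frame distance
            filled[t, pe] = kspace[acquired[np.argmin(dist)], pe]
    return filled

def reconstruct(kspace):
    """Magnitude images from centred k-space, one 2-D inverse FFT per frame."""
    shifted = np.fft.ifftshift(kspace, axes=(-2, -1))
    img = np.fft.ifft2(shifted, axes=(-2, -1))
    return np.abs(np.fft.fftshift(img, axes=(-2, -1)))

def make_training_pair(gt, ir):
    """Normalised IR as network input; scaled and shifted (GT - IR) as label.
    The factor 0.5 and the offset 0.5 are assumptions, not taken from the text."""
    scale = ir.max()
    net_input = ir / scale
    label = 0.5 * (gt - ir) / scale + 0.5
    return net_input, label

# Synthetic example: 16 frames of 256 x 256 k-space, AF = 4.
rng = np.random.default_rng(0)
kspace = rng.normal(size=(16, 256, 256)) + 1j * rng.normal(size=(16, 256, 256))
mask = undersample_mask(16, 256, af=4)
gt = reconstruct(kspace)
ir = reconstruct(fill_from_adjacent_frames(kspace, mask))
net_input, label = make_training_pair(gt, ir)
```

With this convention, the network output can be mapped back to an image by inverting the scale and shift and adding the result to the IR image.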
The data from volunteers one to four (total 1104 images) and from volunteers five to eight (total 912 images) were used as the training and test datasets, respectively; thus, the training and test datasets were disjoint and originated from different volunteers. Each dataset was undersampled with the four AFs (2, 3, 4, and 8), so the training and test datasets consist of 4416 and 3648 images, respectively. All the studies were approved by the institutional review board. Written informed consent was obtained from all the volunteers.
Network slimming: The U-net consists of a multi-layered 'U'-shaped hierarchical structure. Each layer has several convolution operations with the rectified linear unit (ReLU) function. In the contraction path (left side), the spatial resolution decreases, whereas the number of feature channels increases with the layer depth. Max pooling was used to reduce the spatial resolution. In the expansion path (right side), the spatial resolution increases, and the number of channels decreases. Transpose convolution was used to increase spatial resolution. The output in the contraction path was also concatenated to the output of the transpose convolution. Owing to the multi-layer architecture with convolutions and transpose convolution, the U-net requires a large number of trainable parameters; for example, approximately 40 million parameters are required for an input of size 256 × 256.
We slim the U-net in three ways. First, multiple convolutions (3 × 3) are reduced to a single convolution with a larger kernel size (5 × 5). Second, the transpose convolution is replaced by upsampling with a fixed kernel; bilinear interpolation is used for the upsampling. Finally, the number of layers in the network is reduced from five to four. As CS cardiac cine MRI aims to reduce aliasing error from undersampling, a network with fewer layers may perform well. We combined these three components to construct a slim network (we call it the 'slim-net'). The structure of the proposed network is shown in Figure 2.
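To make the bookkeeping behind trainable parameters concrete, the sketch below counts the weights of the building blocks involved. The channel widths (64 and 128), the number of stacked convolutions (two), and the 2 × 2 transpose-convolution kernel are hypothetical values chosen for illustration; the text does not state them, and the saving from replacing stacked convolutions depends on these choices.

```python
def conv_weights(c_in, c_out, k):
    """Trainable weights of a 2-D convolution with a k x k kernel, including biases."""
    return c_in * c_out * k * k + c_out

def stacked_conv_weights(c_in, c_out, k, n):
    """n successive k x k convolutions; only the first changes the channel count."""
    return conv_weights(c_in, c_out, k) + (n - 1) * conv_weights(c_out, c_out, k)

# Hypothetical level of the contraction path: 64 -> 128 channels.
two_3x3 = stacked_conv_weights(64, 128, k=3, n=2)
one_5x5 = conv_weights(64, 128, k=5)

# Hypothetical upsampling step: 128 -> 64 channels.
transpose_2x2 = conv_weights(128, 64, k=2)  # a transpose convolution has a trainable kernel
bilinear_up = 0                             # bilinear interpolation uses a fixed kernel

print("two 3x3 convolutions:", two_3x3)
print("one 5x5 convolution: ", one_5x5)
print("transpose convolution:", transpose_2x2)
print("bilinear upsampling:  ", bilinear_up)
```

Removing the deepest layer has the largest effect in such a count, because the widest convolutions sit at the bottom of the 'U'.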
Result: The slimming efficacy was evaluated using the number of trainable parameters and the training time. The evaluation was performed on a computer equipped with an Intel® Xeon® CPU E5-2620 v4 @ 2.1 GHz with 128 GB of memory and an NVIDIA TITAN RTX GPU system. Figure 3 shows the ratios of the number of trainable parameters and the training time of the slim-net and three other networks, each applying a single slimming component, with respect to those of the U-net. The relative normalised mean square error (NMSE) of the reconstructed images is also shown for the test dataset. The left axis shows the scale for the number of trainable parameters and the training time, and the right axis shows the scale for the NMSE. As shown in Figure 3, the number of trainable parameters of the slim-net is 12% of that of the U-net, and its training time is 52% of that of the U-net. The other networks were also slimmer than the U-net, but none was as slim as the slim-net. Note that the training time is not proportional to the number of trainable parameters, as some processing is independent of the trainable parameters. Furthermore, the reductions in the number of trainable parameters from the individual slimming components are not fully additive, as the components are not completely separable. For example, the layer reduction involves convolution and upsampling.
Table 1. Normalised mean square error (NMSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) of the reconstructed images using the U-net and slim-net as a function of the acceleration factor (AF). The scale of NMSE is $10^{-3}$.
The NMSE ratio of the slim-net is 0.986, which indicates that this network slightly improves the reconstructed image quality. The NMSE ratios of the other slim networks are also less than one, except for the one with the layer reduction. Thus, the slim-net does not degrade the image quality (it even slightly improves it), despite its simplified structure with fewer trainable parameters and a shorter training time. Table 1 summarises the NMSE, peak signal-to-noise ratio, and structural similarity of the reconstructed images for the test dataset using the U-net and slim-net as a function of the AF. The rightmost column is the average value of each metric over all the AFs. The performance of the slim-net is better than that of the U-net for all the metrics. In particular, the average NMSE of the slim-net is lower by 1.45%.
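For reference, the sketch below shows how NMSE and PSNR are commonly computed. The text does not give its exact definitions, so the normalisations used here (error energy divided by reference energy for NMSE, and the reference peak for PSNR) are assumptions; SSIM is omitted because it is usually taken from an image-processing library.

```python
import numpy as np

def nmse(recon, ref):
    """Normalised mean square error: error energy divided by reference energy."""
    return np.sum(np.abs(recon - ref) ** 2) / np.sum(np.abs(ref) ** 2)

def psnr(recon, ref):
    """Peak signal-to-noise ratio in dB, using the reference maximum as the peak."""
    mse = np.mean(np.abs(recon - ref) ** 2)
    return 10.0 * np.log10(np.max(np.abs(ref)) ** 2 / mse)

def relative_change(slim, baseline):
    """Fractional change of a metric relative to the baseline network."""
    return (slim - baseline) / baseline

# Synthetic example: a reference image and a reconstruction that is 10% too bright.
ref = np.ones((256, 256))
recon = 1.1 * ref
print(nmse(recon, ref), psnr(recon, ref))
```

An NMSE ratio such as the 0.986 quoted above corresponds to `relative_change` returning a small negative value, that is, a lower error than the baseline.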
Discussion: The use of cardiovascular cine MRI to scan the four-dimensional data of a moving heart in a few breath-holds remains a challenging task for clinical applications. The hardware approach, which relies on a high-power, high-slew-rate gradient system, often induces nerve stimulation. CS is widely used as an alternative approach. The real-time reconstruction of CS cine MRI with diagnostic image quality is another challenging task. The use of the U-net for reconstructing CS cardiac cine MRI has been investigated [6]. The main difficulties in using the U-net are its heavy structure and long training time.
We focused on three components to make the U-net slim: multiple convolutions, transpose convolution, and deep layers. As multiple convolutions are redundant for some applications, we reduced them to a single convolution. Moreover, we adopted a larger kernel size (5 × 5) to compensate for the reduced range of a single convolution. Although the transpose convolution has trainable kernels, it may not be very useful from the viewpoint of CS reconstruction. The transpose convolution mixes data over channels, unlike max pooling, whose operation is confined to a 2 × 2 array within the channel. Thus, the transpose convolution may be replaced by upsampling with bilinear interpolation. As shown in Figure 3, the upsampling results in a greater reduction in the NMSE than the transpose convolution. The number of layers can also be reduced to simplify the network, although the extent may depend on the application of the NN. The range affected by the convolution may be expressed as $K \cdot 2^{L-1}$, where K denotes the kernel size and L is the number of layers of the network. The range of the slim-net with K = 5 and L = 4 is 40, similar to that of the U-net with K = 3 and L = 5 (48).
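The range expression above can be checked with a few lines of Python. The function name is illustrative; the formula is the one stated in the text, with the kernel size multiplied by the downsampling factor accumulated over the layers.

```python
def conv_range(kernel_size, n_layers):
    """Range affected by the convolution: K * 2**(L - 1)."""
    return kernel_size * 2 ** (n_layers - 1)

print(conv_range(5, 4))  # slim-net, K = 5 and L = 4: 40
print(conv_range(3, 5))  # U-net, K = 3 and L = 5: 48
```

The larger kernel therefore recovers most of the range lost by removing one layer.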
Although we tested the slim-net only for CS cardiac cine MRI, the network may be applicable more generally to other tasks in which computational resources are limited.

Conclusion: We proposed the slim-net for CS cardiac cine MRI. Through quantitative analysis, the proposed network successfully realised slimming with one-eighth of the trainable parameters and approximately half of the training time of the original U-net with a slightly improved performance. The slim-net is useful for applications executed on devices or platforms with limited computational resources.