End‐to‐end unsupervised cycle‐consistent fully convolutional network for 3D pelvic CT‐MR deformable registration

Abstract Objective To improve the efficiency of computed tomography (CT)-magnetic resonance (MR) deformable image registration while ensuring registration accuracy. Methods Two fully convolutional networks (FCNs) for generating spatial deformation grids were proposed, using the Cycle-Consistent method to ensure that the deformed image is consistent with the reference image. In all, 74 pelvic cases consisting of both MR and CT images were studied, of which 64 cases were used as training data and 10 cases as testing data. All training data were standardized and normalized after simple image preparation to remove redundant air. Dice coefficients and average surface distance (ASD) were calculated for regions of interest (ROIs) of CT-MR image pairs before and after registration. The performance of the proposed method (FCN with Cycle-Consistent) was compared with that of Elastix software, MIM software, and the FCN without Cycle-Consistent. Results The proposed method achieved the best registration accuracy among the four registration methods tested and was generally more stable than the others. In terms of average registration time, Elastix took 64 s, MIM software took 28 s, and the proposed method was significantly faster, taking <0.1 s. Conclusion The proposed method not only ensures the accuracy of deformable image registration but also greatly reduces the registration time, improving the efficiency of the registration process. In addition, compared with other deep learning methods, the proposed method is completely unsupervised and end-to-end.

Traditional registration methods are time-consuming because they rely on iterative calculation of metrics such as mutual information (MI). 5 Other methods such as intensity-based feature selection algorithms extract features that correspond well with respect to intensity; however, they do not necessarily correspond well with respect to anatomy. [6][7][8] Recently, many studies have demonstrated the feasibility of deep learning methods for image registration. Cao et al. 9 proposed an approach based on deep regression networks to predict the deformation field between a pair of image datasets. In other papers, 10,11 a convolutional neural network (CNN) was used to perform fast registration of three-dimensional (3D) pulmonary computed tomography (CT) images by combining multiple random transformations to generate a large training set. Rohé et al. 12 proposed the SVF-Net architecture, which is trained using segmented shapes. All the above registration methods need pre-registration data, contour data, or synthetic data to train neural networks.
However, it is difficult to obtain well-registered clinical medical images, and synthetic images are quite different from the actual clinical situation.
To overcome the shortcomings of supervised registration methods, some researchers proposed unsupervised registration methods.
Shan et al. 13 built an end-to-end unsupervised learning system with fully convolutional neural networks in which image-to-image medical image registration is performed. Hering et al. 14 presented an unsupervised deep-learning-based method for 3D thoracic CT registration using the edge-based normalized gradient fields (NGF) distance measure. Low-dimensional vectors instead of image pairs were used as input to generate spatial transformation fields in Ref. [15]. de Vos et al. 16 used the deformable image registration network (DIRNet) to register images by directly optimizing a similarity metric between the fixed and the moving image. Balakrishnan et al. 17 developed a novel registration method that learns a parametrized registration function from a collection of volumes using a CNN. Although unsupervised registration methods do not require pre-registered data and thus have an advantage over supervised registration methods, most unsupervised methods ignore the inherent inverse-consistent property of transformations between a pair of images. 18 The Generative Adversarial Network (GAN) is a deep learning method that can make the generated data have the same distribution as the real data. 19 To overcome the difficulty of acquiring image pairs in some applications, Zhu and Isola 20 proposed CycleGAN, which uses a cycle-consistency loss to translate between unpaired image domains. A cycle-consistent CNN was later proposed in Ref. [25] to register multiphase liver CT images, but that method was also not suitable for CT-MR registration because the loss functions it used could not evaluate the similarity between CT and MR images.
In this paper, we propose a model using the Cycle-Consistent method from CycleGAN for 3D CT-MR deformable registration. The model is end-to-end and does not require ground truth deformations. Our contributions include the following: (a) using the Cycle-Consistent method in MR-CT registration to make the deformed image consistent with the reference image, (b) comparing the registration results with and without Cycle-Consistent, and (c) a complete end-to-end unsupervised 3D MR-CT registration network.

2.A. | Deformable image registration framework
The model proposed in this study is a Cycle-Consistent FCN framework consisting of two deformation networks: G_CT-MR, which deforms the CT image toward the MR image, and G_MR-CT, which deforms the MR image toward the CT image.
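A minimal PyTorch sketch of this two-network design is shown below; the layer sizes in DeformationFCN are placeholders (the actual FCN architecture is given in the corresponding figure), and warp assumes the networks output a displacement field in normalized grid coordinates.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformationFCN(nn.Module):
    """Maps a concatenated (moving, fixed) pair to a 3-channel displacement field."""
    def __init__(self, in_ch=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 16, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(16, 32, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(32, 3, 3, padding=1),  # displacements in (x, y, z)
        )

    def forward(self, moving, fixed):
        return self.net(torch.cat([moving, fixed], dim=1))

def warp(image, flow):
    """Warp `image` (B x 1 x D x H x W) by displacement `flow` (B x 3 x D x H x W),
    assumed to be in normalized [-1, 1] grid units, via grid_sample."""
    B, _, D, H, W = image.shape
    zz, yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, D, device=image.device),
        torch.linspace(-1, 1, H, device=image.device),
        torch.linspace(-1, 1, W, device=image.device), indexing="ij")
    grid = torch.stack((xx, yy, zz), dim=-1).unsqueeze(0)  # identity sampling grid
    return F.grid_sample(image, grid + flow.permute(0, 2, 3, 4, 1),
                         align_corners=True)

G_MR_CT = DeformationFCN()  # deforms MR toward CT
G_CT_MR = DeformationFCN()  # deforms CT toward MR
```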

2.B. | Patient data preprocessing
In all, 74 pelvic cases including CT images and MR images are used as datasets. We standardize all image data so that the distribution of pixel values is consistent across images, and resample them to a resolution of 1 × 1 × 5 mm³. To reduce the size of the input data and highlight the regions to be registered, each image is cropped to 400 × 400 voxels so that the redundant air areas are removed. Owing to the limited GPU memory, it is necessary to resample the training data to 200 × 200 × 24 voxels before the training process. Rigid registration is carried out for all cases using 3D Slicer software 26 because it reduces the difficulty of training the neural network. Finally, we normalize the image data and map all pixel values to the range of (−1, 1).
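The pipeline can be sketched as follows with SimpleITK; the min-max normalization to (−1, 1) and the linear interpolator are assumptions, and the cropping step is omitted because the crop coordinates are case-dependent.

```python
import SimpleITK as sitk
import numpy as np

def preprocess(image, new_spacing=(1.0, 1.0, 5.0)):
    """Resample to 1 x 1 x 5 mm^3 and map pixel values to (-1, 1)."""
    old_size, old_spacing = image.GetSize(), image.GetSpacing()
    new_size = [int(round(sz * sp / nsp))
                for sz, sp, nsp in zip(old_size, old_spacing, new_spacing)]
    image = sitk.Resample(image, new_size, sitk.Transform(), sitk.sitkLinear,
                          image.GetOrigin(), new_spacing, image.GetDirection())
    arr = sitk.GetArrayFromImage(image).astype(np.float32)
    # min-max normalization to the range (-1, 1)
    return 2.0 * (arr - arr.min()) / (arr.max() - arr.min() + 1e-8) - 1.0
```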

2.C. | Deformation network and loss function
The objective of the deformation network is to obtain the spatial deformation transformation from the input image pair. We use a spatial deformation field to describe the process of deformable registration; the structure of the deformation network is shown in the corresponding figure.

2.C.1. | Content loss

The most common metric for multimodal image registration is MI.
However, the MI metric ignores the spatial neighborhood of a particular voxel within one image, which decreases the accuracy of deformable registration. 28 To solve this problem, we use a metric called the modality-independent neighborhood descriptor (MIND) to perform deformable registration on CT-MR images. 28 The MIND feature extracts distinctive image structure by comparing each patch with all its neighbors in a non-local region. 29 Formula (1) shows the MIND feature extraction function, where n is a constant that normalizes the function and R indicates the spatial search region:

$\mathrm{MIND}(I, x, r) = \frac{1}{n}\exp\left(-\frac{D(I, x, x+r)}{V(I, x)}\right), \quad r \in R$  (1)
D represents the L2 distance between two image patches in image I centered on voxel x and voxel x + r, respectively. The detailed function of D is shown in Formula (2), where P denotes the image patch; we set the patch size to 5 × 5 during model training:

$D(I, x_1, x_2) = \sum_{p \in P}\left(I(x_1 + p) - I(x_2 + p)\right)^2$  (2)
V is a variance estimate at voxel x; its function is shown in Formula (3), where N is the 3 × 3 neighborhood of voxel x:

$V(I, x) = \frac{1}{|N|}\sum_{n' \in N} D(I, x, x + n')$  (3)
Based on the MIND feature extraction function, we can calculate the content loss between the CT image and the MR image. As shown in Formula (4), N represents the number of image voxels and R is the spatial search region; we set the region size to 7 × 7 during model training. Here I_d denotes the deformed image and I_f the fixed image:

$L_{content} = \frac{1}{N}\sum_{x}\frac{1}{|R|}\sum_{r \in R}\left|\mathrm{MIND}(I_d, x, r) - \mathrm{MIND}(I_f, x, r)\right|$  (4)
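For illustration, the MIND-based content loss of Formulas (1)-(4) can be sketched in PyTorch as below. The helper names are ours, the search window is sampled densely in-plane (matching the 5 × 5 patch and 7 × 7 region sizes above), and torch.roll's wrap-around at image borders is a simplification.

```python
import torch
import torch.nn.functional as F

def patch_ssd(img, dx, dy, patch=5):
    """D in Formula (2): patch-wise squared distance between img and img
    shifted in-plane by (dx, dy); img has shape B x 1 x D x H x W."""
    shifted = torch.roll(img, shifts=(dy, dx), dims=(3, 4))
    diff2 = (img - shifted) ** 2
    kernel = torch.ones(1, 1, 1, patch, patch, device=img.device) / (patch * patch)
    return F.conv3d(diff2, kernel, padding=(0, patch // 2, patch // 2))

def mind(img, region=7, patch=5):
    """MIND descriptor of Formula (1) over an in-plane region x region window."""
    half = region // 2
    offsets = [(dx, dy) for dx in range(-half, half + 1)
                        for dy in range(-half, half + 1) if (dx, dy) != (0, 0)]
    dists = torch.cat([patch_ssd(img, dx, dy, patch) for dx, dy in offsets], dim=1)
    # V in Formula (3): variance estimate from the immediate neighbors of x
    neigh = [(dx, dy) for dx, dy in offsets if max(abs(dx), abs(dy)) == 1]
    v = torch.cat([patch_ssd(img, dx, dy, patch) for dx, dy in neigh],
                  dim=1).mean(dim=1, keepdim=True).clamp_min(1e-6)
    feat = torch.exp(-dists / v)
    # the constant n in Formula (1), here a per-voxel max normalization
    return feat / feat.amax(dim=1, keepdim=True).clamp_min(1e-6)

def content_loss(deformed, fixed):
    """Formula (4): mean absolute difference of MIND features."""
    return (mind(deformed) - mind(fixed)).abs().mean()
```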

2.C.2. | Regularization loss
To prevent unreasonable deformation, we add a regularization loss to make the deformation grid smoother. L2 regularization is used to evaluate the deformation field, as shown in Formula (5), where D_f denotes the deformation grid:

$L_{reg} = \lVert D_f \rVert_2$  (5)
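A one-line PyTorch equivalent, assuming `flow` holds the predicted deformation grid D_f:

```python
def reg_loss(flow):
    """Formula (5): L2 norm of the deformation grid D_f (B x 3 x D x H x W)."""
    return flow.pow(2).sum().sqrt()
```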

2.C.3. | Cycle loss
The cycle loss enables the deformed image to be deformed back to the original image. In addition, the cycle loss can prevent excessive deformation and makes the model easier to converge. Formula (6) shows the cycle loss, where G denotes the network generating the deformation field, I_m is the moving image, and I_f is the fixed image; G(I_m, I_f) denotes that I_m is deformed to be similar to I_f, and G(I_f, I_m) is the opposite:

$L_{cycle} = \lVert G(G(I_m, I_f), I_m) - I_m \rVert_1 + \lVert G(G(I_f, I_m), I_f) - I_f \rVert_1$  (6)
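A sketch of one direction of this loss, reusing the warp helper above. Note that the paper's G(·, ·) returns the deformed image, whereas the networks here return a flow that warp applies; the use of the opposite network for the backward deformation and the L1 penalty are assumptions.

```python
def cycle_loss(G_fwd, G_bwd, I_m, I_f, warp):
    """One direction of Formula (6): deform I_m toward I_f, deform the
    result back toward I_m, penalize the difference from the original."""
    I_m2f = warp(I_m, G_fwd(I_m, I_f))        # I_m deformed to resemble I_f
    I_back = warp(I_m2f, G_bwd(I_m2f, I_m))   # deformed back toward I_m
    return (I_back - I_m).abs().mean()
```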

2.C.4. | Total loss for model
The total loss L_G of the proposed model is composed of all the above losses, with coefficients for the different terms as shown in Formulas (7), (8), and (9). L_MR-CT denotes the loss when the MR image is the moving image, and L_CT-MR the loss when the CT image is the moving image. λ1, λ2, and λ3 are constants that adjust the proportion of each loss in the total loss; we set λ1 = 5, λ2 = 1, and λ3 = 1 during model training:

$L_{MR\text{-}CT} = \lambda_1 L_{content} + \lambda_2 L_{reg} + \lambda_3 L_{cycle}$  (7)

$L_{CT\text{-}MR} = \lambda_1 L_{content} + \lambda_2 L_{reg} + \lambda_3 L_{cycle}$  (8)

$L_G = L_{MR\text{-}CT} + L_{CT\text{-}MR}$  (9)
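Combining the pieces, the total loss can be sketched as follows, reusing warp, content_loss, reg_loss, cycle_loss, and the two networks from the earlier sketches:

```python
lam1, lam2, lam3 = 5.0, 1.0, 1.0  # lambda_1, lambda_2, lambda_3 from the text

def direction_loss(G_fwd, G_bwd, I_m, I_f):
    """One direction of Formulas (7)/(8): content + regularization + cycle."""
    flow = G_fwd(I_m, I_f)
    deformed = warp(I_m, flow)
    return (lam1 * content_loss(deformed, I_f)
            + lam2 * reg_loss(flow)
            + lam3 * cycle_loss(G_fwd, G_bwd, I_m, I_f, warp))

def total_loss(I_mr, I_ct):
    """Formula (9): L_G = L_MR-CT + L_CT-MR."""
    return (direction_loss(G_MR_CT, G_CT_MR, I_mr, I_ct)
            + direction_loss(G_CT_MR, G_MR_CT, I_ct, I_mr))
```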

3 | RESULTS AND DISCUSSION
Python and PyTorch were used to implement our model, and Adam was used as the optimizer. We compared the results of the proposed method with those of the registration software Elastix, MIM software, and the FCN without Cycle-Consistent in terms of registration accuracy and registration speed. 21,22,30 The registration parameters in Elastix were as follows: the interpolator is "BSplineInterpolator," the optimizer is "AdaptiveStochasticGradientDescent," the transform is "BSplineTransform," the metric is "AdvancedMattesMutualInformation," and the number of iterations is controlled by "MaximumNumberOfIterations." To explore the influence of the parameters of the MIND metric on MR-CT registration, we changed the patch size, the region size, and the neighborhood size of a voxel in Formulas (2), (3), and (4).
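A minimal training-loop sketch with the Adam optimizer mentioned above; the learning rate and epoch count are placeholders, not values reported in this work.

```python
import itertools
import torch

def train(loader, num_epochs=100, lr=1e-4):  # placeholders, not reported values
    """Jointly optimize both deformation networks with Adam."""
    optimizer = torch.optim.Adam(
        itertools.chain(G_MR_CT.parameters(), G_CT_MR.parameters()), lr=lr)
    for _ in range(num_epochs):
        for I_mr, I_ct in loader:  # preprocessed, rigidly pre-aligned CT-MR pairs
            optimizer.zero_grad()
            loss = total_loss(I_mr, I_ct)
            loss.backward()
            optimizer.step()
```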

CONFLICT OF INTEREST

There is no conflict of interest.