Image Registration in Medical Robotics and Intelligent Systems: Fundamentals and Applications

Medical image registration, by transforming two or more sets of imaging data into one coordinate system, plays a central role in medical robotics and intelligent systems, from diagnostics and surgical planning to real-time guidance and postprocedural assessment. Recent advances in medical image registration have made a significant impact in orthopedic, neurological, cardiovascular, and oncological applications. This article reviews the recent literature in medical image registration, discussing its fundamentals and applications. Within each section, the registration techniques are introduced, classified by their working mechanisms, and examined for their benefits and limitations. Recently, machine learning has had an important impact on the field of image registration, yielding novel methods and unprecedented speed. The validation of registration methods, however, remains a challenge due to the lack of reliable ground truth. Medical image registration will continue to make significant impacts in the area of advanced medical imaging as the fusion of multimodal images and advanced visualization technology become more widespread.


Introduction
Medical robotics and automated image guidance have been transformative technologies for modern surgical and interventional procedures since the mid-1980s. [1,2] The successful use of advanced intelligent systems in the medical field is attributed to the advantages of combining the strengths of humans and computer-aided technology in an information-intensive setting. [3] As described in other literature, medical robots and intelligent systems are considered information-driven tools that assist operators/surgeons with improved efficacy, increased safety, minimized invasiveness, and reduced morbidity. [4] Figure 1 illustrates the overall structure of a medical robotic system, which includes the analysis of medical information, surgical modeling/planning, real-time action, and postprocedure evaluation. Among all the enabling modules, the analysis and registration of medical images play a central role in providing humans and robots with additional information that is otherwise unavailable or inaccurate from manual examination of a single image modality.
As an integral part of medical robotics and intelligent systems, image registration can be used throughout the entire process of clinical practice, from diagnostics and procedural planning to real-time guidance and postprocedural evaluation of surgical or therapeutic outcomes. For example, neurologists can conduct a diagnostic assessment by overlaying a patient's images on a representative image of a disease, which can be obtained from an atlas of neural diseases. [5] Alternatively, registration can be used intraprocedurally by overlaying the surgical paths or target locations (determined prior to the procedure) on real-time images that visualize the position and orientation of surgical or transcatheter tools. [6] Furthermore, registration can be used to compare pre- and postprocedural images to allow direct and quantitative assessment of the intervention. [7] In general, medical image registration deals with the transformation of multiple imaging datasets into a single coordinate system, either in 2D or 3D. In this review, to simplify the discussion, we focus on the problem of registering two images: the source image (I_S, also known as the moving image) and the target image (I_T, also known as the fixed image). The two images are related by a transform matrix M. The registration problem is modeled as determining the most suitable transform matrix by optimizing an energy function

ε(M) = F(I_T, I_S ∘ M) + R(M)    (1)

where the first term F quantifies the alignment of the transformed source image (I_S ∘ M) with the target image I_T. The first term F is also referred to as the matching criterion, the similarity/dissimilarity metric, or the distance between the two measured targets. The second term R regularizes the transformation with certain deformation models, aiming to meet any specific properties in the solution that the user/application requires.
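To make Equation (1) concrete, the following NumPy sketch evaluates the energy for a toy translation-only transform: an SSD matching term F plus a simple L2 regularizer R, minimized by exhaustive search. The helper names and the regularizer weight are illustrative assumptions, not from any registration library.

```python
import numpy as np

def transform(image, shift):
    """Apply an integer translation (dy, dx) to a 2D image, zero-padding the borders."""
    out = np.zeros_like(image)
    dy, dx = shift
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y, src_x = ys - dy, xs - dx
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out

def energy(target, source, shift, lam=0.01):
    """epsilon(M) = F(I_T, I_S . M) + R(M): SSD matching term plus an L2 penalty on motion."""
    matched = transform(source, shift)
    F = np.sum((target - matched) ** 2)          # matching criterion
    R = lam * (shift[0] ** 2 + shift[1] ** 2)    # regularizer: discourage large motion
    return F + R

# Exhaustive search over small translations for two toy binary images
target = np.zeros((16, 16)); target[5:9, 6:10] = 1.0
source = np.zeros((16, 16)); source[3:7, 4:8] = 1.0
best = min(((dy, dx) for dy in range(-4, 5) for dx in range(-4, 5)),
           key=lambda s: energy(target, source, s))
print(best)  # (2, 2)
```

The exhaustive search here stands in for the optimization strategies discussed later in Section 3.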
Theoretically, an image registration algorithm consists of three building blocks: 1) feature detection; 2) image transformation; and 3) quantification of alignment and optimization. The overall workflow for image registration is shown in Figure 2.
By combining various choices for the three building blocks, one can create a wide variety of image registration solutions. Traditionally, registration methods are classified based on nine basic criteria: dimensionality, nature of registration basis, nature of transformation, domain of transformation, interaction, optimization procedure, modalities involved, subject, and object. [8,9] Each of the criteria has multiple hierarchical classifications, further dividing these methods into subcategories. Although this classification method can provide a well-organized library for researchers to search for appropriate solutions, it may also present complications because a single registration technique can be classified into multiple groups. We, therefore, discuss the registration techniques in Sections 2 and 3 from two perspectives: the fundamentals of registration methods, and transformation and optimization. Within each section, we introduce the fundamentals of the methods, classify the various techniques based on their working mechanisms, and discuss their benefits and limitations.
In recent years, machine learning (ML) has attracted extensive interest in medical image registration and achieved noticeable progress for many applications. [10,11] Therefore, we also include a subsection to discuss the use of ML-based methods. Following the discussion in Sections 2 and 3, we review the clinical applications, aiming to provide a useful reference for biomedical engineers, clinical researchers, or physicians developing or using registration technologies. Finally, we present perspectives on new challenges and future directions in medical image registration. As the goal of this review is to highlight recent advances, a majority of the publications highlighted in this review are from within the last decade; however, readers interested in more expansive discussions may refer to several prior reviews. [12][13][14]

Fundamentals of Registration Methods
This section explains the fundamental differences among a variety of medical image registration methods based on the involved image modalities (Section 2.1), dimensionality (Section 2.2), feature basis (Section 2.3), and processing levels (Section 2.4).

Monomodal Versus Multimodal Image Registration
In monomodal registration, images acquired from the same type of imaging sensor are registered together, whereas in multimodal registration, the images to be registered are captured from different modalities. In multimodal registration, the images from different modalities usually have very distinct representations. For example, soft tissues and blood vessels are relatively clear in ultrasound or nuclear medicine imaging, whereas bone is more apparent than soft tissue under radiography. Therefore, pixel-based registration methods may not be suitable for multimodal image registration, as the pixel representations are distinct. An extensively used approach to multimodal image registration is based on optimization of mutual information (MI) between images. [15,16] Compared with intensity-based methods, which are frequently used for monomodal registration, MI-based approaches are better suited to images with dissimilar intensity representations. [15] In one such study, two methods were introduced based on the calculation of patch entropy and manifold learning for multimodal registration of magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET) brain images. Whereas the manifold learning method has a theoretical advantage of presenting an optimal approximation, [17,18] the application of entropy has practical advantages in its reduced computational complexity. [19,20]

2D Versus 3D Image Registration

2D imaging includes modalities such as projectional radiography, X-ray fluoroscopy, 2D ultrasound, and endoscopy, whereas 3D imaging includes CT scans, MRI, 3D ultrasound, and nuclear medicine functional imaging. Based on the dimensionalities of the images, registration algorithms can be classified into three groups: 2D-2D, 3D-3D, and 2D-3D registration. The 2D-2D and 3D-3D registrations can be either monomodal or multimodal, whereas 2D-3D registration is typically a multimodal registration.
In 2D-2D registration, a moving image is transformed with operations such as scaling, rotation, and two orthogonal translations. In comparison, registration of 3D images typically includes scaling, 3D rotations, and three orthogonal translations, and therefore has increased computational complexity. 3D registration often utilizes additional capturing parameters (e.g., scaling factors, C-arm angles) to reduce the number of unknown factors and minimize the search region for the parameters to be optimized. [21] For example, Dibildox et al. recently reported a 3D-3D registration work combining 3D computed tomography angiography (CTA) models with biplane reconstructions, aiming to improve image guidance during percutaneous coronary interventions of chronic total occlusions. In this study, they used precalibrated pixel sizes and the orientation of fluoroscopy images to determine an oriented Gaussian mixture model for achieving a rigid spatial alignment with a point-set registration approach. [22] In 2D-3D registration, a typical solution is to apply 2D-2D methods by slicing the 3D imaging dataset into multiple 2D images and registering each 2D image with the target. This solution is extensively used in the registration of radiography images and is often referred to as the generation of digitally reconstructed radiographs (DRRs). To achieve this registration, one first needs to determine the orientation or pose of the 2D images. [23] For example, to register 2D fluoroscopic images with 3D CT data, Varnavas et al. used a generalized Hough transform for initial pose determination and registered the fluoroscopy images with the DRRs generated at the determined poses. [24]
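As a rough illustration of the DRR idea, a 3D volume can be projected into a 2D image by integrating (here, simply summing) attenuation along one ray direction. Real DRR generation traces rays through calibrated C-arm geometry at a given pose; this toy NumPy sketch assumes axis-aligned parallel rays.

```python
import numpy as np

def drr(volume, axis=0):
    """A crude digitally reconstructed radiograph: sum attenuation along one ray axis."""
    return volume.sum(axis=axis)

# Toy 3D CT volume containing a dense "bone" block
vol = np.zeros((8, 8, 8))
vol[2:5, 3:6, 3:6] = 1.0
projection = drr(vol, axis=0)   # simulate an anterior-posterior radiograph
print(projection.shape)  # (8, 8)
```

Each such projection could then be compared against the 2D fluoroscopic target with any 2D-2D similarity metric, repeating over candidate poses.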

Intrinsic Versus Extrinsic Registration Methods
The extraction of the common features in the two images being registered is based on either natural anatomical characteristics or external fiduciary markers visible in both images. Accordingly, depending on the information basis, registration methods can be generally classified as intrinsic or extrinsic.
Extrinsic registration relies on artificial fiduciary markers that are attached to the patient (either on the surface or inside the body). The fiduciary markers can be designed with arbitrary shapes and high contrast, making them clearly visible and easy to detect in all imaging modalities. [25] As a result, extrinsic registration methods with artificial markers are comparatively easy, fast, and can often be automated. [26,27] The main shortcomings of extrinsic methods are the invasiveness of the markers and the time and effort required to prepare and affix them. [28] Even with noninvasive markers, acceptance by patients remains an issue. Fiduciary markers are typically made of heavy metals (e.g., gold or platinum) to achieve high contrast under radiography. [25] However, these markers may not be suitable for MRI registration, as metallic materials must be strictly avoided in MRI scanners. [29,30]

Intrinsic registration relies only on the patient's natural anatomy. Without the assistance of artificial markers, the registration methods may require more complicated algorithms that take unstructured anatomical features as the common fiduciary markers. One criterion for selecting these natural fiduciary markers is to ensure that they are visible under both imaging modalities. For example, bone structures are often selected as common landmarks, as they are the most distinctive structures in many imaging modalities, including X-ray radiography, MRI, and ultrasound imaging. [31][32][33]

Intensity-, Gradient-, and Feature-Based Registration Methods

The fundamental approach of image registration is to determine the locations of paired points in 2D (i.e., represented as pixels) or 3D space (i.e., represented as voxels) and to use those locations for calculating or searching for the optimal transform matrix.
To detect the anatomical features or artificial fiduciary markers, medical images may undergo three levels of processing: intensity comparison, [34,35] gradient calculation, [36,37] and/or feature extraction. [38] The methods using intensity comparison are also called pixel/voxel-based registration or area-based registration. [39,40] For intensity-based registration, the algorithm analyzes the pixel/voxel values in the entire image and does not rely on gradients or features calculated from the neighboring pixels/voxels. The intensity-based methods are preferably used for images that do not have many prominent details or distinctive features that can be well defined by local shapes or structures. [41,42] A common type of intensity-based registration method is template matching, which compares the gray-scale or color values of each pixel between the source and target images. [43] Other pixel- or area-based methods include binarization, where an appropriately set threshold value for pixel intensities is used to segment the region of interest. [44] This method is suitable for extrinsic registration, [45] as the fiduciary marker has high contrast against the background. [46,47] As medical images captured from different modalities have very distinct differences in pixel intensity, pixel-based methods may fail to address multimodal image registration and are therefore preferably used for monomodal registration.
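A minimal sketch of the binarization approach described above, assuming NumPy and a toy image: a high-contrast fiducial is segmented by a global threshold and reduced to a point landmark via its centroid. The threshold value and marker geometry are arbitrary illustrative choices.

```python
import numpy as np

def binarize(image, threshold):
    """Segment a high-contrast fiducial marker by simple global thresholding."""
    return (image > threshold).astype(np.uint8)

def centroid(mask):
    """Center of mass of the binary marker mask, usable as a point landmark."""
    ys, xs = np.nonzero(mask)
    return float(ys.mean()), float(xs.mean())

img = np.full((10, 10), 0.1)   # dim tissue background
img[4:6, 4:6] = 0.9            # bright metallic marker
mask = binarize(img, 0.5)
print(centroid(mask))  # (4.5, 4.5)
```

Matching such landmark centroids between two images reduces extrinsic registration to a point-set alignment problem.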
To address the registration of images where the intensities have significant differences, one solution is to use gradient-based registration methods to further process the images by calculating derivatives from neighboring pixels. [36] Based on those calculations, the contour of the extrinsic fiduciary marker or intrinsic anatomical structures can be detected. The classical gradient calculation methods include the Canny [48] and Sobel convolutions. [49] However, these basic convolutions using predefined kernels may not be effective for applications where the gradients in the two registering images do not have large similarities. In those cases, normalized gradient fields may be applied to align the gradient directions of the two images, unaffected by their differences in gradient magnitude. [50] Alternatively, higher levels of feature extraction (e.g., straight lines, circles, intersections, or corners) are applied using other transformations (e.g., the Hough transform, [51] Harris operator, [52] and Laplacian of Gaussian transformation). [53] In most cases, especially for intrinsic deformable registrations, there is little structural information for building predefined features, and therefore automated feature extraction methods are needed. Examples of automated feature extraction algorithms include features from accelerated segment test (FAST), [54] binary robust independent elementary features (BRIEF), [54] scale-invariant feature transform (SIFT), [55] speeded up robust features (SURF), [56] binary robust invariant scalable keypoints (BRISK), [57] and maximally stable extremal regions (MSER). [58] Comparisons of feature extraction algorithms have been examined in studies by Tareen and Saleem and Kashif et al. [59,60] In general, the SIFT method was reported to produce the best matching results and to outperform other generic methods for processing radiographic images. [61] However, recent research on the registration of conventional optical images also reported that a combination of FAST feature points and SURF descriptors could produce better registration results than the classic SIFT algorithm and significantly reduce the computational cost to achieve fast image matching. [62]
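The normalized gradient field mentioned above can be sketched with NumPy and SciPy's Sobel filters: gradient directions are retained while magnitudes are normalized away, so the field is insensitive to intensity-scale differences between modalities. The small eps term, a common regularization choice, suppresses noise in near-flat regions.

```python
import numpy as np
from scipy import ndimage

def normalized_gradient_field(image, eps=1e-3):
    """Sobel gradients divided by their magnitude: direction is kept, scale is discarded."""
    gy = ndimage.sobel(image, axis=0)   # vertical derivative
    gx = ndimage.sobel(image, axis=1)   # horizontal derivative
    mag = np.sqrt(gx**2 + gy**2 + eps**2)
    return gy / mag, gx / mag

img = np.zeros((8, 8))
img[:, 4:] = 1.0                        # a single vertical edge
ny, nx = normalized_gradient_field(img)
print(abs(nx[4, 4]) > 0.9)  # True: the unit gradient points across the edge
```

Two images whose normalized gradient fields align well can then be scored, e.g., by summing the squared inner products of the unit gradients.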

Transformation and Optimization in Medical Image Registration
As previously mentioned, image registration can be modeled as a mathematical problem of finding a transformation M by optimizing an energy function. In general, the transformation methods can be classified into two groups: rigid and deformable transformations. Rigid transformations are often applied in medical applications where the imaged anatomical structures (e.g., bone and blood vessels) do not have large deformations. In contrast, deformable transformations are typically developed to compensate for the deformation of soft tissues or morphological changes.

Rigid Registration
In rigid transformations, the points following the transformation (x′ ∈ I_S ∘ M) are computed with linear conversions from the original points (x ∈ I_S). The linear conversions include similarity, affine, and projective transformations. In some classification methods, [9] the rigid transformation does not include the affine and projective transformations and is restricted to a narrow definition that includes only translation, rotation, and scaling. However, in this review, we use a broad definition and consider the affine and projective transformations as subcategories of rigid transformation, as both can also be used for registration of rigid anatomical structures. Theoretically, rigid transformation can be described in homogeneous coordinates by

[x′; 1] = [R, T; 0, 1] [x; 1]

where R is a 3 × 3 matrix representing the linear transformation and T is a 3 × 1 vector representing the translation. It can also be rewritten in a simplified form as x′ = Rx + T. In similarity transformation, the linear matrix consists only of scale and rotation. Figure 3B shows an example image produced by similarity transformation.
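A minimal NumPy sketch of the homogeneous form above: a 4 × 4 matrix M built from R and T applies x′ = Rx + T to a set of 3D points in one matrix product.

```python
import numpy as np

def rigid_transform(points, R, T):
    """Apply x' = R x + T via the homogeneous matrix M = [[R, T], [0, 1]]."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = T
    homog = np.c_[points, np.ones(len(points))]   # N x 4 homogeneous coordinates
    return (homog @ M.T)[:, :3]

# Example: 90-degree rotation about the z-axis plus a translation along x
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
T = np.array([10.0, 0.0, 0.0])
pts = np.array([[1.0, 0.0, 0.0]])
print(rigid_transform(pts, R, T))  # ≈ [[10, 1, 0]]
```

The same homogeneous machinery accommodates affine and projective matrices by relaxing the structure of the upper-left block and bottom row.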
Affine transformation is often used when the target images have a shearing effect. After affine transformation, straight lines remain straight and parallelism is preserved, but lengths and angles change (see Figure 3C). In a study by Jenkinson et al., a robust affine transformation method was proposed to minimize the calibration errors in the voxel dimensions to achieve 3D-3D registration of brain images. [63] In recent years, researchers have improved upon this method by developing new optimization strategies for the determination of the affine matrices. Ying et al. remodeled affine registration as an optimization problem by using the Iwasawa decomposition to introduce several reasonable constraints. [64] With a series of quadratic programming steps, the new algorithm achieved performance and efficiency superior to the previous methods reported in studies by Besl and McKay [65] and Zha et al. [66] Another group adopted a teaching-learning-based optimization method to solve the transformation matrix for global 3D affine registrations. [67] The affine registration can also be solved by treating the 2D points as complex numbers, such that each point has a polynomial with a set of complex coefficients that can be computed from the roots, which represent the points in the target dataset. [68] In a previous review, Shekhar et al. compared the registration performance of ultrasound volumes for different transformation complexities. [69] Projective transformation is used when the imaging plane appears tilted and the source and target images are captured at different imaging angles. After projective transformation, straight lines remain straight, but parallel lines converge toward a vanishing point (see Figure 3D). Projective transformation can be useful in 2D-3D registrations. For instance, researchers have applied projective transformations to the registration of noncalibrated 2D images with 3D Euclidean coordinates. [70] The proposed method is based on a linear matrix inequality (LMI) framework that can simultaneously calculate the projective transformation and establish the 2D-3D correspondences without triangulating the image points.

Deformable Registration
Deformable registration (also known as curved registration) uses a nonlinear transformation to convert the source image into a distorted shape, moving the feature points to the corresponding locations in the target image. In deformable registrations, the transformation is modeled with an additional vector term representing the local displacement field:

x′ = x + u(x)

The local displacement field u(x) is computed using various functions, such as a polynomial model, piecewise affine transform, [71] or a local weighted mean model. [72] Compared with rigid registration, deformable registration is often used for soft tissues to compensate for morphological changes caused by the movement of patients (e.g., respiratory motion) or by operational manipulations (e.g., pulling, cutting, or stitching in surgical procedures). Reviews of deformable registration can be found in the literature, [73,74] as a thorough explanation is beyond the scope of this paper. Instead, we discuss a few important deformable models that are commonly used.
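The displacement-field model x′ = x + u(x) can be sketched as backward warping, assuming NumPy and SciPy: the moving image is resampled at the displaced grid coordinates. The constant displacement field used here is a deliberately trivial stand-in for the spline or fluid models discussed below.

```python
import numpy as np
from scipy import ndimage

def warp(image, u):
    """Deformable transform x' = x + u(x): resample the moving image at
    the displaced coordinates (backward warping, bilinear interpolation)."""
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys + u[0], xs + u[1]])   # per-pixel sampling locations
    return ndimage.map_coordinates(image, coords, order=1, mode="nearest")

img = np.zeros((8, 8)); img[2, 2] = 1.0
u = np.zeros((2, 8, 8)); u[0] += -1.0   # displacement field: move content down by one row
warped = warp(img, u)
print(warped[3, 2])  # 1.0
```

In a real deformable registration, u would have a free vector at every pixel/voxel, which is exactly why the regularization term R in Equation (1) becomes essential.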

Elastic Body Models
A classical deformable registration method is to model the image as an elastic body, which is described by the Navier-Cauchy partial differential equation

μ∇²u + (λ + μ)∇(∇ · u) + F = 0

where u is the displacement field, F is the force field that drives the transformation according to the image matching criterion, μ represents rigidity and is used to quantify the stiffness of the image, and λ is Lamé's first coefficient. This model was first proposed by C. Broit, who considered an image grid as an elastic membrane that is deformed by internal and external driving forces at an equilibrium state. [75] To find the equilibrium state, Davis et al. applied a spline interpolation method based on the correspondence of fiduciary markers that are given as a radially symmetric function. [76] Recently, researchers also applied the elastic body spline method with finite element models to register MRI with CT scans for guiding prostate cancer radiotherapy. [77,78] Fayad et al. implemented a nonrigid B-spline registration algorithm to derive the deformation matrices to compensate for respiratory motion, with the promise of providing a new PET-MRI solution for oncology applications. [79]

Viscous Fluidic Flow Models
In this method, the deformable images are modeled as viscous fluids whose changes in velocity and shape are governed by the Navier-Stokes equations at very low Reynolds numbers. [80,81] Compared with the elastic body models, the fluidic models can handle very large deformations, but at lower computational efficiency. The fluidic flow models have been applied to the monomodal registration of MRI images for the diagnosis and treatment of Huntington's disease [82,83] and Alzheimer's disease. [84]

Diffeomorphism Flow Models
The deformation in this model is calculated by including the flow velocity over time, according to the Lagrange transport equation. [85] The regularization term in the energy function (i.e., Equation (1)) is established as a constraint on the velocity field

R = ∫₀¹ ‖v_t‖_V² dt

where ‖v_t‖_V is the norm on the space V of velocity vector fields, which can be computed by a differential operator. A few theoretical aspects of modeling diffeomorphisms and a detailed discussion of computational analysis were presented in previous studies. [86][87][88] Recently, researchers have tried to combine the diffeomorphic models with the Demons algorithm. [89,90] The Demons algorithm was first proposed by Thirion, who considered nonparametric deformable registration as a diffusion process, in a similar way that Maxwell did to solve the Gibbs paradox. [91] The Demons method uses deformation forces inspired by the optical flow equations and alternates between computation of the forces and regularization by Gaussian smoothing. [90]

Similarity Metrics and Optimization
To assess the suitability of a given transform M, the alignment performance can be quantified using the energy or cost function ε(M). Accordingly, the registration problem can be formulated as an optimization problem in which the cost function ε(M) is calculated using similarity metrics or MI. In this section, we summarize the similarity metrics and optimization methods.

Similarity Metrics
The similarity metrics, which compare the differences between the transformed source image and the target image, include the sum of squared differences (SSD), sum of absolute differences (SAD), normalized cross-correlation (NCCor), and normalized correlation coefficient (NCCoe). Among these similarity metrics, the correlation-based methods (NCCor or NCCoe) can compensate for variations in the scale of pixel intensity values. The classical correlation-based methods compare the pixel intensities between the transformed image and the target image, according to the following equation

NCC = Σ_{x,y} [I_{S∘M}(x, y) − Ī_{S∘M}][I_T(x, y) − Ī_T] / √( Σ_{x,y} [I_{S∘M}(x, y) − Ī_{S∘M}]² · Σ_{x,y} [I_T(x, y) − Ī_T]² )

where I_{S∘M}(x, y) and I_T(x, y) are the pixel intensities at location (x, y) in the transformed source image and target image, respectively, and Ī_{S∘M} and Ī_T are the average pixel intensities of the two images. The NCC method can compensate for variations in pixel intensity and is preferable for rigid registration of monomodal images, where translation and slight rotations are involved in the transform matrix. Figure 4 illustrates the registration of two ideal fluoroscopic images with various similarity metrics.
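A short NumPy sketch of the correlation-based metric: because NCC subtracts the means and normalizes by the standard deviations, an intensity-rescaled copy of an image still scores 1, which is exactly the robustness to intensity-scale variation noted above.

```python
import numpy as np

def ncc(a, b):
    """Normalized correlation coefficient between two images of equal size."""
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / np.sqrt((a**2).sum() * (b**2).sum())

rng = np.random.default_rng(0)
img = rng.random((16, 16))
brighter = 2.0 * img + 5.0   # same content, different intensity gain and offset
print(round(ncc(img, brighter), 6))  # 1.0
```

SSD, by contrast, would score this pair poorly, since it compares raw intensity values directly.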

Mutual Information
Mutual information (MI)-based methods, originating from information theory, have become increasingly popular for the registration of many image modalities. MI is a measure of the statistical dependency between two image datasets and can be calculated by

MI = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x) · p(y)) ]

where p(x) and p(y) are the probability distributions of particular pixel intensities in the individual images and p(x, y) is the joint probability distribution in the aligned images. [92] The MI-based methods have proven effective in 3D-3D registration of images taken from multiple modalities, such as CT and magnetic resonance (MR), or MR and PET. [93,94] The MI-based methods do not assume a linear relationship between the pixel values of the two images; instead, they maximize the co-occurrence of the most probable pixel values in the two images. Therefore, the MI-based methods, which utilize statistical distributions, are preferably used for 2D-3D/3D-3D image registration. One drawback of MI-based methods is that the dependence between intensity values and neighboring voxels is not measured. To address this drawback, one can incorporate an additional constraint to involve the dependence of neighboring voxels (i.e., the spatial information of the images). [95]
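The MI computation above can be sketched from a joint intensity histogram, assuming NumPy; the bin count and toy image pairing are arbitrary illustrative choices. Note that an inverted copy of an image, which would defeat a linear metric like NCC, still yields high MI because the statistical dependency is preserved.

```python
import numpy as np

def mutual_information(a, b, bins=8):
    """MI (in nats) estimated from the joint intensity histogram of two aligned images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)       # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)       # marginal p(y)
    nz = pxy > 0                              # skip empty bins (0 log 0 = 0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(1)
a = rng.random((32, 32))
b_dependent = -a                  # inverted intensities, but fully dependent on a
b_random = rng.random((32, 32))   # statistically independent of a
print(mutual_information(a, b_dependent) > mutual_information(a, b_random))  # True
```

In a registration loop, MI would be recomputed for each candidate transform and maximized.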

Optimization
The determination of the transform matrix is often modeled as an optimization problem to find the minimum of dissimilarity measures (i.e., penalty functions) or the maximum of similarity measures (e.g., cross correlations, MI). [96] The basic solution to this optimization problem is an exhaustive search over the entire image domain to find the best transformation matrix.
Although the global search method has a high computational cost, it can be used when the transform matrix is simplified (e.g., only translations are required when rotation and scaling are known in rigid registrations). Another commonly used algorithm for the calculation of the transform matrix is the iterative closest point (ICP) algorithm, which does not require all the pairwise correspondences of landmarks/features to be predefined and iterates toward the nearest minimum of the local error. [97,98] For registrations involving more degrees of freedom or additional regularization terms for deformable registration, the direct search strategy or simple iterative approaches do not work because the computational complexity increases significantly. Instead, many applications apply more than one optimization technique, frequently comprising coarse and fine optimization steps. [99] Coarse optimization produces an initial estimate for the alignment of the two images, using simplified transform models with relatively low accuracy, whereas fine optimization searches for more accurate results within the "search range" produced by coarse optimization. In practical terms, the initial alignment in coarse registration is often achieved by visual inspection and manual adjustment by physicians or other operators to ensure that the global optimum is inside the capture range. [100] When an initial alignment of the starting position is not available, due to the requirement of full automation or the lack of effective algorithms for coarse optimization, global optimization and heuristic search are used to determine the optimal transform matrix. These global optimization algorithms include simulated annealing, [101,102] Monte Carlo random sampling, [103] unscented Kalman filtering, [104] pattern search algorithms, [105] and multistart search strategies. [106] One major type of optimization method is based on the calculation of the gradient of the cost function.
The representative method is the gradient descent approach, which optimizes the objective function along the direction that decreases the total energy cost. [44,107] Gradient-based methods are often used to maximize MI. [108] Other researchers have also used conjugate gradient methods, which apply prior knowledge from previous gradients to generate a new search direction conjugate to the previous one. [109] To reduce computational cost, researchers have used stochastic gradient approaches that rely on an approximation of the gradient. [109] One stochastic approach, proposed by Robbins and Monro (RM), estimates the gradient information with a step size that decreases with time to achieve high accuracy. [110] The RM method was reported to achieve the best performance for gradient estimation when MI and cubic B-spline free-form deformations (FFDs) were used for registration of a subset of 3D imaging data. [111] The gradient-free optimization methods include Powell's method, which has been widely applied in low-degree-of-freedom registration tasks. [112,113] Another optimization approach is the Gauss-Newton method, which approximates the inverse Hessian matrix and uses it to determine the search direction for the next step. [114,115] The Gauss-Newton method is typically used in combination with the Demons deformation model to solve multimodal image registration (e.g., MR and CT). [116] Other local optimization methods, including the downhill simplex method, [117] best neighbor search method, [118] and hill climbing methods, [119,120] were reported to perform approximately similarly in radiotherapy applications. [121,122]
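A minimal sketch of gradient-based optimization of an SSD criterion, assuming NumPy and a 1D toy "image" (a Gaussian blob); the gradient is approximated by central differences rather than an analytic derivative, and the learning rate is a hand-picked illustrative value.

```python
import numpy as np

x = np.linspace(-5, 5, 201)

def image(shift):
    """1D toy 'image': a Gaussian blob centered at the given position."""
    return np.exp(-((x - shift) ** 2))

target = image(1.3)   # fixed image: blob at the (unknown) true shift 1.3

def cost(t):
    return np.sum((image(t) - target) ** 2)   # SSD matching criterion

t, lr, h = 0.0, 0.01, 1e-5
for _ in range(200):
    grad = (cost(t + h) - cost(t - h)) / (2 * h)   # central-difference gradient
    t -= lr * grad                                 # step downhill
print(round(t, 3))  # 1.3
```

Real registration pipelines replace the finite-difference gradient with analytic or stochastic estimates (as in the RM approach) because each cost evaluation is expensive on full images.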

Machine Learning in Medical Image Registration
The optimization of the transform matrix has traditionally been formulated as an iterative pair-wise optimization problem in which several parameters need to be optimized. This is a computationally expensive process, often taking tens of minutes to hours to converge on a regular central processing unit (CPU). [123,124] Thus, two key issues can be identified in the current workflow: 1) selecting features that improve the overall accuracy of image registration and 2) the computing time for optimizing the transformation parameters. Traditionally, feature extraction has been accomplished using intensity-, gradient-, and/or feature-based methods. However, models developed through these approaches are often specific to a particular problem and time-consuming to design and validate. To address these shortcomings, ML methods have been developed to address the problem of selecting relevant features. [125] These methods can be categorized as filter methods, wrapper methods, and embedded methods. [126] However, using these methods, the task of selecting relevant features can only be accomplished after features have been manually defined or potential features have been extracted using rule-based methods (e.g., gradients). Furthermore, the task of determining the transformation parameters from these features remains largely dependent on iterative pair-wise optimization.
Deep learning, a subfield of ML, offers an alternative approach. Deep learning models have shown promising results for automated feature extraction. This "learning" is accomplished using backpropagation, a feedback loop that adjusts the relative weighting parameters within a given model to select optimal features. [127,128] The task of optimizing the transformation parameters with deep learning models can be recast as a problem of function estimation, sometimes referred to as amortized optimization. The rationale of learning-based registration is to move away from expensive iterative optimization. Instead of solving the optimization for each input pair of images, the problem can be formulated as finding a function that takes the input images and directly computes an output transformation. To accomplish this, a learning strategy aimed at finding this function can be modeled using neural networks.
Early attempts at developing such neural networks used a supervised learning strategy with many pairs of images, coupled with precomputed (ground truth) transformations. Rohé et al. proposed a fully convolutional neural network (CNN) architecture, called the stationary velocity fields network, that accomplished 3D image registration. [129] Once trained, the algorithm performed better than the local cross-correlation Demons algorithm [130] when comparing the predicted Dice coefficient and Hausdorff distance, and it took only 6 s on a CPU (40× faster) and less than 30 ms on a graphics processing unit (GPU) (8000× faster). Similarly, Miao et al. used a CNN approach to accomplish real-time 2D/3D registration. [113] They showed that deep learning-based image registration was more robust and accurate than traditional intensity-based registration methods, as evaluated by comparing the success rate, capture range, and running time of the CNN-based approach with Powell's method combined with gradient correlation. [131] Recently, Li et al. used a related class of networks called fully convolutional networks (FCNs) to achieve 3D/3D image registration. [132] The dataset comprised 1882 subjects obtained from the Alzheimer's Disease Neuroimaging Initiative and LPBA40 (a human brain atlas constructed by the Laboratory of Neuro Imaging). In another work, Cao et al. proposed a deep learning-based nonrigid intermodality registration framework to predict the transformation parameters for 3D/3D multimodal image registration. [133] Their approach was evaluated on data from 15 patients, each with both a CT and an MRI image, and took 15 s to achieve image registration.
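The amortized idea can be caricatured without any deep learning machinery: the sketch below fits a single linear map (standing in for a trained network) on image pairs with known translations, after which registering a new pair is one matrix product rather than an iterative optimization. This is a deliberately simplified stand-in on synthetic data, not the architecture of any cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16

def blob(cx, cy):
    """Synthetic image: a Gaussian blob on an n x n grid."""
    y, x = np.mgrid[0:n, 0:n]
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * 3.0 ** 2))

fixed = blob(8.0, 8.0)

# "Training set": moving images with precomputed (ground truth) translations.
T_train = rng.uniform(-1.5, 1.5, size=(600, 2))
X_train = np.stack([(blob(8 + tx, 8 + ty) - fixed).ravel()
                    for tx, ty in T_train])

# Supervised "training": one least-squares fit stands in for backpropagation.
W, *_ = np.linalg.lstsq(X_train, T_train, rcond=None)

# Inference: registering a new pair is a single matrix product.
t_true = np.array([1.0, -0.5])
x_new = (blob(8 + t_true[0], 8 + t_true[1]) - fixed).ravel()
t_pred = x_new @ W  # close to t_true, with no per-pair iteration
```

The contrast with the iterative sketch earlier is the point: all the optimization cost is paid once, at training time, and amortized over every subsequent image pair.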
Obtaining reliable and accurate ground truths is not only cumbersome and time-consuming, but it also limits the accuracy of the model, which only learns from a representative annotated dataset. Therefore, it is ideal to have an algorithm that can be trained to learn and predict the transformation parameters without the need for a priori ground truth information. To accomplish this, de Vos et al. proposed an unsupervised end-to-end deep learning network for deformable image registration. [134] Their algorithm was tested on two separate datasets: 1) Modified National Institute of Standards and Technology handwritten digits and 2) cardiac cine MRI scans. Their algorithm obtained comparable or slightly better results than a conventional approach using SimpleElastix. In another study, Balakrishnan et al. proposed a general learning CNN architecture called VoxelMorph, inspired by U-Net, [135] to perform image registration with an unsupervised loss. In this VoxelMorph framework, the unsupervised learning model achieved accuracy comparable to state-of-the-art conventional methods while operating orders of magnitude faster on a GPU. As this is a general learning model, it could in principle also be extended to multimodal image registration through the use of appropriate similarity metrics. [136] By eliminating the need to define features and directly providing the output transformation, learning-based methods are promising and represent a substantial change in medical image registration.
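The structure of such an unsupervised objective can be sketched in a few lines: an image similarity term evaluated after warping, plus a smoothness penalty on the displacement field, with no ground-truth transformation anywhere. This is a schematic of the loss only (nearest-neighbour warping, MSE similarity, both chosen for brevity), not the VoxelMorph implementation.

```python
import numpy as np

def warp(img, disp):
    """Warp a 2D image by a dense displacement field (nearest neighbour)."""
    n = img.shape[0]
    y, x = np.mgrid[0:n, 0:n]
    ys = np.clip(np.rint(y + disp[..., 1]), 0, n - 1).astype(int)
    xs = np.clip(np.rint(x + disp[..., 0]), 0, n - 1).astype(int)
    return img[ys, xs]

def unsupervised_loss(fixed, moving, disp, lam=0.1):
    """Similarity term + smoothness regularizer: the two ingredients of a
    VoxelMorph-style unsupervised registration loss."""
    similarity = np.mean((warp(moving, disp) - fixed) ** 2)
    smooth = (np.mean(np.diff(disp, axis=0) ** 2)
              + np.mean(np.diff(disp, axis=1) ** 2))
    return similarity + lam * smooth

# Identical images under the identity displacement give zero loss.
img = np.random.default_rng(0).random((32, 32))
zero = np.zeros((32, 32, 2))
loss_identity = unsupervised_loss(img, img, zero)  # 0.0
```

In training, a network would predict `disp` from the image pair and backpropagation would minimize this loss over many pairs; no precomputed transformations appear in the objective.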

Clinical Applications
The development of image registration technologies has made significant contributions to many clinical applications. Among them, four major areas are orthopedic, neurological, cardiovascular, and oncological medicine.

Orthopedic Applications
Traditional radiography is particularly useful at distinguishing calcium, soft tissue, fat, and air and therefore remains the initial choice of imaging in bone and joint disorders. Fluoroscopy, as a "real-time" modality, can be useful in observing motion of joints and in guiding needles or other surgical instrument insertions. [137] MRI demonstrates the ability to distinguish noncalcified body tissues, which allows for evaluation of bone marrow, joint spaces, and soft tissues. [138] Orthopedic ultrasound is generally used for detecting bone surfaces, fluid-filled spaces, and other soft tissues and can be used to guide needle or surgical tool placements. [139] Registration of two or more of these imaging modalities has been studied for both diagnostic and surgical applications.
In one recent study, registrations of high-resolution peripheral quantitative computed tomography (QCT) scans at baseline and 12 months after kidney transplantation were used to assess endocortical bone loss in kidney transplant patients. Previously, the clinical standard for assessment of bone loss (dual-energy X-ray absorptiometry) could not distinguish between cortical and trabecular bone. [140] Registration of QCT images of specific bone regions over time can be helpful in assessing disease states as well as in predicting future fractures. A method for automated, accurate registration was proposed for assessing and monitoring mineral density using bone atlases and patients' QCT data. [141] Both studies utilized rigid alignment in their registration algorithms, which was sufficient for the registration of patient scans alone. Rigid alignment was also utilized in a study of metastatic bone lesion monitoring using skeleton PET/CT scans. [142] In this case, articulated registration outperformed both deformable and solely rigid registration in monitoring bone lesions throughout cancer therapy.
Registration technologies have also been used in orthopedic surgical procedures, both preoperatively and intraoperatively. When preoperative scans are registered to postoperative scans, the data can be used to determine the success of the operation or to monitor disease processes. Additionally, preoperative scans registered to intraoperative imaging can assist the surgeon with proper navigation and placement of surgical devices. This is especially true in tumor resections, wherein MR is superior in determining intraosseous and extraosseous tumor extension and CT is superior in delineating bony detail [143] (see Figure 5A). Additionally, Docquier et al. were able to transfer target plane coordinates to an allograft in a subsequent CT-CT registration. [144] In this study, a rigid transformation was obtained by coarse principal components analysis (PCA) followed by iterative closest point (ICP) rigid registration. It appears that rigid registration produces acceptable results for surgical planning. [145,146] Registering intraoperative ultrasound images to preoperative CT data has also been studied. Intraoperative ultrasound guidance is relatively widespread and has demonstrated accuracy when paired with fiducial markers and electromagnetic tracking. [147][148][149] However, the implantation of these markers is invasive, so there is a need to explore intrinsic registration methods that combine ultrasound imaging and CT data. Among the many proposed algorithms, the ICP algorithm for rigid registration is the most widespread. [149][150][151] Further modifications to rigid registration algorithms have also been developed to account for soft tissue deformation caused by the ultrasound probe on the surface of the body. [152] Nonetheless, rigid registration remains widely acceptable in clinical orthopedic applications, whereas additional steps accounting for local deformation may be needed in some special cases.
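The ICP approach referenced above can be sketched in its simplest form: alternate between matching each point to its nearest neighbour and solving for the best rigid transform in closed form via SVD. The 2D synthetic point cloud below is an illustrative toy, not a clinical implementation.

```python
import numpy as np

def best_rigid(A, B):
    """Closed-form least-squares rotation R and translation t such that
    R @ a + t ~ b for paired 2D points (Kabsch/SVD method)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    U, _, Vt = np.linalg.svd((A - ca).T @ (B - cb))
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R = Vt.T @ D @ U.T
    return R, cb - R @ ca

def icp(src, dst, iters=30):
    """Basic ICP: alternate nearest-neighbour matching and rigid fitting."""
    cur = src.copy()
    for _ in range(iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        R, t = best_rigid(cur, dst[d2.argmin(axis=1)])
        cur = cur @ R.T + t
    return cur

# A synthetic point "surface" and a slightly rotated/translated copy of it.
rng = np.random.default_rng(1)
src = rng.uniform(-5, 5, size=(200, 2))
th = 0.05
R_true = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
dst = src @ R_true.T + np.array([0.3, -0.2])
aligned = icp(src, dst)
```

The brute-force distance matrix keeps the sketch short; practical implementations use spatial search structures, outlier rejection, and a good initial alignment, since plain ICP only converges locally.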

Neurological Applications
Neuroimaging generally falls into two categories: structural and functional. Whereas CT was the earliest technique for structural imaging of the brain, it has been gradually replaced by MRI, which avoids ionizing radiation and better distinguishes differences among tissue types. Among functional imaging methods, PET, single-photon emission computed tomography (SPECT), and functional magnetic resonance imaging (fMRI) are the most commonly used. [153][154][155] Many advancements have been made in recent years regarding image registration for both diagnostic and surgical applications. Diagnostic image registration has been utilized for monitoring certain neurological diseases, such as Alzheimer's disease [156] and brain tumors. [157] Most registration techniques involve a combination of rigid/affine and nonrigid registration in a coarse-to-fine manner, with subsequent additions for pathologic or abnormal images. Nonrigid registrations are more accurate than rigid ones for neuroimaging, but there is a risk of misregistration due to the complexity of deformable models. [157] Parallelization of image registration on the GPU and CPU has accelerated the Gaussian pyramid computation and the cost function calculation, respectively. These advancements have been successful for diagnostic classification of Alzheimer's disease. [156] In addition, the inclusion of tissue classification algorithms has allowed for more robust detection of hydrocephalus, which is indicative of various neurological pathologies. [158]

A recent trend in image-guided neurosurgery is the use of registration to combine preoperative MR with intraoperative ultrasound data. A recent evaluation of advanced neuroimaging tools reported a standard workflow for neuroimaging registration, which first compared the registration of whole-head and brain-only (deskulled) images. [159] In this workflow, a linear affine transformation is first used for initial global alignment and is then followed by a deformable transformation with more degrees of freedom. [160] To avoid the time-consuming manual segmentation of the brain in MR images prior to registration, one successful technique mapped MR intensities to resemble those of ultrasound before registration to patient ultrasound data. A rigid body registration of this pseudo-ultrasound to intraoperative 3D ultrasound was then performed, which was found to be more successful than the normalized mutual information (NMI) method. [161] Compared with commonly used similarity measures that rely on cross-correlation functions, a similarity measure that captures the correlation of a multichannel and a scalar image (linear correlation of linear combination, LC2) was demonstrated to be robust in aligning 2D and 3D ultrasound to preoperative MRI, as validated on neurosurgical data [162] (see Figure 5B).

Figure 5. Medical image registration has been used for various applications. A) 3D-3D registration of a partial-view ultrasound to the CT model for ultrasound-guided computer-assisted orthopedic surgery. Adapted with permission. [143] Copyright 2012, Springer Nature. B) An ultrasound image superimposed on an axial MRI slice, achieved by rigid and deformable registration, showing the registration result for image-guided neurosurgery; the fiducial registration error is improved by 0.11 mm from rigid to deformable registration. Adapted with permission. [162] Copyright 2014, Elsevier. C) A deformable registration method brings the 3D centerline model of the coronary arteries into biplane fluoroscopic angiograms to provide guidance during chronic total occlusion procedures. Adapted with permission. [169] Copyright 2012, IEEE. D) An MI-based method registers MRI with the planning CT for image-guided radiotherapy in head and neck cancer. Adapted with permission. [177] Copyright 2017, Elsevier.
This LC2 similarity measure was then used to create an algorithm that applies a deformation model after rigid registration, which enables the automatic registration of preoperative MRI to intraoperative 3D ultrasound. [107] Image registration has also been validated for its use in guiding and planning intracranial electrode interventions. In neurological diseases, such as epilepsy, the localization of intracranial electrodes post-surgery is necessary to localize epileptic areas of the brain that may need to be excised. In practice, intracranial electroencephalogram (iEEG) has been widely used, but it is difficult to register iEEG across patients or to other imaging modalities. The registration of high-resolution preoperative MR images with postoperative CT scans has been validated for functional mapping of brain activity and has been successful in monitoring and identification of seizure foci. [163] In addition to postoperative monitoring of electrodes, rigid registration of atlas (target) MR images to patient CT scans has proved accurate and more precise than manual methods for determining target electrode locations for deep brain stimulation (DBS) interventions. [164] In practice, DBS position planning consists of rigidly registering patient MRI to atlas images. Then, stereotactic imaging (MR or CT) is rigidly registered with the preoperative targeting MRI.

Cardiovascular Applications
Image registration in cardiology is an evolving topic, as many imaging modalities provide unique information and no single modality prevails. Therefore, image registration research involves many different modalities for preoperative planning, intraoperative guidance, and diagnostic/monitoring applications. Nonrigid registration has been shown to suppress motion-induced artifacts when aligning template estimation images to the reference images. [165,166] Deformable registration has been successful with MR images as well as PET scans. Nonrigid registration has also enabled automatic segmentation and the inference of left ventricular volume and mass for the diagnosis of several cardiovascular diseases, such as hypertrophic cardiomyopathy, arrhythmogenic right ventricular dysplasia, and ischemic heart disease. [167] Cardiovascular surgery and interventions can be very complex due to the proximity of important anatomical structures. One example of preprocedural planning involves the registration of optical coherence tomography (OCT), to determine stent size in vivo, with X-ray angiography, to determine stent positioning. [168] Nonrigid registration of preoperative CT angiography images to intraoperative fluoroscopic angiograms has reduced the uncertainty of traditional 2D intraoperative guidance [169] (see Figure 5C). Similarly, feature-based and subsequent MI-based registration have been proposed for preoperative CT and intraoperative ultrasound images, and GPU implementation has accelerated this registration. [170] These innovations allow for active intraoperative use of image registration. Due to the abundance of soft tissue in thoracic structures, nonrigid methods are preferred in cardiovascular image registration. Accordingly, tissue tracking in cardiac interventions is necessary in some cases, and an accurate method for this has been tested preclinically. [171] Interestingly, rigid registration has been utilized as well in cardiac interventions, wherein real-time X-ray fluoroscopy is aligned to intraoperative 3D transesophageal echocardiography. This method has shown clinically relevant accuracy and runtimes short enough for intraoperative use. [172]

Oncological Applications
Imaging in oncology has become increasingly important for patient management, including diagnosis and follow-up. High-resolution anatomic imaging modalities, such as CT and MRI, provide information on lesion morphology and structural changes in adjacent tissues; however, they cannot clearly determine tumor physiology or function. Functional imaging, such as PET and SPECT, provides more insight into the biological functions of tumors and their interactions with surrounding structures. However, because of their relatively low spatial resolution and inability to provide anatomic detail, PET and SPECT are frequently registered with CT and MR to provide robust oncological assessment. [173] In cancer treatment, deformable image registration has been explored to automatically define regions of interest in adaptive radiotherapy. Initially, the registration results were validated and deemed acceptable with the involvement of physician reviews, as system performance can vary significantly with distinct clinical presentations of the tumor. [174] To reduce human involvement, a later study developed a new framework for quantitative validation of deformable registration algorithms to determine the best evaluation metric for multiple types of clinical deformation. [175] Deformable registration has also been introduced to allow for individualized radiotherapy and to overcome the differences in patient positioning among different imaging modalities (see Figure 5D). [176][177][178] Fully automated methods for deformable registration of PET/MR and PET/CT data were recently validated in the liver [176] and breast [179] to guide radiotherapies. In addition, deformable registration has allowed for more detailed monitoring of tumor contours during therapy, making targeted therapies more accurate by reducing toxicity. [180][181][182] This was validated in head and neck cancers for CT-CT registration [180] and subsequently proven clinically acceptable in cervical brachytherapy for MR-MR registration, [182] demonstrating utility in a variety of anatomical areas.
In tumor resections, image guidance can be used to obtain more accurate tumor margins and to spare more of the surrounding tissue, thus making the surgery less deleterious. Tissue deformation prediction [183] and continuously updated intraoperative imaging information [184] have been included in registration workflows as possible solutions to account for intraoperative tissue deformations. In the laparoscopic resection of malignant liver lesions, preoperative CT data have been registered to intraoperative ultrasound using a rigid transformation with the ICP method. This registration method increased the accuracy of the resections and spared more of the liver parenchyma. [183] Augmented reality has also been integrated with image registration to combine preoperative MRI with intraoperative fluoroscopy and cone-beam CT, allowing real-time guidance for the placement of surgical instruments and visualization of the resected margins. [184]

Laparoscopic Applications
Laparoscopic surgery is preferable to open surgery due to its minimal invasiveness, reduced complications, and shortened hospital stays. [185] However, laparoscopic applications require a high level of understanding of medical images to identify key anatomy from the laparoscopic camera or monitors. Furthermore, the insufflation used to create space for laparoscopic tools causes large deformations of the organs and abdominal wall, reducing the efficacy of surgical planning based on preoperative images. These limitations in laparoscopic applications call for the registration of the patient's anatomic models with real-time laparoscopic images. [185] In contrast to orthopedic or neurological applications, the imaged targets in laparoscopic applications dynamically change their size, shape, and location. These special challenges can be addressed by applying image registration together with biomechanical models. [186] For instance, Bano et al. demonstrated the use of an ICP rigid registration to initially set the models in the same coordinate system for laparoscopic liver surgery. The initial registration results were then refined by applying a finite element model for soft tissue deformation. [187] The biomechanical model was constructed with Young's modulus and Poisson's ratio, which represent the elasticity and compressibility of liver tissue, respectively. In another study of laparoscopic liver surgery, Oktay et al. first calculated the deformations and shifts in organ position caused by gas insufflation using a biomechanical model and then finalized registration with a diffeomorphic registration method, which has a high degree of freedom. [188] Although the registration of preoperative models and intraoperative images could spare surgeons cognitive workload, it remains questionable in terms of perception and interpretation errors, especially when large misalignments exist due to the relocation and deformation of soft tissues.
To assist surgeons in assessing in vivo registration errors, Thompson et al. described a novel method that uses the projected errors of surface features as a reliable predictor of subsurface target registration errors for liver resection applications. [189]
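In the linear-elastic models mentioned above, Young's modulus and Poisson's ratio are typically converted into the Lamé parameters that enter the finite element equations. The conversion is the standard one below; the example values are illustrative assumptions, not measurements from the cited studies.

```python
def lame_parameters(young_e, poisson_nu):
    """Standard conversion from Young's modulus E and Poisson's ratio nu
    to the Lame parameters (lambda, mu) of linear elasticity."""
    lam = young_e * poisson_nu / ((1 + poisson_nu) * (1 - 2 * poisson_nu))
    mu = young_e / (2 * (1 + poisson_nu))
    return lam, mu

# Illustrative (assumed) soft-tissue values: E in pascals, nu dimensionless;
# nu approaching 0.5 models near-incompressibility.
lam, mu = lame_parameters(young_e=5000.0, poisson_nu=0.45)
```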

Summary and Outlook
Image registration is now commonly accepted in clinical care with improved outcomes in many surgical procedures, especially for positioning patients in radiotherapy and orthopedic surgery.
Recently, registration of multimodal images has revealed a trend of shifting from extrinsic to intrinsic registration, because intrinsic methods do not require additional fiducial markers to be introduced onto the skin or into the body of the patient. Although there is an increasing number of studies on deformable or nonlinear registration, global rigid registration is still the most frequently used registration approach in clinical procedures. Intensity-based methods, relying only on pixel values without the need to detect special landmarks, have also entered the mainstream of registration research in multimodal applications.
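A minimal sketch of the intensity-based idea is the histogram estimate of mutual information, the measure behind many of the multimodal methods discussed in this review; this is a generic textbook construction, not a specific cited implementation.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Histogram-based MI between two same-shaped images: high when one
    image's intensities predict the other's, regardless of the actual
    intensity mapping -- hence its use for multimodal registration."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))

# An image stays highly informative about an intensity-remapped copy of
# itself (mimicking a second modality), but not about a scrambled image.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
remapped = 1.0 - img ** 2  # monotone "other modality" mapping
scrambled = rng.permutation(img.ravel()).reshape(img.shape)
```

Because only the joint statistics of the intensities matter, the remapped image scores nearly as high as the image itself, which is exactly why MI works across modalities where correlation-based measures fail.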
For developing ML-based registrations, there is an emerging need for public databases of representative, expert-annotated images, especially for analyzing intraoperative cardiovascular images, which are typically of low resolution and low contrast. The registration of images in cardiology is currently conducted on a case-by-case basis. In comparison, other applications, such as image registration for neurological surgeries and radiotherapies, have started to build and use public atlases of representative images, which significantly accelerate research in these areas. [190,191] Validation of methods and results, in particular for nonlinear registrations, remains a major challenge in applications of image-guided procedures. [192] One major reason is that many image-guided applications do not have well-defined standards, and the validation of coregistration algorithms and results depends mainly on the individual judgment of the involved clinicians or physicians. Another reason is the lack of quantification protocols and well-established methods for measuring local registration errors with absolute certainty. The existing evaluation of registration using computer simulations or physical phantoms is of very limited value, because of the infeasibility of including the overwhelmingly rich variety of anatomic and pathologic features. Many studies focus their algorithms on specific regions of interest, so it is difficult to extend these registration algorithms to other anatomical areas. Future research efforts should focus on overcoming these limitations and providing more universal solutions. To solve the validation challenge, future research may investigate the use of additional assessment systems to measure the aligned features or key points. These additional assessment systems can be either a third imaging modality that can clearly reveal the ground truth or an electromagnetic tracking system when electromagnetic sensors are used extrinsically.
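Where independent landmarks are available, one common quantification is the target registration error: apply the estimated transform to landmarks that were not used to drive the registration and measure the residual distances. A minimal sketch for the rigid case, with assumed example points:

```python
import numpy as np

def target_registration_error(R, t, moving_landmarks, fixed_landmarks):
    """Per-landmark TRE for an estimated rigid transform (R, t): distance
    between each mapped landmark and its known position in the fixed
    image. Landmarks must be independent of those used for registration."""
    mapped = moving_landmarks @ R.T + t
    return np.linalg.norm(mapped - fixed_landmarks, axis=1)

# A perfect transform gives zero TRE; a 1 mm translation error gives 1 mm.
pts = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 5.0], [3.0, 4.0, 0.0]])
exact = target_registration_error(np.eye(3), np.zeros(3), pts, pts)
off = target_registration_error(np.eye(3), np.array([1.0, 0.0, 0.0]), pts, pts)
```

The practical difficulty discussed above is precisely that such independent, reliably localized landmarks are rarely available inside deforming soft tissue.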
Nonrigid registration provides a more accurate model of human anatomy than rigid registration. This is especially true for areas of soft tissue where intraoperative deformation may occur. Thus, suitable evaluation metrics should be standardized for nonrigid registration algorithms. Available studies have shown difficulty in standardizing reliable validation protocols. Current methods use either real patient image data or phantom image data as a reference for deformable registration. Phantom data can be useful, but creating phantom data that represent all anatomical deformations is very resource intensive and not practical in clinical environments. In addition, phantom data display uniform intensities in areas where patient data would display gradients. So far, the suitability of specific evaluation metrics for deformable registration depends on the specific clinical situation. [175] It appears that investment in a large-scale, long-term study that includes a large number of varying clinical cases will be necessary to standardize these metrics. Despite the significant progress in developing new algorithms, registration and visualization are still quite separate topics. [193] At present, not many registration approaches are integrated with state-of-the-art visualization (e.g., 3D rendering of anatomical structures and visualization on augmented reality or virtual reality (AR/VR) devices). Meanwhile, not many visualization approaches are combined with image registration to utilize the registered results. [194] In the future, the combination of medical image registration and new visualization technology is anticipated to provide an integrated solution for enhanced visualization in the medical field.
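Among candidate metrics, overlap scores on anatomical labels are the most widely used surrogate when dense ground-truth deformations are unavailable; the Dice coefficient, for example, is computed as below (a generic definition, not tied to any cited protocol).

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice overlap between two binary segmentation masks:
    1.0 for identical masks, 0.0 for disjoint ones."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * intersection / (mask_a.sum() + mask_b.sum())

a = np.zeros((8, 8), dtype=bool)
b = np.zeros((8, 8), dtype=bool)
a[2:6, 2:6] = True  # 16 pixels
b[4:8, 2:6] = True  # 16 pixels, half of them overlapping with a
dice(a, b)          # -> 0.5
```

High label overlap does not guarantee a plausible deformation inside the labels, which is one reason overlap scores alone are an incomplete validation for nonrigid methods.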
Functional imaging modalities provide valuable clinical information that structural imaging cannot. Although it has been studied, image registration would benefit from additional research focused on registering functional and structural modalities. To achieve this, software standards should be implemented so that researchers can collaborate and expand upon existing innovations. There are many open-source software options currently available for medical image registration. For example, the Insight Segmentation and Registration Toolkit (ITK) is an open-source, cross-platform system that provides registration functions in two, three, and more dimensions. Based on ITK, Klein et al. further developed the elastix software, which incorporates several optimization methods, multiresolution schemes, interpolators, transformation models, and cost functions. [195] Similarly, the advanced normalization tools (ANTs) package is also based on ITK and is commonly used in brain image analysis for managing, interpreting, and registering multimodal and multidimensional image data. NiftyReg, developed by the Translational Imaging Group at University College London, provides various methods for rigid, affine, and nonlinear registration, as well as GPU- and CPU-based implementations. Additionally, the Graphical Interface for Medical Image Analysis and Simulation (GIMIAS) provides functionalities for manual and automatic segmentation, visualization, and mesh editing. An extended list of commercially available software and open-source toolboxes can be found in the literature [196,197] and on the Neuroimaging Informatics Tools and Resources Collaboratory webpage. [198]

Conclusions
Image registration has remained an important topic of research interest, especially as advances have been made in clinical applications. Each part of the image registration workflow has shown improvement, including registration algorithms, computing capabilities, and even the imaging modalities themselves.
Whereas many applications of image registration have become standard practice in certain clinical areas, many new developments are still in the early stages and have yet to become the standard of care. Minimally invasive procedures and more objective clinical measures are increasingly emphasized in the development of new methods. Medical image registration will see increasing use in the area of advanced medical imaging as the fusion/combination of multimodal images and advanced visualization technology (AR/VR) becomes more widespread. The capabilities of image registration have expanded significantly, and patient care will be greatly improved as more of these technologies are adopted into standard clinical procedures.