Method for registration of 3D shapes without overlap for known 3D priors

In 3D registration of point clouds, the goal is to find an optimal transformation that aligns the input shapes, provided that they have some overlap. Existing methods suffer from performance degradation when the overlap ratio between neighbouring point clouds is small, and so far no existing method can align shapes with no overlap. In this letter, we present, to the best of our knowledge, the first method for the registration of 3D shapes without overlap, assuming that the shapes correspond to partial views of a known semi-rigid 3D prior. The method is validated and compared to existing methods on FAUST, a well-known dataset for human body reconstruction. Experimental results show that this approach can effectively align shapes without overlap. Compared to existing state-of-the-art methods, this approach avoids iterative optimization and is robust to outliers and to the inherent inaccuracies induced by an initial rough alignment of the shapes.

• We propose a novel deep learning method for 3D registration; to the best of our knowledge, this is the first method for registration of 3D shapes without overlap.
• We present a novel learned correspondence representation.
• We validate the effectiveness of our approach by applying it to the challenging task of 3D human body reconstruction.
Related work: Methods for 3D shape registration can be classified into two main categories: (1) pairwise registration (e.g. [2]) and (2) multi-view global registration (e.g. [7]). ICP and its variants were originally designed for pairwise registration, which takes two point clouds or meshes as input. Traditional pairwise registration involves two steps: (1) rough alignment, which provides an initial estimate of the relative transformation, and (2) iterative refinement of the alignment by minimizing the 3D registration error. Pairwise registration does not perform well on multi-view 3D scans because the pairwise registration errors accumulate incrementally. To address this problem, multi-view registration methods have been proposed that refine the global registration by incorporating cues from multiple views [8,9]. Recently, several researchers have proposed deep learning approaches to 3D registration. [3] proposed deep closest point (DCP), which operates directly on point clouds. DCP consists of a point cloud feature extractor, an attention-based module and a differentiable singular value decomposition (SVD) layer that predicts the rigid transformation. However, DCP must be followed by ICP to refine the registration results. PointNetLK [10] registers point clouds by minimizing the distance between global embeddings learned by PointNet [11]. PRNet [4], which builds on DCP, aligns point clouds with partial overlap. These methods jointly learn the feature representation and the registration, which limits their ability to generalize. DeepICP [12] learns correspondences between point clouds and then uses an ICP-style SVD step for registration. Similarly, 3DSmoothNet [13] uses a Siamese deep learning architecture to establish correspondences between point clouds. [5] proposed an end-to-end solution for multi-view 3D point cloud registration. However, all of these methods fail when there is no overlap between the inputs.
In stark contrast to these methods, we aim to align point clouds or meshes without overlap, given that they originate from a (semi-rigid) object with a known 3D shape. For example, consider the frontal and back views of an object: the two views do not overlap, but they originate from the same 3D object. Existing registration methods fail in this case because there are no correct correspondences to find. To this end, we propose a novel learned correspondence representation for inputs without overlap.
Problem statement: We use X and Y to denote the source and target point clouds respectively, where $X = \{x_i \in \mathbb{R}^3, i = 1, \dots, N\}$ and $Y = \{y_j \in \mathbb{R}^3, j = 1, \dots, M\}$. The proposed method works for both N = M and N ≠ M. This is a rigid registration problem: we assume that X is transformed by an unknown rigid motion and then aligned with Y. We denote the rigid transformation as $[R_{xy}, t_{xy}]$, where $R_{xy} \in SO(3)$ and $t_{xy} \in \mathbb{R}^3$. Most registration algorithms aim to minimize the mean-squared error $E(R_{xy}, t_{xy})$, given by

$$E(R_{xy}, t_{xy}) = \frac{1}{N} \sum_{i=1}^{N} \left\| R_{xy} x_i + t_{xy} - y_{\mathrm{cor}(x_i)} \right\|^2,$$

where cor() is an operation that finds the correspondence in Y for each point of X. Solving this optimization and finding correspondences are performed alternately in an iterative manner. This formulation fails for our task because, without overlap, there are no true correspondences to find.
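To make the failure concrete, here is a small NumPy sketch (our illustration, not part of the letter's method) that evaluates E(R, t) = (1/N) Σ_i ‖R x_i + t − y_cor(x_i)‖² with a brute-force nearest-neighbour cor(). The toy shape is a hypothetical stand-in: a "front" view with z > 0 and a "back" view obtained by rotating it 180° about the x axis, so the two views share no points. Both views are expressed in the same frame, so the true motion is the identity; the function names are also hypothetical.

```python
import numpy as np


def cor(points: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Brute-force nearest neighbour: index of the closest row of `targets` for each row of `points`."""
    d2 = ((points[:, None, :] - targets[None, :, :]) ** 2).sum(axis=-1)  # (N, M) squared distances
    return d2.argmin(axis=1)


def registration_error(R: np.ndarray, t: np.ndarray, X: np.ndarray, Y: np.ndarray) -> float:
    """Mean-squared error E(R, t) with closest-point correspondences."""
    moved = X @ R.T + t  # R x_i + t for every source point
    idx = cor(moved, Y)
    return float(((moved - Y[idx]) ** 2).sum(axis=1).mean())


# Toy "complete shape" that is symmetric under a 180-degree rotation about x.
rng = np.random.default_rng(0)
front = rng.uniform([-1.0, -1.0, 0.1], [1.0, 1.0, 1.0], size=(200, 3))  # z in [0.1, 1]
R_x180 = np.diag([1.0, -1.0, -1.0])  # rotation by pi about the x axis (det = +1)
back = front @ R_x180.T              # z in [-1, -0.1]: no point shared with the front view

E_true = registration_error(np.eye(3), np.zeros(3), front, back)  # true motion: identity
E_wrong = registration_error(R_x180, np.zeros(3), front, back)    # wrong 180-degree flip
print(E_true, E_wrong)
```

At the true motion E stays strictly positive (every back point is at least 0.2 away in z from every front point), while the wrong rotation drives E to zero. Real bodies are not perfectly symmetric, but the example shows why closest-point correspondences give no reliable signal when the views do not overlap.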
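In contrast to existing formulations for registration, we propose to learn a novel correspondence representation

$$\phi: X \mapsto \phi(X) \in \mathbb{R}^{U \times 3},$$

which maps a partial point cloud to a fixed number U of ordered 3D points. In this study, we train a deep neural network to learn φ: it takes a partial point cloud as input and outputs the virtual correspondence. The registration is then obtained directly from the correspondence representation. This can be written as

$$\phi(Y)^\top = \begin{bmatrix} R_{xy} & t_{xy} \end{bmatrix} \begin{bmatrix} \phi(X)^\top \\ \mathrm{ones}(1, U) \end{bmatrix},$$

where ones() creates a row vector filled with ones. Writing $A = \begin{bmatrix} \phi(X)^\top \\ \mathrm{ones}(1, U) \end{bmatrix} \in \mathbb{R}^{4 \times U}$, the transformation can be obtained in closed form using the normal equation:

$$\begin{bmatrix} R_{xy} & t_{xy} \end{bmatrix} = \phi(Y)^\top A^\top \left( A A^\top \right)^{-1}.$$

No assumption is made about the overlap, the rough alignment or the input order of points of the partial point clouds. This mini solver computes the affine transformation between the two outputs produced by the network. Finally, the estimated transformation is applied to align the partial point clouds.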

Fig 2 Comparison with different registration methods on the FAUST dataset
Our method is a one-shot solution: no iteration is needed.
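A minimal NumPy sketch of such a mini solver, assuming the two network outputs are given as (U, 3) arrays with the same point order; solve_affine and apply_transform are hypothetical names. It stacks φ(X)ᵀ on a row of ones to form A (4 × U) and solves the normal equation [R t] = φ(Y)ᵀ Aᵀ (A Aᵀ)⁻¹ for the 3 × 4 affine matrix in a single, non-iterative step.

```python
import numpy as np


def solve_affine(phi_x: np.ndarray, phi_y: np.ndarray) -> np.ndarray:
    """Least-squares 3x4 transform T = [R | t] such that phi_y^T ~= T @ [phi_x^T; ones(1, U)].

    phi_x, phi_y: (U, 3) arrays in the same point order, so row u of phi_x
    corresponds to row u of phi_y.
    """
    U = phi_x.shape[0]
    A = np.vstack([phi_x.T, np.ones((1, U))])  # (4, U) homogeneous source points
    if np.linalg.matrix_rank(A) < 4:
        # 12 unknowns and 3 equations per point: need >= 4 non-coplanar points.
        raise ValueError("need at least 4 non-coplanar correspondences")
    B = phi_y.T  # (3, U) target points
    # Normal equation T = B A^T (A A^T)^{-1}, solved without forming the inverse.
    return np.linalg.solve(A @ A.T, A @ B.T).T


def apply_transform(T: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 3x4 transform [R | t] to a (K, 3) array of points."""
    return points @ T[:, :3].T + T[:, 3]


# Usage: outputs related by a known rigid motion (noise-free, illustrative data).
rng = np.random.default_rng(1)
phi_x = rng.normal(size=(100, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.3, -0.2, 1.0])
phi_y = phi_x @ R_true.T + t_true
T = solve_affine(phi_x, phi_y)
```

Because [R t] has 12 unknowns and each corresponding point contributes 3 equations, A Aᵀ is invertible only with at least 4 non-coplanar points, which the sketch checks explicitly. The result is a general affine matrix, so with noisy network outputs its left 3 × 3 block need not be an exact rotation.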
Proposed correspondence representation: Traditional ICP and its variants use closest neighbours as correspondences. However, they are sensitive to the initialization and the overlap ratio. PointNetLK [10] learned an embedding correspondence representation and performed registration by minimizing the distance between the learned embeddings. This method, however, is prone to misalignment and is not designed for registering shapes without overlap. In contrast to these methods, we propose to learn 3D correspondences for the task of shape registration without overlap. Given partial point clouds of the same subject, we train a neural network, dubbed 3D Correspondence Network (3D-CN), to output a fixed number (denoted by U) of ordered 3D points that establish the correspondence between the input partial point clouds and the complete geometric shape from which they originate. U is a hyperparameter of our model. This idea is inspired by point cloud completion [14]. In contrast to [14], the output produced by 3D-CN has the following properties: (1) it represents the same complete geometric shape (a complete body shape in the example of Figure 1), (2) it has the same number U of points for any input partial point cloud and (3) it has the same order of points, irrespective of the input. Based on these properties, one-to-one correspondences between partial point clouds of the same object are naturally built: the u-th output point for one view corresponds to the u-th output point for the other. Essentially, the outputs produced by 3D-CN are identical up to an affine transformation, which can be found using a simple solver (see Figure 1). The complete object is a 3D prior that serves as a proxy to enable the alignment between the partial point clouds. In the ablation study below, we analyse the effect of U on the registration.
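Because the outputs share a fixed order, pairing is by index alone: row u of φ(X) matches row u of φ(Y), with no nearest-neighbour search. The letter's solver estimates a general affine transformation; if a strictly rigid estimate with R ∈ SO(3) is preferred, one alternative (a swapped-in technique, not the method described here) is the SVD-based Kabsch/Umeyama solution, sketched below in NumPy with a hypothetical solve_rigid name and synthetic data.

```python
import numpy as np


def solve_rigid(phi_x: np.ndarray, phi_y: np.ndarray):
    """Kabsch/Umeyama: R in SO(3) and t minimizing sum_u ||R phi_x[u] + t - phi_y[u]||^2.

    Row u of phi_x corresponds to row u of phi_y (ordered correspondences).
    """
    mu_x, mu_y = phi_x.mean(axis=0), phi_y.mean(axis=0)
    H = (phi_x - mu_x).T @ (phi_y - mu_y)  # 3x3 cross-covariance
    U_svd, _, Vh = np.linalg.svd(H)        # H = U_svd @ diag(S) @ Vh
    # Flip the last singular direction if needed so det(R) = +1 (rotation, not reflection).
    d = np.sign(np.linalg.det(Vh.T @ U_svd.T))
    D = np.diag([1.0, 1.0, d])
    R = Vh.T @ D @ U_svd.T
    t = mu_y - R @ mu_x
    return R, t


# Usage: index-paired outputs related by a rigid motion plus small noise (synthetic).
rng = np.random.default_rng(2)
phi_x = rng.normal(size=(500, 3))
c, s = np.cos(0.4), np.sin(0.4)
R_true = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
t_true = np.array([0.5, 0.0, -0.25])
phi_y = phi_x @ R_true.T + t_true + 0.01 * rng.normal(size=phi_x.shape)
R, t = solve_rigid(phi_x, phi_y)
```

Unlike the normal-equation solver, this always returns an orthonormal R with determinant +1, at the cost of ignoring any non-rigid (scaling or shear) component between the two outputs.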
Proposed architecture: Figure 1 shows an overview of the proposed method. Given two partial point clouds, a shared 3D Correspondence Network (3D-CN) predicts virtual correspondences for the source and target point clouds respectively, and the transformation parameters are then obtained directly by a mini solver. Because 3D-CN takes a single partial point cloud as input, our method can also be used directly for multi-view point cloud registration. In this letter, we validate our algorithm on a challenging task: human body shape reconstruction from two non-overlapping scans. As shown in Figure 3, 3D-CN consists of two main modules: a point cloud feature extractor and a correspondence predictor. The feature extractor takes the point cloud X as input and extracts a k-dimensional feature vector f, where k = 1024. The correspondence predictor then consumes the feature vector and outputs U structured 3D points. Our feature extractor is a simplified version of PointNet [11] that takes m points as input. A shared multi-layer perceptron (MLP) consisting of three linear layers with ReLU activation maps each point to a point feature vector, and a point-wise max-pooling operation then yields a global k-dimensional feature vector. Our correspondence predictor is an MLP consisting of three linear layers with ReLU activation. The loss function L is defined as the mean squared error between the predicted correspondence φ(X) and the ground-truth correspondence φ(X)_GT.

Fig 3 Architecture of the proposed 3D Correspondence Network (3D-CN). Given a partial point cloud, we aim to predict the virtual correspondence represented by the structured 3D point set. U denotes the number of predicted correspondence points, which can be controlled by the user
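The architecture description translates into a short forward pass. The NumPy sketch below uses random, untrained weights and is not the authors' implementation: the hidden widths (64 and 128 in the shared MLP, 512 in the predictor), the He-style initialization, the linear (ReLU-free) last layer of the predictor, U = 256 for the demo, and the names init_mlp, cn3d_forward and mse_loss are all our assumptions; k = 1024 follows the text. It also shows why the input order of points does not matter: a shared per-point MLP followed by max-pooling is invariant to permutations of the input.

```python
import numpy as np


def init_mlp(sizes, rng):
    """Random (untrained) weights for a stack of linear layers; sizes = [in, h1, ..., out]."""
    return [(rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def mlp(x, layers, relu_last):
    """Apply linear layers with ReLU between them (and optionally after the last one)."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1 or relu_last:
            x = np.maximum(x, 0.0)
    return x


def cn3d_forward(points, extractor, predictor, U):
    """Hypothetical 3D-CN forward pass: (m, 3) partial cloud -> (U, 3) ordered points."""
    point_feats = mlp(points, extractor, relu_last=True)  # shared MLP on each point, (m, k)
    f = point_feats.max(axis=0)                            # point-wise max-pooling -> (k,)
    return mlp(f, predictor, relu_last=False).reshape(U, 3)


def mse_loss(pred, gt):
    """Mean squared error between predicted and ground-truth correspondences."""
    return float(((pred - gt) ** 2).mean())


rng = np.random.default_rng(0)
U, k = 256, 1024                                   # U kept small for the demo
extractor = init_mlp([3, 64, 128, k], rng)         # three linear layers per point
predictor = init_mlp([k, 512, 512, 3 * U], rng)    # three linear layers -> U x 3

X = rng.normal(size=(500, 3))                      # placeholder partial point cloud
out = cn3d_forward(X, extractor, predictor, U)
out_perm = cn3d_forward(X[rng.permutation(500)], extractor, predictor, U)
```

Training would minimize mse_loss between cn3d_forward(X, ...) and φ(X)_GT with a gradient-based optimizer, which this sketch omits.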
Training dataset: To demonstrate the effectiveness of our approach, we train our model on a human body dataset. We generate 1 × 10^5 synthetic human body shapes with SMPL by sampling parameters from the SURREAL dataset [15], which has also been used to train networks for other tasks [16]. The open-source Blender sensor simulation plugin Blensor [17] is used to render partial point clouds without overlap from front-facing and back-facing views. When performing the experiments, we also set the parameter noise_sigma=0.02 to add noise to the point clouds.
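Blensor [17] simulates the scanner inside Blender; the NumPy sketch below is only a crude geometric stand-in for that rendering step (our assumption, not the letter's pipeline). It splits a cloud into two disjoint halves along a viewing axis, which guarantees zero overlap, and perturbs every coordinate with zero-mean Gaussian noise of standard deviation noise_sigma = 0.02. Blensor's own noise model may be applied differently, and split_front_back, add_noise and the random placeholder "body" are hypothetical.

```python
import numpy as np


def split_front_back(points, view_axis=2):
    """Crude stand-in for two opposite scans: points on the near side of the centroid
    along `view_axis` form the front view, the rest form the back view (disjoint)."""
    depth = points[:, view_axis] - points[:, view_axis].mean()
    return points[depth >= 0.0], points[depth < 0.0]


def add_noise(points, noise_sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation noise_sigma to every coordinate."""
    return points + rng.normal(scale=noise_sigma, size=points.shape)


rng = np.random.default_rng(0)
body = rng.normal(size=(20000, 3))  # placeholder for points sampled on an SMPL body
front, back = split_front_back(body)
front_noisy = add_noise(front, noise_sigma=0.02, rng=rng)
back_noisy = add_noise(back, noise_sigma=0.02, rng=rng)
```

A real front-facing scan keeps only surfaces visible from the sensor, so this depth split over-simplifies self-occlusion; it only reproduces the property that matters here, namely that the two views share no points.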

Experiments:
We evaluate our results on the FAUST dataset by comparing rotation and translation errors. We use the following metrics for quantitative comparison, which are also used in [6]. Given the estimated rotation R and the ground truth $R_{GT}$, the rotation error is defined as

$$E_R = \arccos\left( \frac{\mathrm{tr}(R^\top R_{GT}) - 1}{2} \right).$$

For the translation error, given the estimated translation t and the ground truth $t_{GT}$, we use

$$E_t = \left\| t - t_{GT} \right\|_2.$$

We compare our method against point-to-point ICP [2], point-to-plane ICP [18], deep global registration (DGR) [19] and 3D multi-view registration (3DMR) [5]. The former two are popular ICP-based methods, and the latter two are state-of-the-art deep learning-based registration approaches. Figure 2 visually compares the results of the different approaches for a source point cloud (in blue) and a target point cloud (in red) that share no overlap. ICP-based methods fail to perform the registration when the two input shapes are not roughly aligned beforehand (Figure 2, middle row), and none of the existing methods obtains a correct registration on inputs without overlap. In contrast, our method works well on inputs without overlap, even when the input data are noisy and poorly initialized. The quantitative comparisons of rotation and translation errors are reported in Table 1 and Table 2, respectively. The results show that our method significantly improves registration accuracy compared to existing state-of-the-art methods.

Ablation study: Using 400 test samples that were not included in training, we explore the effect of sparse and dense correspondences on the registration. We also compare our correspondences with the complete point clouds (CPC) produced by point completion networks [14]. Because complete point clouds lack one-to-one correspondences, point-to-point ICP is applied to compute the transformation in that case.
In this experiment, we manually down-sample the SMPL body to 4, 10, 100 and 1000 points to serve as sparse correspondences, while the dense correspondences use all 6890 SMPL vertices. For a fair comparison, the number of output points of the point completion network is also set to 6890. As shown in Table 3 and Table 4, our method significantly improves registration accuracy compared to the point completion network. In addition, dense correspondences yield more robust results than sparse correspondences.
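The rotation and translation errors used throughout the experiments can be computed with a few lines of NumPy. This sketch assumes the common definitions E_R = arccos((tr(RᵀR_GT) − 1)/2), the geodesic angle between the two rotations, reported here in degrees (the unit is our assumption), and E_t = ‖t − t_GT‖₂; the function names are hypothetical.

```python
import numpy as np


def rotation_error_deg(R, R_gt):
    """Geodesic angle (degrees) between estimated and ground-truth rotations."""
    cos_angle = (np.trace(R.T @ R_gt) - 1.0) / 2.0
    # Clip to [-1, 1] so round-off cannot push arccos outside its domain.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def translation_error(t, t_gt):
    """Euclidean distance between estimated and ground-truth translations."""
    return float(np.linalg.norm(np.asarray(t) - np.asarray(t_gt)))


Rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(rotation_error_deg(np.eye(3), Rz90))                  # rotations 90 degrees apart
print(translation_error([0.0, 0.0, 0.0], [3.0, 4.0, 0.0]))  # 5.0
```

The clipping matters in practice: an estimated R that is only approximately orthonormal, such as the 3 × 3 block of an affine estimate, can give a cosine slightly outside [−1, 1].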

Conclusions:
To the best of our knowledge, this is the first method for 3D shape registration without overlap. We validate the effectiveness of our approach by applying it to the challenging task of 3D human body reconstruction from two partial, non-overlapping scans. The results on the FAUST dataset show that our approach yields robust results in non-overlapping 3D registration. Our method also proves robust to noise and poor initial alignment, and it is not iterative. Comparisons with traditional registration methods based on iterative optimization, as well as with recent deep learning registration approaches, show that our method achieves state-of-the-art results, significantly improving registration accuracy.