A curvature and density-based generative representation of shapes

This paper introduces a generative model for 3D surfaces based on a representation of shapes by mean curvature and metric, which are invariant under rigid transformations. Hence, compared with existing 3D machine learning frameworks, our model substantially reduces the influence of translation and rotation. In addition, the local structure of shapes is captured more precisely, since the curvature is explicitly encoded in our model. Specifically, every surface is first conformally mapped to a canonical domain, such as a unit disk or a unit sphere. Then, it is represented by two functions over this canonical domain: the mean curvature half-density and the vertex density. Assuming that input shapes follow a certain distribution in a latent space, we use a variational autoencoder (VAE) to learn the latent space representation. After training, we can generate variations of shapes by randomly sampling the distribution in the latent space. Surfaces with triangular meshes can be reconstructed from the generated data by applying isotropic remeshing and the spin transformation, which is given by the Dirac equation. We demonstrate the effectiveness of our model on datasets of man-made and biological shapes and compare the results with other methods.


Introduction
While the convolutional neural network has achieved significant success in 2D image processing, more and more attention has recently been drawn to applying the technique to the domain of 3D shapes. Unlike 2D images, which are typically represented by a multidimensional tensor, the representation of 3D shapes is usually unstructured, hence the convolutional neural network is not directly applicable.
In this paper, we propose a 3D deep generative model based on mean curvature and metric, which in the discrete case are expressed by two functions that are invariant under Euclidean motion. It has the following advantages over existing models.
Firstly, our model preserves more detailed structure in cases where the curvature plays a critical role, especially when the surface is highly folded and convoluted like the cortical surfaces in Figure 1. The convolutional neural network (CNN) is known to be good at capturing not only the global features but also the local fine structure of data. Its effectiveness, however, relies on a proper distance function defined on the space of features. For example, the Euclidean distance between two vectors is a straightforward option. As a result, the bumpy circle (inset) will tend to be deformed through the neural network to the round circle, which is more regular and is close to the bumpy one as measured by the Euclidean distance. In contrast, we adopt the curvature representation and subsequently the distance between curvatures, by which the two circles are clearly distinguishable, hence the small hills will be safely preserved.
Secondly, our model is less affected by rigid transformation and uniform scaling. Thanks to the invariant quantities that constitute our representation and the CNN on the sphere (see Section 4.2 for a detailed discussion), we provide a simple and efficient way to handle data without a consistent alignment.
The input shapes for our model are required to be surfaces with consistent simply-connected topology, e.g., the disk-like surface or the spherical surface. We first map the input surface to a canonical domain such as a sphere, where mean curvature and vertex density are extracted and recorded as the input data for the neural network. For generative models like VAE, the output is a variant of the input so it has the same form as the input. To reconstruct the shape, we first create a conformal parameterization by randomly sampling the points with respect to the generated density function and applying the isotropic remeshing. Then, we deform the mesh gradually towards the target shape with the prescribed mean curvature (see the attached videos).
A curvature-to-shape reconstruction algorithm with high accuracy is critical for generating plausible shapes. We follow the basic idea in [CPS13] and [YDT * 18]. The deformation between the domain and the target shape is given by the solution of the Dirac equation. We propose a modified equation with a larger solution space, which yields a reconstruction closer to the target shape. Furthermore, one might be concerned about the stability of curvature-based methods, since tiny errors in curvature might accumulate across the surface and significantly affect the final reconstruction. Indeed, in our case, previous methods fail to locally scale the shape correctly in regions with large curvature. In fact, it is hard to directly manipulate the local scaling with the Dirac equation. Therefore, we design a new algorithm inspired by Chern et al. [CPS15] to calibrate the area scaling factor. This compensates for the shortcoming of the Dirac equation and significantly stabilizes the reconstruction.
We evaluate our reconstruction algorithm on several shapes, showing that our method outperforms previous methods visually and quantitatively. In addition to some preliminary applications such as shape remeshing, interpolation and clustering, we demonstrate randomly generated shapes from various datasets and compare to other 3D generative models.
In summary, the contributions of this paper are 1) an improved algorithm for shape reconstruction from curvature with area calibration and 2) a 3D shape deep learning framework based on curvature.
Figure 3: The spherical conformal parameterizations of two animals are aligned by a Möbius transformation with three landmark points. Then, they are packed into tensors with dimension 320 × 32 × 32 × 2. This figure shows a linear interpolation between the curvature representations of two shapes and the resulting shape reconstruction from the curvature representation.

Related
two fundamental forms, an identical triangulation for all shapes, which is not always possible, is required.
Other options are point-wise shape descriptors such as the heat kernel signature [SOG09] and the wave kernel signature [ASC11]. Indeed, they have been employed in discriminative models for 3D shape classification and segmentation [BMM * 15]. But they can hardly be used for generative models, because it is unclear whether these shape descriptors completely determine the shapes or how to reconstruct shapes from them.
The idea of this paper comes originally from Bonnet [Bon67]. In fact, except for some very special cases, an immersed surface is completely determined by its conformal structure, regular homotopy class and mean curvature half-density, which is a scale-independent variant of the mean curvature [Kam98]. The exceptions, called the Bonnet immersions, include minimal surfaces, constant mean curvature surfaces and Bonnet pairs. In our case the regular homotopy class is unnecessary, since we only consider simply-connected surfaces, which have a unique regular homotopy class [Pin85]. In summary, generic simply-connected immersed surfaces are uniquely determined by the conformal structure and the mean curvature half-density.

Quaternions, Dirac-type operators and Spin Transformation
Now, we sketch how to construct a surface from the mean curvature half-density. Roughly speaking, for every point on the surface we rotate its infinitesimal neighbourhood with a quaternion.
Recall that a quaternion is a 4-dimensional vector $q = a + b\,i + c\,j + d\,k$ with the multiplicative structure
$$i^2 = j^2 = k^2 = ijk = -1.$$
We always identify vectors in $\mathbb{R}^3$ with pure imaginary quaternions, $(x, y, z) \mapsto x\,i + y\,j + z\,k$.
Any quaternion can be written as $q = |q|\left(\cos\frac{\theta}{2} + \sin\frac{\theta}{2}\,u\right)$, where $\theta \in [0, 2\pi)$ and $u \in \mathbb{R}^3 \subset \mathbb{H}$ is a unit vector. It is well known that $q$ gives a scaled rotation in $\mathbb{R}^3$ with scaling factor $|q|^2$, rotation angle $\theta$ and rotation axis $u$. The rotation is given by
$$v \mapsto \bar{q}\, v\, q.$$
The explicit construction of shapes from the mean curvature half-density and the conformal structure is called the spin transformation. Suppose we are given an immersion of a surface $f : M \to \mathbb{R}^3$ and a quaternion-valued function on the surface $\phi : M \to \mathbb{H}$, which is understood as a continuously varying rotation at each point. We scale and rotate every tangent plane by
$$d\tilde{f} = \bar{\phi}\, df\, \phi. \tag{1}$$
However, there is no guarantee that these rotated tangent planes will again form a surface. For a simply connected surface, $d\tilde{f}$ is again the tangent plane of an immersed surface if and only if it is closed:
$$d(\bar{\phi}\, df\, \phi) = 0.$$
This turns out to be equivalent to the Dirac equation [KPP98]
$$(D_f - \rho)\,\phi = 0, \tag{2}$$
where the Dirac operator is defined by
$$D_f\,\phi := -\frac{df \wedge d\phi}{|df|^2} \tag{3}$$
and $\rho : M \to \mathbb{R}$ is a real-valued function. Therefore, any solution of equation (2) induces a new immersion $\tilde{f} : M \to \mathbb{R}^3$ by $\tilde{f} = \int_M d\tilde{f}$. Moreover, the mean curvature $\tilde{H}$ of $\tilde{f}$ is given by
$$\tilde{H}\,|d\tilde{f}| = H\,|df| + \rho\,|df|, \tag{4}$$
where $H$ is the mean curvature of the original surface $f$. Observe that, due to the scaling factor $|d\tilde{f}|$ in (4), one cannot fully control the mean curvature $\tilde{H}$. However, by introducing a variant notion, namely the mean curvature half-density
$$h := H\,|df|, \tag{5}$$
equation (4) turns into
$$\tilde{h} = h + \rho\,|df|. \tag{6}$$
This means that the mean curvature half-density $\tilde{h}$ can be precisely realized as long as a solution $\phi$ of equation (2) exists.
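The scaled rotation induced by a quaternion can be checked numerically. The following is a minimal sketch, assuming the convention $v \mapsto \bar{q}\,v\,q$ (the form used for spin transformations in [CPS11]); under the opposite convention $q\,v\,\bar{q}$ the sense of rotation flips while the scaling factor $|q|^2$ is unchanged. The helper names `qmul`, `qconj` and `scale_rotate` are illustrative and not taken from the paper.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

def qconj(q):
    """Quaternion conjugate: negate the imaginary part."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def scale_rotate(q, v):
    """Apply v -> conj(q) * v * q to v in R^3, viewed as a pure imaginary quaternion."""
    v_quat = np.concatenate([[0.0], v])
    return qmul(qmul(qconj(q), v_quat), q)[1:]

# Example: |q| = 2, angle pi/2 about the z-axis (assumed convention, see lead-in).
theta = np.pi / 2
u = np.array([0.0, 0.0, 1.0])
q = 2.0 * np.concatenate([[np.cos(theta / 2)], np.sin(theta / 2) * u])
w = scale_rotate(q, np.array([1.0, 0.0, 0.0]))
```

In this example the unit vector along the x-axis is mapped to a vector of length $|q|^2 = 4$ in the plane orthogonal to the axis $u$, which illustrates why a quaternion-valued function $\phi$ changes lengths by $|\phi|^2$.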
Crane et al. [CPS11] first discretized equation (3) and showed applications in computer graphics, such as curvature painting. In follow-up work, Crane et al. [CPS13] use the spin transformation for surface fairing. Liu et al. [LJC17] construct a continuous spectrum of operators between the square of the Dirac operator and the Laplace-Beltrami operator. These operators are utilized to enhance surface matching and segmentation. Ye et al. [YDT * 18] create a framework that consistently discretizes the extrinsic Dirac operator and an intrinsic Dirac operator. In this paper, we improve the reconstruction based on [CPS11, YDT * 18] by solving an equation with a larger solution space and introducing an area calibration (see Section 3.5).

Deep Generative Models for 3D shapes
Various representations of surfaces have been proposed for 3D shape generation, e.g., models based on volumetric representations [WZX * 16,TDB17,SM17,WLG * 17,WSLT18] or point cloud representations [FSG17,NW17,ADMG18]. These methods are particularly applicable to datasets with inconsistent topology. However, without knowing the mesh structure it is hard to capture the fine structure of certain highly complicated surfaces (see Figure 1).
Our model is closer to the following works, which take the mesh structure into account. Ben-Hamu et al. [BHMK * 18] propose a representation based on multiple charts, which conformally map different parts of shapes to a domain. Since features over each chart are normalized separately, the fine structure is better preserved than with a single chart. However, while the creation of such charts requires a sparse correspondence, reconstruction of shapes from the charts needs a template shape, which amounts to a dense correspondence. In order to find such a correspondence, one has to introduce a time-consuming workflow beforehand. Groueix et al. [GFK * 18] learn a parameterization of shapes with multiple embedded charts. Hence one does not have to create the charts manually. However, the generated charts do not always fit each other perfectly, nor do they preserve as much detail as the ones in [BHMK * 18]. Umetani [Ume17] develops a depth map representation with a cube as the domain. This representation works well for close-to-convex shapes like cars, but is difficult to apply to highly curved and non-convex shapes. Kostrikov et al. [KJP * 18] use the same Dirac operator as ours. However, they merely replace the Laplace-Beltrami operator in the neural network with the Dirac operator, so the real power of the Dirac operator, namely its connection to conformal transformations, is not exploited.

Method
The main pipeline of our model is depicted in Figure 2. In the sequel, we explain the detailed methods for encoding shapes with curvature and vertex density in Sections 3.1 and 3.2, building a neural network based on our representation in Section 3.3, and reconstructing shapes in Sections 3.4 and 3.5.

Encoding the Conformal Structure
In the discrete case, how can a shape be encoded in the scheme of the Bonnet problem (Section 2.1)? While the mean curvature half-density can be represented by a vertex-based or face-based function, it is not straightforward to pack the conformal structure in a form that is suitable for a machine learning pipeline. For example, we can recover the shape of a cow from its spherical conformal parameterization ((b) in Figure 4) by prescribing the function of mean curvature half-density ((c) in Figure 4). But it is not clear how to represent a spherical mesh that is conformally equivalent to a given shape purely by scalar functions. One might consider the notion of discrete conformal equivalence for triangular meshes by length cross-ratios on edges [SSP08]. But it is unclear how to transfer the length cross-ratio across different meshes.
[Figure; panel labels: conformal, add the curvature, reconstruction] The example shows that a simply-connected surface in R 3 can be faithfully reconstructed from its conformal parameterization by prescribing the mean curvature half-density.
Recall that the conformal structure is the set of metrics modulo the equivalence relation $g \sim e^{2u} g$, i.e., two metrics are identified if they only differ by a scaling at each point. Therefore, instead of encoding the conformal structure, we encode the metric of shapes. In general, the space of all metrics still does not admit an efficient representation, so we focus on a smaller subset, namely isotropic meshings. Since a conformal map is locally isotropic, i.e., it takes an isotropic mesh to a close-to-isotropic mesh (see the zoom-in in Figure 4), and since an isotropic meshing is usually generated by the centroidal Voronoi tessellation (CVT) with respect to a density function [ADVDI03], this density function can be used as an approximation of the metric. Therefore, at the beginning of our pipeline all the input shapes are isotropically remeshed (like (a) in Figure 4). Then, we successively apply the following procedures.

Conformal parameterization
We map all the shapes to a canonical domain, e.g., the unit disk for disk-like surfaces and the unit sphere for spherical surfaces. The resulting disk-like or spherical meshes are called the conformal parameterization. However, these maps are not unique but differ by a conformal automorphism of the domain. To deal with this ambiguity, one may choose from the following approaches depending on the application:
Landmark alignment. We know that the conformal automorphism of S 2 , i.e., the Möbius transformation, is fully determined by three distinguished points, and the conformal automorphism of a disk is determined by one point and one rotation. Hence we choose two landmark points for disk-like surfaces and three landmark points for closed surfaces and align these landmarks via a conformal mapping. One example is shown in Figure 12.
Landmark-free alignment. For example, [BCK18] proposed a canonical Möbius transformation such that the mass center is aligned with the sphere center. Then, we register two spherical meshes of centered Möbius transformations by searching for an optimal rotation.
Without any alignment at all. This results in a larger shape latent space and consequently places higher demands on the capacity of the neural network, because, for example, a rotation of shapes might also cause a rotation of the curvature function. However, our model is particularly good at capturing this uncertainty (see the discussion in Section 4.2).

Making the representation
In order to build the neural network, we need some fixed meshes for canonical domains. In particular, we use the standard 256 × 256 grids for the disk. For the spherical domain, we obtain a spherical mesh by iteratively applying the 1-to-4 subdivision and normalization on an icosahedron.
Then, we interpolate the following two functions from the conformal parameterization of shapes to the domain with inverse distance weighting.
Mean curvature half-density. The mean curvature half-density $h$ is a face-based function defined following [YDT * 18] (equation (7)), in which the sum runs over all edges $e_{ij}$ of the face $T_i$, $\theta_{ij}$ is the bending angle at the edge $e_{ij}$, and $A_i$ is the face area.
Vertex density function. We estimate the density function $d$ by the reciprocal of the vertex area, $d_i := 1/\tilde{A}_i$, where $\tilde{A}_i$ is the vertex area of the conformal parameterization. We do not normalize the density $d$, since the integral $\int_U d$ of this piecewise constant function over a region $U$ is equal to the number of points located in $U$. At the reconstruction step, this tells us how many points should be sampled. In our experiments, we observe that the logarithmic density $\tilde{d} := \log d$ is more evenly distributed. Therefore, the logarithmic density $\tilde{d}$ is recorded on the domain instead.
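The spherical domain mesh used in this section (iterated 1-to-4 subdivision of an icosahedron with normalization to the unit sphere) can be sketched as follows. This is an illustrative implementation, not the authors' code: the vertex and face ordering of the icosahedron and the function names `icosahedron` and `subdivide` are assumptions. After $k$ subdivisions the mesh has $20 \cdot 4^k$ faces.

```python
import numpy as np

def icosahedron():
    """Unit icosahedron: 12 vertices, 20 faces (one common vertex/face ordering)."""
    t = (1.0 + 5.0 ** 0.5) / 2.0
    V = np.array([[-1, t, 0], [1, t, 0], [-1, -t, 0], [1, -t, 0],
                  [0, -1, t], [0, 1, t], [0, -1, -t], [0, 1, -t],
                  [t, 0, -1], [t, 0, 1], [-t, 0, -1], [-t, 0, 1]], dtype=float)
    F = np.array([[0, 11, 5], [0, 5, 1], [0, 1, 7], [0, 7, 10], [0, 10, 11],
                  [1, 5, 9], [5, 11, 4], [11, 10, 2], [10, 7, 6], [7, 1, 8],
                  [3, 9, 4], [3, 4, 2], [3, 2, 6], [3, 6, 8], [3, 8, 9],
                  [4, 9, 5], [2, 4, 11], [6, 2, 10], [8, 6, 7], [9, 8, 1]])
    return V / np.linalg.norm(V, axis=1, keepdims=True), F

def subdivide(V, F):
    """One 1-to-4 subdivision step; new edge midpoints are projected to the unit sphere."""
    verts = [v for v in V]
    cache = {}  # maps an undirected edge to the index of its midpoint vertex

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:
            m = verts[i] + verts[j]
            verts.append(m / np.linalg.norm(m))
            cache[key] = len(verts) - 1
        return cache[key]

    new_faces = []
    for a, b, c in F.tolist():
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [[a, ab, ca], [b, bc, ab], [c, ca, bc], [ab, bc, ca]]
    return np.array(verts), np.array(new_faces)

# Two subdivisions give 20 * 4**2 = 320 faces.
V, F = icosahedron()
for _ in range(2):
    V, F = subdivide(V, F)
```

Each face produces four children in consecutive rows of the new face array, so in this sketch the hierarchy between refinement levels can be read off directly from the face index.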

Building CNNs over meshes
Since the disk-like surface is represented like a 2D image with two channels, any classical CNN can be applied directly. Hence, we focus on the case of spherical surfaces.

Convolution layers
Each face of the domain is assigned a tangent plane, identified with R 2 , at the barycenter. Let l be a positive number such that the projection of the triangular face lies entirely in the patch [−l, l] × [−l, l] on the tangent plane. This projection π gives a local coordinate system for the points in the pre-image π −1 ([−l, l] × [−l, l]) ⊂ S 2 . Hence, the functions restricted to this region can be interpolated to a grid on the patch. The distortion caused by the projection is negligible when the patches are small. We choose a fixed length l such that all the triangular faces of the domain are projected inside the corresponding patches. The convolution is the ordinary 2D convolution within each patch, with the filter weights shared across different patches.

Downsampling and upsampling layers
Like the MaxPooling and UpPooling layers for classical CNNs, we need the same sort of operations on the mesh domain to decrease and increase the spatial dimension of the neural network. One can first apply the ordinary 2D pooling layers within each patch. Furthermore, since our spherical domain is constructed by subdividing an icosahedron, it is naturally endowed with a hierarchical structure (Figure 5), which gives rise to downsampling and upsampling layers between spherical meshes with different refinements.
The detailed architectures of our convolutional neural networks are depicted in the appendix.
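The patch-wise convolution with shared filter weights can be sketched in plain NumPy as below. This is a minimal illustration under stated assumptions, not the architecture from the appendix: the tensor layout `(faces, height, width, channels)`, zero padding, and the function names `patch_conv` and `patch_max_pool` are all choices made for this sketch, and the operation is the cross-correlation commonly called convolution in deep learning frameworks.

```python
import numpy as np

def patch_conv(x, kernel):
    """Apply one 2D filter bank to every patch independently.

    x:      (n_faces, H, W, C_in), one grid patch per face of the domain
    kernel: (k, k, C_in, C_out) with k odd; the same weights are used for all faces
    """
    k = kernel.shape[0]
    p = k // 2
    H, W = x.shape[1:3]
    xp = np.pad(x, ((0, 0), (p, p), (p, p), (0, 0)))  # zero padding inside each patch
    out = np.zeros(x.shape[:3] + (kernel.shape[3],))
    for dy in range(k):
        for dx in range(k):
            # Shifted window times the (C_in, C_out) weight slice at this offset.
            out += xp[:, dy:dy + H, dx:dx + W, :] @ kernel[dy, dx]
    return out

def patch_max_pool(x):
    """Ordinary 2x2 max pooling within each patch (H and W must be even)."""
    n, H, W, C = x.shape
    return x.reshape(n, H // 2, 2, W // 2, 2, C).max(axis=(2, 4))

# Example with the tensor shape used for spherical shapes: 320 faces, 32 x 32 grid, 2 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((320, 32, 32, 2))
kernel = rng.standard_normal((3, 3, 2, 4))
y = patch_conv(x, kernel)
z = patch_max_pool(y)
```

Because the weights are shared, permuting the faces of the input permutes the output in the same way; pooling across refinement levels of the icosahedral hierarchy would be an additional step that is not shown here.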

Reconstruction of Conformal Parameterization
In order to construct a conformal parameterization from a given vertex density function $d$, we first randomly sample $n_i$ points in every face of the domain, where $n_i = d_i \tilde{A}_i$ and $\tilde{A}_i$ is the face area. Next, an isotropic meshing is constructed as follows.
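The sampling step can be sketched as follows: each face receives a number of points proportional to its density times its area, drawn uniformly inside the triangle. Rounding $d_i \tilde{A}_i$ to the nearest integer and the function name `sample_by_density` are assumptions of this sketch; for the spherical domain the sampled points would additionally have to be projected back onto the sphere, which is omitted here.

```python
import numpy as np

def sample_by_density(V, F, density, rng):
    """Sample round(d_i * A_i) uniform points in each triangle (works in 2D or 3D).

    V: (n_vertices, dim) vertex positions, F: (n_faces, 3) vertex indices,
    density: (n_faces,) per-face density d_i.
    """
    tri = V[F]                                         # (n_faces, 3, dim)
    e1, e2 = tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]
    gram = (e1 * e1).sum(1) * (e2 * e2).sum(1) - (e1 * e2).sum(1) ** 2
    area = 0.5 * np.sqrt(np.maximum(gram, 0.0))        # triangle areas
    counts = np.rint(density * area).astype(int)       # n_i = d_i * A_i, rounded (assumption)
    face = np.repeat(np.arange(len(F)), counts)        # owning face of every sample
    r1 = np.sqrt(rng.random(len(face)))
    r2 = rng.random(len(face))
    bary = np.stack([1.0 - r1, r1 * (1.0 - r2), r1 * r2], axis=1)  # uniform barycentric coords
    points = (bary[:, :, None] * tri[face]).sum(axis=1)
    return points, counts

# Example: the unit square split into two triangles with different densities.
V = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
F = np.array([[0, 1, 2], [0, 2, 3]])
points, counts = sample_by_density(V, F, np.array([10.0, 40.0]), np.random.default_rng(0))
```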

Centroidal Voronoi Tessellation
The isotropic meshing is usually made by centroidal Voronoi tessellation [DFG99]. Given a set of points $\{v_i\}$ in a metric space, particularly $\mathbb{R}^2$ or $S^2$, the Voronoi region $V_i$ corresponding to $v_i$ is defined by
$$V_i := \{\, x : \operatorname{dist}(x, v_i) \le \operatorname{dist}(x, v_j) \ \text{for all } j \ne i \,\},$$
which are polygons (see Appendix 7.1 for the formula for computing the weighted centroid of polygons). Given a density function $d$, the centroid $v_i^*$ of the polygon $V_i$ is given by
$$v_i^* := \frac{\int_{V_i} x\, d(x)\, dx}{\int_{V_i} d(x)\, dx}.$$
We call a point set $\{v_i\}$ a weighted centroidal Voronoi tessellation if $v_i = v_i^*$ holds for all $i$.
Figure 6: Centroidal Voronoi Tessellation (panels: sampling, Voronoi diagram, 1st iteration, 5th iteration, Delaunay triangulation). In order to obtain an isotropic meshing with respect to a given density, we first sample a point set according to the density and repeatedly apply Lloyd's relaxation. Observe that the point set becomes more and more isotropic as the iteration proceeds.
In this paper we use Lloyd relaxation to compute the CVT. Given a point set $\{v_i\}$, we iteratively replace each point $v_i$ with the corresponding centroid $v_i^*$ until convergence (see Figure 6):
1. Randomly sample the points with respect to the density d (defined in Section 3.2).
2. Create the Voronoi diagram. For the disk case, we have to take care because the Voronoi cells close to the boundary are mostly unbounded. Hence we reflect the points close to the boundary, so that all the Voronoi cells inside or close to the unit disk are bounded.
3. Compute the weighted centroids of the (bounded) Voronoi cells and, for the disk case, remove the points lying outside the disk (see Figure 7).
Then, a Delaunay triangulation is constructed by taking the dual of the Voronoi diagram. Generally, this triangulation does not perfectly fit the disk at the boundary, but this does not significantly affect the global appearance of shapes.
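The Lloyd iteration above can be illustrated with a simplified, grid-based approximation. Note the substitution: instead of exact Voronoi polygons with boundary reflection on the disk, this sketch discretizes the unit square into sample points, assigns each sample to its nearest site, and moves each site to the density-weighted mean of its samples. The density $1 + 4x$ and all function names are hypothetical choices for the example.

```python
import numpy as np

def assign(sites, pts):
    """Index of the nearest site and squared distance to it, for every sample point."""
    d2 = ((pts[:, None, :] - sites[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1), d2.min(axis=1)

def cvt_energy(sites, pts, w):
    """Discrete CVT energy: weighted sum of squared distances to the nearest site."""
    _, d2 = assign(sites, pts)
    return float((w * d2).sum())

def lloyd(sites, pts, w, n_iter=5):
    """Lloyd relaxation on a discretized domain (approximates weighted Voronoi centroids)."""
    sites = sites.copy()
    for _ in range(n_iter):
        owner, _ = assign(sites, pts)
        mass = np.bincount(owner, weights=w, minlength=len(sites))
        for k in range(pts.shape[1]):
            moment = np.bincount(owner, weights=w * pts[:, k], minlength=len(sites))
            # Move each site to its weighted centroid; sites with an empty cell stay put.
            np.divide(moment, mass, out=sites[:, k], where=mass > 0)
    return sites

# Example: 30 random sites in the unit square, density increasing with x.
g = (np.arange(64) + 0.5) / 64
X, Y = np.meshgrid(g, g)
pts = np.stack([X.ravel(), Y.ravel()], axis=1)
w = 1.0 + 4.0 * pts[:, 0]
sites0 = np.random.default_rng(0).random((30, 2))
sites5 = lloyd(sites0, pts, w, n_iter=5)
```

Each iteration can only decrease the discrete energy, which mirrors the observation that the point set becomes more isotropic as the iteration proceeds; the Delaunay triangulation of the relaxed sites is a separate step not shown here.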

Surface Reconstruction
Now, we are ready to reconstruct the surface from a conformal parameterization with prescribed mean curvature half-density. In the following we first demonstrate an improved reconstruction method which is a slight modification of [YDT * 18] and then introduce a new procedure of area calibration, which would be particularly effective when the area scaling is not accurately restored by the previous method.
Dirac energy. In practice, the exact solution of the Dirac equation (2) can hardly be obtained, so we actually search for a solution $\phi : M \to \mathbb{H}$ such that
$$(D_f - \rho)\,\phi = \sigma\,\phi$$
for a very small real number $\sigma$, which amounts to the eigenvalue problem
$$(D_f - \rho)\,\phi = \lambda\,\phi,$$
where $\lambda$ is the eigenvalue with the smallest magnitude [CPS11].
In the discrete case, $D_f - \rho$ is a $|F| \times |F|$ quaternion-valued matrix [CPS11], or in practice, a $4|F| \times 4|F|$ real-valued matrix in which any quaternion $q = a + b\,i + c\,j + d\,k$ is represented by the $4 \times 4$ real-valued block matrix
$$\begin{pmatrix} a & -b & -c & -d \\ b & a & -d & c \\ c & d & a & -b \\ d & -c & b & a \end{pmatrix}.$$
We briefly introduce the discretization of the matrix $D_f - \rho$ and refer the reader to [CPS11, YDT * 18] for more details. Let $e_{ij} \in \operatorname{Im}(\mathbb{H})$ be the oriented edge embedded in the quaternion space and $H_{ij} := \frac{1}{2}|e_{ij}| \tan\frac{\theta_{ij}}{2}$ be the integrated mean curvature at the edge $e_{ij}$, where $\theta_{ij}$ is the bending angle between the faces $i$ and $j$. The Dirac operator is discretized as a $4|F| \times 4|F|$ matrix $D_f$ assembled from $E_{ij} := 2H_{ij} + e_{ij}$ and $H_i = \sum_j H_{ij}$ (see [CPS11, YDT * 18] for the explicit entries). The discrete form of $\rho$ is a $4|F| \times 4|F|$ diagonal matrix $P$ with the discrete mean curvature half-density (7) as the diagonal. Instead of building the target shape in one step, we slowly flow the initial shape towards the target for the sake of stability. Hence, we build the matrix $\tilde{D}(t) = D_f - tP$, where $t \in [0, 1]$ is a step length parameter.
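To make the real-valued block representation concrete, the sketch below uses the standard left-multiplication matrix of a quaternion (the convention given in [CPS11]; whether the paper uses exactly this sign layout is an assumption) and the edge quantity $H_{ij} = \frac{1}{2}|e_{ij}|\tan(\theta_{ij}/2)$. The function names are illustrative.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

def quat_to_mat(q):
    """4 x 4 real matrix of left multiplication by q, so quat_to_mat(q) @ p == q * p."""
    a, b, c, d = q
    return np.array([[a, -b, -c, -d],
                     [b,  a, -d,  c],
                     [c,  d,  a, -b],
                     [d, -c,  b,  a]])

def edge_mean_curvature(edge_length, bending_angle):
    """Integrated mean curvature at an edge: H_ij = 0.5 * |e_ij| * tan(theta_ij / 2)."""
    return 0.5 * edge_length * np.tan(bending_angle / 2.0)

rng = np.random.default_rng(0)
p, q = rng.standard_normal(4), rng.standard_normal(4)
```

With this representation, quaternion products turn into ordinary matrix products, which is what allows a $|F| \times |F|$ quaternion-valued matrix to be stored and solved as a $4|F| \times 4|F|$ real sparse matrix.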
We observe that, even though this face-based Dirac operator gives the exact solution, it is not numerically stable, because its solution space is often too large (technically, some solutions whose edge-constraint normals are far from the actual face normals result in unwanted transformations). On the other hand, while the vertex-based operators in [CPS11, YDT * 18] work well in many cases, they are not able to faithfully recover the high curvature regions on the surface, because their solution spaces are too limited.
To balance these two approaches, we propose the following regularized energy based on the face-based operator:
where $c$ is a positive coefficient and $R$ is the $4|F| \times 4|F|$ regularization matrix such that
where the sum runs over all adjacent faces $i$ and $j$. Note that weights based on the dual edge length are used in [CKPS18]. To have finer control of the regularizer, one can decompose $R$ into four components and set different weights as in [CKPS18], but we did not see any obvious difference in our setting. Empirically, the coefficient $c$ is set to $0.001 \max_{ij} |e_{ij}|$.
By the min-max principle, solving the generalized eigenvalue problem where M is the mass matrix, is actually equivalent to minimizing the energy min E D , s.t. |φ| = 1, with the metric defined by |φ| 2 := φ T · M · φ.
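The equivalence between the constrained minimization and the generalized eigenvalue problem can be illustrated on a small dense example. This is a sketch under assumptions: `A` is a hypothetical symmetric positive semi-definite stand-in for the matrix of the quadratic energy, the mass matrix is assumed diagonal, and a dense solver is used, whereas the actual $4|F| \times 4|F|$ problem would call for a sparse eigensolver.

```python
import numpy as np

def min_energy_mode(A, mass):
    """Minimize phi^T A phi subject to phi^T M phi = 1, with M = diag(mass).

    Returns (lam, phi) such that A phi = lam * M phi and lam is the smallest eigenvalue.
    """
    s = 1.0 / np.sqrt(mass)
    B = (s[:, None] * A) * s[None, :]     # M^{-1/2} A M^{-1/2}, an ordinary symmetric problem
    B = 0.5 * (B + B.T)                   # remove round-off asymmetry
    w, U = np.linalg.eigh(B)              # eigenvalues in ascending order
    return w[0], s * U[:, 0]              # map the eigenvector back: phi = M^{-1/2} u

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))
A = X.T @ X                               # symmetric positive semi-definite test matrix
mass = rng.uniform(0.5, 2.0, size=8)      # hypothetical diagonal mass matrix
lam, phi = min_energy_mode(A, mass)
```

Any other vector normalized with respect to the mass matrix has an energy of at least `lam`, which is the content of the min-max principle in this setting.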
Finally, the edges are constructed by the spin transformation, and the positions of the vertices $v_i$ are recovered by solving the Poisson equation (see Section 3 of [SA07] or Section 5.6 of [CPS11]). In the attached videos, we prescribe the mean curvature half-density of two shapes (red) on their conformal parameterizations (blue) and show the deformation from the sphere to the original shapes.
Area calibration. Even though the Dirac operator with the regularization term improves the accuracy of reconstruction, we observe that some area distortion is still visible, especially in regions with very high curvature. To overcome this problem, we make the reconstruction algorithm aware of the area scaling factor. Chern et al. [CPS15] prescribe a volumetric scaling factor $e^u$ and obtain a close-to-conformal volumetric deformation by minimizing an energy $E_u$ depending on $u$. While the energy $E_u$ in [CPS15] is specifically designed for 3D volumetric meshes, an analogue for 2D surfaces still holds in the smooth case:
Theorem 3.1 Let f : M → R 3 ⊂ H be an isometric immersion and h : M → R be any function. The quaternion gradient is defined by

The spin transformation
where G := grad f u is the gradient of the logarithmic factor e u := |φ| 2 .
Proof See Appendix 7.2.
Therefore, given a spin transformation induced from φ with the area factor u = log|φ|, the quaternion-valued 1-form
In practice, we minimize the energy $E_u := |\omega|^2$, where the metric for quaternion-valued 1-forms is defined by
In the discrete case, minimizing the energy $E_u$ again amounts to solving a generalized eigenvalue problem for a $4|F| \times 4|F|$ matrix (see Section 7.4). To avoid introducing the scaling factor as one more function in our representation and subsequently increasing the data size, we first apply isotropic remeshing with approximately equalized face area [FAKG10] to all shapes. In this case the logarithmic factor $u$ should be set to $u_i = \log(1/|\tilde{A}_i|)$, where $\tilde{A}_i$ is the face area of the conformal parameterization.
While the Willmore energy is defined by $W = \sum_i h_i^2$, we define the relative Willmore energy between two meshes with identical connectivity by $r.W := \sum_i ((h_1)_i - (h_2)_i)^2$, which measures how close the mean curvature half-densities of the two meshes are. This experiment shows that our method substantially improves the accuracy of curvature reconstruction. Furthermore, the area distortion, which usually appears in regions with high curvature, is much reduced by the area calibration.
Note that, in contrast to [CPS15], we only encode the expected scaling factor in the energy $|\omega|^2$, and the factual scaling factor $|\phi|^4$ is determined by the optimizer.
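The two error measures above translate directly into code. The sketch below assumes that the mean curvature half-densities of both meshes are stored as per-face arrays in the same face order (identical connectivity); the function names are illustrative.

```python
import numpy as np

def willmore_energy(h):
    """W = sum_i h_i^2 for a per-face mean curvature half-density h."""
    return float(np.sum(np.asarray(h) ** 2))

def relative_willmore_energy(h1, h2):
    """r.W = sum_i ((h1)_i - (h2)_i)^2 for two meshes with identical connectivity."""
    h1, h2 = np.asarray(h1), np.asarray(h2)
    if h1.shape != h2.shape:
        raise ValueError("meshes must have identical connectivity (same number of faces)")
    return float(np.sum((h1 - h2) ** 2))

# Example: a reconstruction whose half-density deviates slightly from the prescribed one.
h_target = np.array([0.5, -0.2, 1.0, 0.0])
h_recon = np.array([0.4, -0.2, 1.1, 0.0])
```

The relative Willmore energy vanishes exactly when the two half-densities agree on every face, so smaller values indicate a more accurate curvature reconstruction.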
In summary, we first minimize the energy E D with a small step length several times until the mean curvature half-density converges to the prescribed one. Then, we minimize the energy Eu once to get the correct area scaling factor.

Results
We use the Matlab package gptoolbox [J * 18] for data preprocessing and TensorFlow [AAB * 15] for building the neural networks on meshes. All the neural networks are trained and evaluated on a GeForce GTX 1080 GPU with 8GB memory.
The mean curvature half-density changes accordingly such that the mean curvature is preserved.

Preliminary applications
We first present some simple applications that are unrelated to machine learning.
In the smooth case, the mean curvature half-density changes covariantly, $h \to m \cdot h$, under the parameterization scaling $x \to m \cdot x$, $m \in \mathbb{R}$. Analogously, in the discrete case, one can adjust the parameterization by scaling the vertex density, i.e., multiplying the density $d$ by a constant, $d \to m\,d$. In order to preserve the shape, one has to adjust the mean curvature half-density by $h \to h/\sqrt{m}$. The shapes reconstructed from the modified representation are then remeshings with approximately $m|V|$ vertices, where $|V|$ is the number of vertices of the original mesh. Figure 9 shows that our method preserves the smooth features of the shape. However, the regions of high curvature tend to be smoothed as the number of vertices declines.
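The remeshing rule can be written as a small helper acting on the recorded representation. This is a minimal sketch, assuming the representation stores the logarithmic density $\log d$ and the half-density $h$ as arrays over the domain; the function name `rescale_representation` is hypothetical.

```python
import numpy as np

def rescale_representation(log_density, half_density, m):
    """Remesh with roughly m times as many vertices: d -> m * d and h -> h / sqrt(m)."""
    if m <= 0:
        raise ValueError("the scaling factor m must be positive")
    return log_density + np.log(m), half_density / np.sqrt(m)

# Example: ask for about four times as many vertices.
log_d = np.log(np.array([100.0, 250.0, 400.0]))
h = np.array([0.8, -0.3, 1.2])
log_d4, h4 = rescale_representation(log_d, h, 4.0)
```

A quick consistency check is that the product $h\sqrt{d}$ is unchanged by the rescaling, since $(h/\sqrt{m})\sqrt{m\,d} = h\sqrt{d}$.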

Shape interpolation
We visualize the interpolation of our curvature-based representation. Figure 3 shows the shapes reconstructed from a linear interpolation of two animals, whose conformal parameterizations are matched by a Möbius transformation that aligns 3 chosen landmark points. In addition, one can interpolate the latent space representation of a trained autoencoder (see Section 4.3). Figure 11 shows two latent space bi-linear interpolations of cars.
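Both kinds of interpolation reduce to convex combinations of arrays, either of the curvature representation itself or of latent codes. The sketch below is illustrative only: the latent dimension, the corner codes and the function names are hypothetical, and decoding the interpolated codes with a trained decoder is not shown.

```python
import numpy as np

def lerp(z0, z1, t):
    """Linear interpolation between two arrays for t in [0, 1]."""
    return (1.0 - t) * z0 + t * z1

def bilinear_grid(z00, z10, z01, z11, n):
    """n x n grid of codes interpolated bilinearly between four corner codes."""
    ts = np.linspace(0.0, 1.0, n)
    grid = np.empty((n, n) + z00.shape)
    for i, s in enumerate(ts):
        for j, t in enumerate(ts):
            # Interpolate along the first axis on both edges, then across.
            grid[i, j] = lerp(lerp(z00, z10, s), lerp(z01, z11, s), t)
    return grid

# Example with four hypothetical 16-dimensional latent codes.
rng = np.random.default_rng(0)
z00, z10, z01, z11 = rng.standard_normal((4, 16))
grid = bilinear_grid(z00, z10, z01, z11, n=3)
```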

Random generation of disk-like and spherical shapes
We test our model for disk-like surfaces on a dataset of anatomical shapes provided by [BLC * 11]. In particular, we choose the shapes of teeth, which is one of three types of bone in this dataset. To create the representation, we first take an intermediate conformal map, which maps the teeth to the unit disk by the algorithm from [CL15].
Several landmark points are available in [BLC * 11], hence we choose two landmark points $u_i, v_i$ for every shape $M_i$. We know that the conformal automorphisms of the unit disk have the form
$$f_{a,\theta}(z) = e^{i\theta}\,\frac{z - a}{1 - \bar{a} z},$$
where $\theta \in \mathbb{R}$ and $a \in \mathbb{C}$ with $|a| < 1$. Set $a = u_i$ and choose $\theta$ such that $f(v_i) \in \mathbb{R}$. Clearly, this uniquely determined map $f_{a,\theta}$ satisfies $f(u_i) = 0$ and $f(v_i) \in \mathbb{R}$. Fixing a reference shape $M_0$, we apply the alignment map $f_0^{-1} \circ f_i$ to every shape $M_i$. All the aligned disk meshes are then mapped to the square via the Schwarz-Christoffel mapping. The functions are interpolated on the 256 × 256 grid using the scatteredInterpolant function in Matlab.
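The two-landmark alignment on the disk can be sketched with complex arithmetic, using the standard form $e^{i\theta}(z - a)/(1 - \bar{a}z)$ of a disk automorphism. The choice of the positive real axis for the second landmark is an assumption made here to fix the remaining sign ambiguity, and the function names are illustrative.

```python
import cmath

def disk_automorphism(a, theta):
    """Conformal automorphism of the unit disk: z -> exp(i*theta) * (z - a) / (1 - conj(a) * z)."""
    a = complex(a)
    if abs(a) >= 1.0:
        raise ValueError("a must lie inside the unit disk")
    return lambda z: cmath.exp(1j * theta) * (z - a) / (1.0 - a.conjugate() * z)

def landmark_alignment(u, v):
    """Map landmark u to 0 and landmark v to the positive real axis (assumed convention)."""
    u, v = complex(u), complex(v)
    w = (v - u) / (1.0 - u.conjugate() * v)   # image of v before the final rotation
    return disk_automorphism(u, -cmath.phase(w))

# Example with two hypothetical landmark positions inside the unit disk.
u, v = 0.3 + 0.2j, -0.1 + 0.5j
f = landmark_alignment(u, v)
```

Composing the inverse of the reference map with the map of each shape then gives the alignment $f_0^{-1} \circ f_i$ described above.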
For spherical surfaces we take the dataset of 1240 cars from ShapeNet [CFG * 15]. All the shapes were converted into genus-0 surfaces by Umetani [Ume17]. Then we create the aligned conformal parameterization by the canonical Möbius transformation [BCK18]. The canonical domain is obtained by subdividing the icosahedron twice, so it has $20 \times 4^2 = 320$ faces. Each face is assigned a 32 × 32 grid. Hence, each shape is represented by a tensor of dimension 320 × 32 × 32 × 2.
The randomly generated teeth and cars, as well as their curvature representations, are shown in the appendix.

Discussion of local invariance
We call two functions $f_1$ and $f_2$ locally invariant if they have the same function values and only differ by a transformation $g$ of the domain, i.e., $f_1 = f_2 \circ g$. Traditional CNNs are able to capture translational features such as (a) of the inset. Hence one would expect CNNs for 3D shapes to have similar properties, like local invariance under translation, rotation or even scaling. However, 3D generative models based on position, such as point clouds and meshes, do not have such properties due to the varying function values of the coordinates (see (b)). This makes it more difficult for CNNs to extract meaningful information. Voxel-based models are locally invariant, but they are not applicable to high-resolution data due to the high cost of memory and computation. Some multi-resolution representations, e.g., octrees [TDB17, WSLT18], are designed to overcome this problem, but the local invariance property no longer holds. In contrast, our model (sketched by (c)), together with the CNN on the sphere, provides an efficient way to learn 3D data without a particular alignment. We verify our argument with the following two examples.

Learning unaligned anatomical data
We merge three different anatomical models in [BLC * 11] and create the representations without any alignment. The inset shows randomly generated bones of different types. Compared with Figure 16, the bones are smoothed due to the expanded shape space. However, by visualizing the latent space distribution (Figure 13), we show that our model is still capable of extracting meaningful information despite the ambiguity. We compare the result with a baseline model that has the same network architecture but operates on the coordinate functions.
Figure 10: The randomly generated cortical surfaces by Multi-chart GAN [BHMK * 18] and the VAE based on our representation. Our representation has dimension 320 × 32 × 32 × 2 = 655360, which is of the same order of magnitude as the data size of Multi-chart, i.e., 16 × 64 × 64 × 3 = 196608. However, we only require 3 landmark points for alignment, while Multi-chart needs a dense correspondence for surface reconstruction. The surfaces are labeled by the mean curvature half-density. Note that the training data mostly have a Willmore energy from 900 to 1000. Although the generated surfaces from our model have been smoothed to a certain extent (partly due to a well-known limitation of VAEs), our model apparently preserves more fine structures than the position-based model.
Figure 12 (panel labels: any conformal map, alignment f): For disk-like surfaces, given two landmark points there is a unique conformal map which maps the first point (red) to zero and maps the second one (blue) to the x-axis.

Generation of transformed cars
Figure 13: Latent space visualization (panels: coordinate, curvature; classes: teeth, mt1, radius). The dataset is composed of three different types of anatomical surfaces. We project the latent space representation onto a 2-dimensional space by PCA. Though all the shapes are packed without alignment, the three types of bones are clearly separated in the latent space. In contrast, the model based on the coordinates failed to learn the structure of the bones, so their distribution in the latent space is not well separated.

100 validation data. The comparison shows that our method produces more accurate predictions than the others (see Figure 15). Since only our model considers the mesh structure of shapes, to make a fair comparison we evaluate the results with the Chamfer distance, which depends only on the underlying point clouds. Note that, as a trade-off, our representation loses the information of translation and scaling. Thus we first normalize the shapes reconstructed from our model and then calculate the Chamfer distance to the ground truth.
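The evaluation step described above (normalize, then measure the Chamfer distance) can be sketched as follows. This is an illustrative sketch with two assumptions that are not taken from this paper. First, the Chamfer distance is taken as the sum of the two mean squared nearest-neighbor distances, which is one of several conventions in use. Second, since a bare point cloud has no volume, the normalization here centers the points at their centroid and rescales them to unit root-mean-square radius, as a stand-in for the volume normalization applied to the reconstructed meshes. The names `normalize` and `chamfer_distance` are hypothetical.

```python
import math


def normalize(points):
    """Center a point cloud at its centroid and rescale it to unit RMS radius."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    centered = [[p[k] - centroid[k] for k in range(3)] for p in points]
    rms = math.sqrt(sum(c * c for p in centered for c in p) / n)
    if rms == 0.0:
        raise ValueError("degenerate point cloud: all points coincide")
    return [[c / rms for c in p] for p in centered]


def chamfer_distance(a, b):
    """Symmetric Chamfer distance using mean squared nearest-neighbor distances."""

    def one_sided(src, dst):
        total = 0.0
        for p in src:
            total += min(sum((s - d) ** 2 for s, d in zip(p, q)) for q in dst)
        return total / len(src)

    return one_sided(a, b) + one_sided(b, a)


# Ground truth: corners of the unit cube.
target = [[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
# A "reconstruction" that has the right shape but the wrong scale and position.
recon = [[2.0 * x + 5.0, 2.0 * y - 1.0, 2.0 * z + 3.0] for x, y, z in target]

raw = chamfer_distance(target, recon)
aligned = chamfer_distance(normalize(target), normalize(recon))
print(raw, aligned)
```

The brute-force nearest-neighbor search is quadratic in the number of points. For clouds with thousands of points a spatial index would be the usual choice.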

Cortical surface generation
To show that our model is particularly good at preserving fine structure, we perform the experiment on human cortical surfaces, which are highly folded with many "hills" and "valleys". We first compare our model to three other state-of-the-art autoencoders for 3D shapes. Figure 1 shows that, although all models succeed in characterizing the shapes at a large scale, our model preserves many more small features, e.g., the curvature, than the others.
Training details Our model and the baseline model are trained for 200 epochs, which takes around 5 hours. The point-cloud AE [ADMG18], with 2048 points per shape, and AtlasNet [GFK*18], with 2500 points per shape, are both trained for 500 epochs, which takes approximately 4 hours. Although these point-cloud based models have a smaller data size than ours, training their neural networks already exhausted our GPU memory. The OGN, with an octree representation of 128 × 128 × 128 voxels, is trained for 4000 epochs in 5 hours. While the other models produce the shapes instantly after training, our method takes 2 minutes to reconstruct a mesh with 10000 vertices from the curvature.
Next, we compare the cortical surfaces randomly generated by our VAE to the ones generated by Multi-chart GAN [BHMK*18] (Figure 10). While both mesh-based models generate significantly more faithful results than the other types of representation in Figure 1, the "hills" and "valleys" are much more visible with our model. Moreover, we only choose 3 landmark points on each shape to align the conformal parameterization, whereas [BHMK*18] requires 21 landmark points to create 16 charts, and even a template shape, which amounts to a dense correspondence, to reconstruct the final shapes.
Finally, we try to create an autoencoder that converts 3D MRI images of the brain to cortical surfaces. In this case, the encoder consists of several 3D convolutional layers (see Figure 3) and the decoder is the same as in the previous experiments. Figure 14 shows that our model is able to predict the cortical surface from the MRI volume to a certain extent, but the accuracy is not yet optimal, because the neural network failed to capture the spatial correspondence between the volumetric data and the spherical data. We leave the construction of a finer 3D-to-2D autoencoder to future work.

Inset panels: Ground Truth, Failed Example.

First, it is currently difficult to model shapes like long tubes, such as human arms and legs, because the conformal parameterization of such shapes always has extremely large area distortion. The information easily gets lost while being transferred from such regions to the canonical domain (inset), unless one uses a domain with extremely high resolution. A solution might be a multi-resolution data structure, such as [GKS02, WSLT18]. It would then be desirable to design a neural network structure that is specifically adapted to such multi-resolution data structures.
Second, to make our model fully rotationally invariant rather than just locally invariant, one might combine our representation with the equivariant neural networks of Cohen et al. [CGKW18], so that the alignment procedure can be removed completely. It would then be interesting to develop a corresponding decoder network.
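The area distortion behind the first limitation can be made concrete with a toy case. The sketch below is an illustration under simplifying assumptions, not a computation from this paper. It takes an open cylinder of radius 1 and height h with coordinates (θ, s) and maps it to the annulus e^{-h} ≤ |z| ≤ 1 by z = e^{-s} e^{iθ}. This map is conformal with length scale e^{-s}, so areas are scaled by e^{-2s}. The function names are hypothetical.

```python
import math


def area_scale(s):
    """Area scale factor of the map z = exp(-s) * exp(i * theta) at height s."""
    return math.exp(-2.0 * s)


def domain_fraction(s0, s1, height):
    """Fraction of the annulus area occupied by the image of the band s0 <= s <= s1."""
    total = 1.0 - math.exp(-2.0 * height)
    part = math.exp(-2.0 * s0) - math.exp(-2.0 * s1)
    return part / total


height = 10.0  # a tube whose length is ten times its radius
near_half = domain_fraction(0.0, height / 2.0, height)
far_half = domain_fraction(height / 2.0, height, height)

# Both halves have the same surface area on the cylinder,
# but very different shares of the canonical domain.
print(near_half, far_half)
print(area_scale(height))
```

In this toy case the far half of the tube occupies less than 0.01% of the domain, so a fixed-resolution grid on the canonical domain leaves almost no samples for it.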

Conclusion
We propose a novel intrinsic representation of 3D surfaces based on mean curvature and metric. A 3D generative model is built on this representation, and it performs better than other models in capturing the fine structure and the symmetry of the ambient space.

Figure: Autoencoder for transformed cars. We transform a car shape by applying random translation, scaling, and rotation. We compare our results with models based on point clouds, namely the point-cloud AE [ADMG18] and AtlasNet [GFK*18], and with one based on voxels, namely the O-CNN [WSLT18]. The other methods, though shown to achieve satisfying results on the aligned dataset, do not correctly capture the symmetry of the various transformations. In contrast, our model succeeds in producing convincing transformed shapes. We evaluate the results by measuring the Chamfer distance CD. However, since our model loses the information of translation and scaling, we first have to normalize the volume of the results and center their position (unnormalized shapes are shown above). In the end we compute the Chamfer distance of the normalized outputs CP.

Figure 16: Randomly generated teeth and cars via the variational autoencoder. The first and third rows show the isotropic meshings, which are induced from the generated density function, with the generated mean curvature half-density. The second and fourth rows show the resulting reconstruction. The architectures of the neural networks are modified from the traditional autoencoders in Tables 1 and 2 to variational autoencoders.