Voxel‐wise UV parameterization and view‐dependent texture synthesis for immersive rendering of truncated signed distance field scene model

In this paper, we introduce a novel voxel-wise UV parameterization and view-dependent texture synthesis method for the immersive rendering of a truncated signed distance field (TSDF) scene model. The proposed UV parameterization assigns a precomputed UV map to each voxel through a UV map lookup table and consequently enables efficient, high-quality texture mapping without a complex unwrapping process. Leveraging this convenient UV parameterization, our view-dependent texture synthesis method extracts a set of local texture maps for each voxel from the multiview color images and separates them into a single view-independent diffuse map and a set of weight coefficients over an orthogonal specular map basis. The view-dependent specular maps for an arbitrary view are then estimated by combining the specular weights of the source views according to the locations of the arbitrary and source viewpoints. The experimental results demonstrate that the proposed method effectively synthesizes textures for arbitrary views, thereby enabling the visualization of view-dependent effects such as specularity and mirror reflection.


| INTRODUCTION
Immersive video [1] has gained widespread attention as key content for the virtual reality (VR) market owing to technological advancements in 3D capture and display systems. Immersive video technologies allow viewers to immerse themselves in the content by providing freedom of movement inside the content and the experience of motion parallax as the viewpoint moves [2]. Recently, the creation of immersive videos using 360 videos [3], multiview + depth (MVD) [4], and plenoptic point clouds [5] has been actively studied by MPEG-I [6].
The truncated signed distance field (TSDF) fusion of multiple depth maps taken from different viewpoints [7][8][9][10] has recently become one of the most popular volumetric scene reconstruction methods [11]. It is an implicit surface representation of the captured scene on a 3D voxel grid, in which each voxel stores the truncated signed distance [12] between its center and the nearest surface measurement [13]. Furthermore, it can be converted into a high-quality mesh representation using the marching cube (MC) algorithm [14], which traverses all cubes formed by eight adjacent voxels and generates a set of triangles in predefined configurations. However, TSDF has not yet been adopted for creating immersive videos owing to the lack of high-quality texture mapping and immersive rendering methods.
One limitation in mapping high-quality textures to the TSDF scene model is that the meshes extracted from TSDF scene models have many triangles, ranging from hundreds of thousands to millions, and conventional UV parameterization [15][16][17], which unwraps these huge meshes into a 2D UV (texture) domain, is computationally expensive.
To address the aforementioned problem, we proposed a voxel-wise UV parameterization method using a UV map lookup table, specially designed for the mesh extracted using the MC algorithm from the TSDF scene model. Since the UV map lookup table stores the precomputed UV map for each MC configuration, no effort is required for UV parameterization other than to query the lookup table. Additionally, recording the UV coordinates for associating mesh vertices with UV space is unnecessary since they can be found immediately on the UV map lookup table from the MC configuration of each TSDF voxel.
Building on this convenient UV parameterization, we proposed a view-dependent texture synthesis method for the TSDF scene model. First, our method generates multiview local texture maps that share an identical UV map for each voxel from the multiview color images. Because textures taken from different viewpoints are aligned in the UV domain, we analyzed the texture intrinsics from the multiview local texture maps using texel-wise operations. Leveraging this texel-wise analysis, we extracted and assigned to each voxel a single view-independent diffuse map and a set of weight coefficients representing the variation of the view-dependent specular maps. During the texture synthesis phase for an arbitrary target view, the sets of weights assigned to the voxels are combined according to the locations of the target and source viewpoints to estimate the specular maps of the target view. Subsequently, the estimated specular maps and the assigned diffuse maps are added voxel-wise to produce the final local texture maps. Our contributions are as follows:
• We introduced a novel voxel-wise UV parameterization for the TSDF scene model using the UV map lookup table. Our UV parameterization enables fast, efficient, and high-quality texture mapping without complex computation of UV coordinates.
• We presented a simple diffuse-specular map separation and view-dependent specular map estimation using our texture representation. These techniques allow viewers to observe the variation in the texture of specular surfaces under varying viewpoints.
• To the best of our knowledge, this is the first attempt at view-dependent texture synthesis for the immersive rendering of the TSDF scene model.

| RELATED WORKS
Many existing methods have used color volume-based texture mapping for the TSDF scene model [18][19][20][21]. The TSDF fusion phase integrates the depth and corresponding color information into the TSDF volume and a color volume that shares the same voxel grid structure, respectively. Then, the mesh is generated from the TSDF using the MC algorithm, and a per-vertex colored mesh is obtained by assigning interpolated color values from the color volume to all vertices of the mesh. Rendering the per-vertex colored mesh uses only three vertex colors to fill the face of each triangle. Although color volume-based texture mapping is easily performed, the quality of the rendered images depends on the voxel resolution. Since the size of the MC triangles depends on the voxel size, a smaller voxel (that is, a higher voxel resolution) is required to represent high-frequency texture details; however, the higher rendered image quality comes with higher computational complexity and a huge mesh. Figures 1A and 1B show the rendering results of two per-vertex colored meshes. The meshes are generated from TSDF models with voxel sizes of 8 and 4 mm and have 44 134 and 162 402 triangles, respectively. Although the per-vertex colored mesh generated from the higher-resolution TSDF shows better visual quality, the number of triangles to be rendered increases significantly. In contrast, conventional UV mapping decouples texture quality from voxel resolution [24,25]. A UV map is a bijective mapping from a 3D mesh surface to a 2D UV plane [26], which improves the texture quality by storing more texture information in the UV plane than the per-vertex colored mesh. Figure 1C shows the result of mapping the UV texture in Figure 1E to the mesh extracted from the TSDF model with a voxel resolution of 8 mm.
Compared with the results of color volume-based texture mapping, the UV texture-mapped result shows considerably improved visual quality, even when the same or a lower voxel resolution is used. However, conventional UV mapping requires a long UV parameterization step [16,26], which involves unwrapping a target mesh into a 2D UV plane and storing UV coordinates as attributes on all vertices of the mesh.
To avoid the complex UV parameterization process, recent methods have developed simpler mappings between the 3D surface and the texture. Tang and others [27] divided the entire volume into non-overlapping blocks and separately parameterized the 3D surface in each block. In this block-based parameterization, the triangles in each block were grouped by their normal vectors, each group was projected onto its tangent plane, and the projection results were packed into a fixed single 2D plane. With this simple parameterization, UV coordinates were unnecessary because they could be quickly inferred from the TSDF. Lee and others [28] assigned a 2D regular texture patch to every nonempty voxel. In the texture extraction phase, each texture patch was projected onto an input color image by selecting the axis direction that minimized sampling loss. Subsequently, the textures were extracted and stored within each texture patch for texture allocation or combined with the existing texture for texture optimization. This patch-based texture representation enabled high-quality texture mapping of the TSDF scene model without UV mapping.
Similar to Lee and others [28], we assigned a 2D regular texture patch to every nonempty voxel. In our method, however, the association between the 3D surface and the texture patch is given by a UV map generated using the UV map lookup table. UV mapping allows easy multiview texture analysis because it makes it possible to collect the color values of a point on the 3D surface projected onto color images from different viewpoints [29]. By taking advantage of this property of UV mapping, we considered view-dependent texture mapping techniques that have not been studied previously.

| TRUNCATED SIGNED DISTANCE FIELD FUSION AND MARCHING CUBE ALGORITHM
This section reviews the TSDF fusion and MC algorithm to provide the basic concepts underlying our voxel-wise UV parameterization method.

| Truncated signed distance field fusion
When all N input depth maps $\{D_i\}_{i=1}^{N}$ taken from known viewpoints $T_i \in SE(3)$ with camera intrinsics $K_i$ are given in advance, TSDF fusion integrates the multiview depth information into a TSDF volume $D \in \mathbb{R}^{X \times Y \times Z}$ in a weighted-average manner as follows:

$$D(\mathbf{x}) = \frac{\sum_{i=1}^{N} w_i(\mathbf{x})\, d_i(\mathbf{x})}{W(\mathbf{x})}, \qquad W(\mathbf{x}) = \sum_{i=1}^{N} w_i(\mathbf{x}),$$

where $\mathbf{x}$ is the position of a voxel defined by its center and $w_i(\mathbf{x})$ is the fusion weight related to the uncertainty of the depth measurement [13]. Many approaches set $w_i(\mathbf{x}) = 1$ for voxels that require an update and $w_i(\mathbf{x}) = 0$ otherwise [7]. To obtain the truncated signed distance $d_i(\mathbf{x})$, the signed distance between the depth measured at the voxel's 2D pixel location and the voxel's z-coordinate is first computed and then truncated using a truncation value $d_{trunc}$:

$$s_i(\mathbf{x}) = D_i\!\left(\pi(K_i T_i \mathbf{x})\right) - [T_i \mathbf{x}]_z, \qquad d_i(\mathbf{x}) = \max\!\left(-d_{trunc},\, \min\!\left(d_{trunc},\, s_i(\mathbf{x})\right)\right),$$

where $\pi$ normalizes homogeneous 2D pixel coordinates to inhomogeneous coordinates. Thus, $D_i(\pi(K_i T_i \mathbf{x}))$ is the measured depth when the voxel $\mathbf{x}$ is projected onto the depth map $D_i$, and $[T_i \mathbf{x}]_z$ is the z-coordinate of the voxel $\mathbf{x}$ in the $i$th camera coordinate system. The signed distance $s_i(\mathbf{x})$ indicates the distance of the voxel $\mathbf{x}$ to the nearest surface measurement: a positive or zero value indicates that the voxel lies in front of the surface (in observed free space), whereas a negative value indicates that it lies behind the surface. Since large signed distances are irrelevant to surface reconstruction [12], $s_i(\mathbf{x})$ is truncated using $d_{trunc}$ so that $d_i(\mathbf{x})$ ranges from $-d_{trunc}$ to $+d_{trunc}$.
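The fusion update above can be sketched in a few lines of NumPy. This is a minimal illustration under the common $w_i(\mathbf{x}) = 1$ weighting; the array layouts (flat per-voxel arrays `D` and `W`, a 4 × 4 world-to-camera matrix `T`) are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np

def integrate_depth_map(D, W, depth, K, T, voxel_centers, d_trunc):
    """One TSDF fusion step: fold a single depth map into the volume.

    D, W          : flat arrays of TSDF values and fusion weights per voxel
    depth         : H x W depth map
    K, T          : 3x3 intrinsics, 4x4 world-to-camera extrinsics
    voxel_centers : V x 3 voxel centers in world coordinates
    d_trunc       : truncation distance
    """
    # Transform voxel centers into the camera frame.
    ones = np.ones((voxel_centers.shape[0], 1))
    cam = (T @ np.hstack([voxel_centers, ones]).T).T[:, :3]
    z = cam[:, 2]

    # Project to pixel coordinates (pi: homogeneous -> inhomogeneous).
    uv = (K @ cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)

    H, Wd = depth.shape
    valid = (z > 0) & (u >= 0) & (u < Wd) & (v >= 0) & (v < H)

    # Signed distance: measured depth minus voxel depth, then truncate.
    s = np.zeros_like(D)
    s[valid] = depth[v[valid], u[valid]] - z[valid]
    d = np.clip(s, -d_trunc, d_trunc)

    # Update only observed voxels not far behind the surface
    # (w_i = 1 there, w_i = 0 elsewhere).
    update = valid & (s > -d_trunc)
    D[update] = (D[update] * W[update] + d[update]) / (W[update] + 1.0)
    W[update] += 1.0
    return D, W
```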

| Marching cube algorithm
The MC algorithm [14,30] is the most popular technique for creating a triangular mesh from the TSDF representation. It iterates ("marches") over the cubes formed by eight adjacent voxels. In each iteration, the eight voxels of the current cube and the twelve edges between them are indexed in a prescribed order (Figure 2), and an 8-bit MC configuration index is obtained by expressing in binary whether the TSDF value of the $i$th voxel is greater or less than zero. Thus, the MC algorithm assigns one of the $2^8 = 256$ possible configurations (Figure 3A) to every cube. Configurations 0 and 255 indicate empty cubes because all eight voxels lie on the same side of the surface. Each entry of the MC lookup table [31] contains an edge sequence representing the mesh topology of the corresponding configuration. For example, given an MC configuration index of 3, the MC lookup table returns the following sequence: {1, 8, 3, 9, 8, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}. This sequence indicates that vertices are placed on edges 1, 8, and 3 and on edges 9, 8, and 1 and are connected to produce two triangles. The 3D coordinate of each vertex is interpolated from the 3D coordinates and TSDF values of the two voxels forming the edge on which the vertex is placed. When a vertex $\mathbf{p}$ is placed on the edge connecting voxels $\mathbf{x}_i$ and $\mathbf{x}_j$, its 3D coordinates are linearly interpolated as follows:

$$\mathbf{p} = \mathbf{x}_i + \frac{D(\mathbf{x}_i)}{D(\mathbf{x}_i) - D(\mathbf{x}_j)}\,(\mathbf{x}_j - \mathbf{x}_i).$$

The MC algorithm sequentially extracts a local mesh for each cube using the mesh topologies predefined in the MC lookup table and the interpolated vertex coordinates. Finally, the local meshes of all cubes are merged to generate the complete mesh of the entire scene.
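The configuration indexing and vertex interpolation described above can be sketched as follows. The bit convention (bit $i$ set when the $i$th voxel's TSDF value is negative) is an assumption for the sketch; published MC tables also use the opposite convention.

```python
import numpy as np

def mc_configuration_index(tsdf_cube):
    """8-bit MC configuration index for one cube: bit i is set when
    voxel i lies on the negative side of the surface (TSDF < 0)."""
    index = 0
    for i, d in enumerate(tsdf_cube):
        if d < 0:
            index |= 1 << i
    return index

def interpolate_vertex(x_i, x_j, d_i, d_j):
    """Place a vertex on the zero crossing of the edge (x_i, x_j)
    by linear interpolation of the two TSDF values."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    t = d_i / (d_i - d_j)  # fraction of the way from x_i to x_j
    return x_i + t * (x_j - x_i)
```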

| VOXEL-WISE TEXTURE MAP GENERATION
The TSDF scene model is typically visualized using the widely employed per-vertex colored mesh generated from the TSDF and a color volume [18][19][20][21]. However, the rendered image of a per-vertex colored mesh is blurry and has low visual quality because the number of pixels that each triangle of the mesh covers is greater than the number of colors mapped to the triangle. Therefore, instead of a color volume, we employed full texture mapping based on UV mapping to enhance the visual quality of the TSDF scene model. Contrary to conventional UV mapping, our proposed method performs UV parameterization by generating a local UV map for the local surface of each cube using a UV map lookup table.

| UV map lookup table
The MC algorithm extracts one to five triangles for each nonempty cube of a TSDF scene model, so the final mesh has many triangles. For example, 400K triangles were extracted for our smallest scene model (Room-Near with 8 mm voxel resolution) and 15M triangles for the biggest model (Rest.-Far with 4 mm voxel resolution). Computing a UV map for such huge meshes with traditional UV parameterization requires several steps to unwrap them onto a 2D plane, namely mesh simplification, mesh chartification, chart parameterization, and atlas packing [32], which is computationally expensive. To mitigate this, we designed a UV map lookup table consisting of 256 UV maps, one for each topology configuration, since the MC algorithm uses the 256 predefined topology configurations (Figure 3).
To build each entry, a template mesh is constructed for the corresponding MC configuration. When the template mesh has a complex geometry, it is subdivided into two or three mesh pieces based on angular changes of 66°; each mesh piece (or the undivided template mesh) is projected onto the tangent plane orthogonal to its average normal.
Finally, the 2D projection results are arranged inside a common 2D unit plane to create the UV map. After parameterization, the UV map lookup table stores all 2D vertex coordinates of each projected template mesh in the corresponding entry. Figure 3 shows the 15 basic cases of the 256 MC configurations and the corresponding UV maps in the UV map lookup table.
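The query path of the proposed lookup table might look as follows. The table contents here are tiny hypothetical stand-ins for configuration 3 (the real tables hold all 256 entries, with UV coordinates produced by the offline parameterization described above); only the access pattern is the point of the sketch.

```python
# Hypothetical stand-in tables for MC configuration 3.
MC_EDGE_TABLE = {3: [1, 8, 3, 9, 8, 1]}  # two triangles: edges 1,8,3 and 9,8,1
UV_MAP_TABLE = {3: {1: (0.25, 0.1), 8: (0.1, 0.4),
                    3: (0.4, 0.4), 9: (0.6, 0.1)}}  # illustrative UV coords

def local_uv_triangles(config):
    """Return the per-voxel UV triangles for an MC configuration with a
    single table query: no parameterization, no stored UV attributes."""
    edges = MC_EDGE_TABLE[config]
    uv = UV_MAP_TABLE[config]
    return [tuple(uv[e] for e in edges[k:k + 3])
            for k in range(0, len(edges), 3)]
```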

| Local texture map generation
Each voxel is assigned its corresponding UV map from the UV map lookup table using its MC configuration index. The real mesh of each voxel extracted by the MC algorithm and the template mesh constructed to create the corresponding UV map are different but share the same topology. Hence, the 3D triangles of each voxel's local surface and the 2D triangles of the corresponding UV map are in one-to-one correspondence. Therefore, UV texture maps for the surface of every nonempty voxel can be generated by referring to its UV map.
Suppose $t_{x,i}$ denotes the $S \times S \times 3$ local texture map observing the voxel $\mathbf{x}$'s surface from the viewpoint $i$. Each texel color of $t_{x,i}$ can be obtained by projecting the 3D point associated with the texel onto the color image $C_i$:

$$t_{x,i}(\mathbf{q}) = C_i\!\left(\pi(K_i T_i \mathbf{p})\right),$$

where $\mathbf{q}$ is a 2D texel point and $\mathbf{p}$ is the associated 3D point on the voxel $\mathbf{x}$'s surface. The 3D world coordinates associated with the 2D texel coordinates can be computed using barycentric mapping, and vice versa [25].
To generate a common local texture map $t_x$ that applies to any viewpoint, we used the visibility-weighted average of the local texture maps from all N input viewpoints (Figure 4) as follows:

$$t_x = \left(\sum_{i=1}^{N} m_{x,i} \odot t_{x,i}\right) \oslash \left(\sum_{i=1}^{N} m_{x,i}\right), \qquad (7)$$

where $\odot$ and $\oslash$ denote texel-wise multiplication and division, respectively, and $m_{x,i}$ is a binary matrix representing the visibility of each texel of $t_{x,i}$, obtained using the Z-buffer algorithm.
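Equation (7) reduces to masked texel-wise averaging. A minimal NumPy sketch, assuming an N × S × S × 3 stack of per-view textures and an N × S × S stack of binary visibility masks:

```python
import numpy as np

def average_local_texture(textures, masks, eps=1e-8):
    """Visibility-weighted average of N local texture maps, eq. (7).

    textures : N x S x S x 3 per-view local texture maps t_{x,i}
    masks    : N x S x S binary visibility maps m_{x,i}
    """
    m = masks[..., None]                # broadcast over color channels
    total = (m * textures).sum(axis=0)  # texel-wise multiply, sum over views
    count = m.sum(axis=0)               # visible-view count per texel
    return total / np.maximum(count, eps)  # texel-wise division
```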
Each averaged local texture map was assigned to its corresponding voxel. However, UV coordinates need not be stored separately because they are embedded in the UV map lookup table. Therefore, to render the TSDF scene model, the MC configuration and local mesh of each voxel of the TSDF were recalculated; the UV map was then retrieved from the UV map lookup table, and the assigned local texture map was mapped onto the recalculated mesh.

| Texture packing
Since our method extracts textures locally and separately for each voxel located on the surface, creating a single global texture map requires a method of packing the textures. Here, we arranged the local texture maps of (7) in a square texture space according to the 3D Morton code [22] of each voxel position (Figures 1F and 6C). Using the 3D Morton code, the packing method can directly calculate the location of a local texture map within the global texture map, whereas most conventional packing methods must store the locations of the local texture maps.
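The Morton-code packing can be sketched as bit interleaving plus a patch-index-to-pixel-origin mapping. The row-major atlas layout in `patch_origin` is an illustrative assumption, not necessarily the paper's exact layout:

```python
def morton3d(x, y, z, bits=10):
    """Interleave the bits of integer voxel coordinates (x, y, z):
    bit k of x, y, z lands at bits 3k, 3k+1, 3k+2 of the code."""
    code = 0
    for k in range(bits):
        code |= ((x >> k) & 1) << (3 * k)
        code |= ((y >> k) & 1) << (3 * k + 1)
        code |= ((z >> k) & 1) << (3 * k + 2)
    return code

def patch_origin(code, patches_per_row, S):
    """Pixel origin of the S x S local texture map with the given Morton
    code in a row-major global atlas (assumed layout)."""
    row, col = divmod(code, patches_per_row)
    return row * S, col * S
```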

| VIEW-DEPENDENT TEXTURE SYNTHESIS
For texture mapping of the TSDF scene model, color volume-based and recent methods [27,28] generate a single texture model from multiview images that can be mapped onto the surface of the TSDF scene model independently of the rendering viewpoint. Although a view-independent texture model can be optimized to render all viewpoints in high quality, in some cases, such as rendering glossy and mirror-like surfaces, it cannot show the changes in texture that occur as the viewpoint changes, thereby reducing the viewer's sense of reality.
To deal with this problem, we proposed a method to synthesize view-dependent texture using view-dependent specular map estimation. To estimate the view-dependent specular map, a simple diffuse-specular map separation was performed using our texture representation, followed by an eigenspace analysis of the separated specular maps.

| Diffuse-specular map separation
The dichromatic reflection model [33] describes the light reflected from an object surface as the sum of a diffuse reflection independent of the viewpoint and a specular reflection dependent on the viewpoint. Applying this model, we separated the local texture maps generated for the N input views at the location of a voxel $\mathbf{x}$ into a single diffuse texture map and N specular maps. (Some voxels are assigned fewer than N local texture maps because they are invisible from some viewpoints, but we assume that all nonempty voxels are assigned N local texture maps to simplify the notation.) With the averaged texture $t_x$ calculated using (7) regarded as the diffuse texture map, the specular texture map $\tilde{t}_{x,i}$ for an input viewpoint $i$ and voxel $\mathbf{x}$ is calculated as follows:

$$\tilde{t}_{x,i} = t_{x,i} - t_x. \qquad (8)$$

Since the texel values at the same location of the local texture maps $t_{x,1}, \ldots, t_{x,N}$ are the color values observed from N different viewpoints for the same 3D point, we can compute the specular maps defined in the dichromatic reflection model by the texel-wise subtraction of the diffuse texture map from the local texture maps (Figure 4).
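The separation amounts to subtracting the visibility-weighted mean from each view's texture. A small NumPy sketch, assuming the same N × S × S × 3 texture stack and N × S × S visibility masks as in the averaging step:

```python
import numpy as np

def separate_specular(textures, masks):
    """Dichromatic separation: the visibility-weighted mean is the
    diffuse map; each view's residual is its specular map."""
    m = masks[..., None]
    diffuse = (m * textures).sum(0) / np.maximum(m.sum(0), 1e-8)
    speculars = textures - diffuse  # texel-wise subtraction per view
    return diffuse, speculars
```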

| View-dependent specular map estimation
To render an image at an arbitrary viewpoint $\theta$ with our texture mapping, the view-dependent specular map $\tilde{t}_{x,\theta}$ is added to the view-independent diffuse map $t_x$ to obtain the local texture map $t_{x,\theta}$ for the surface of the voxel $\mathbf{x}$ visible at the viewpoint $\theta$:

$$t_{x,\theta} = t_x + \tilde{t}_{x,\theta}. \qquad (9)$$

If the arbitrary viewpoint does not belong to the set of input viewpoints, the specular map $\tilde{t}_{x,\theta}$ must be estimated from the available specular maps.
To estimate the specular map, the full voxel grid was divided into non-overlapping voxel blocks, and the distribution of the specular maps in each voxel block was modeled. Let $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M\}$ be the set of M nonempty voxels in a $K \times K \times K$ voxel block. All specular maps associated with the voxel set X are unrolled into column vectors and stacked horizontally to create the data matrix $\tilde{T}_X$:

$$\tilde{T}_X = \left[\tilde{\mathbf{t}}_{x_1,1}, \tilde{\mathbf{t}}_{x_1,2}, \ldots, \tilde{\mathbf{t}}_{x_M,N}\right], \qquad (10)$$

where $\tilde{\mathbf{t}}_{x,i}$ is the column-vectorized representation of a specular map and has a length of $L = S \times S \times 3$. Therefore, the data matrix $\tilde{T}_X$ has a size of $L \times M \cdot N$. We then performed singular value decomposition (SVD) [34] on the covariance matrix of the specular vectors, $\tilde{T}_X \tilde{T}_X^{\top}$, to determine the eigenvectors and their corresponding eigenvalues:

$$\tilde{T}_X \tilde{T}_X^{\top} = U_X V_X U_X^{\top}, \qquad (11)$$

where $V_X$ is a diagonal matrix containing the eigenvalues of $\tilde{T}_X \tilde{T}_X^{\top}$, sorted from largest to smallest, and $U_X$ is an orthonormal matrix whose columns are the eigenvectors of $\tilde{T}_X \tilde{T}_X^{\top}$. Thus, each individual specular vector $\tilde{\mathbf{t}}_{x,i}$ can be represented by a linear combination of the eigenvectors:

$$\tilde{\mathbf{t}}_{x,i} \approx U_X \mathbf{s}_{x,i}, \qquad \mathbf{s}_{x,i} = U_X^{\top}\, \tilde{\mathbf{t}}_{x,i}. \qquad (12)$$

The projection of the specular vector onto the eigenspace yields the specular weights $\mathbf{s}_{x,i}$, which linearly approximate the specular vector $\tilde{\mathbf{t}}_{x,i}$.
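The eigenspace analysis (SVD of the covariance matrix followed by projection to specular weights) can be sketched with NumPy's SVD. The function name and the optional truncation parameter `n_eig` are illustrative:

```python
import numpy as np

def specular_eigenbasis(T_tilde, n_eig=None):
    """Eigen-decompose the covariance of the stacked specular vectors
    (an L x M*N data matrix) and project each vector to its weights."""
    cov = T_tilde @ T_tilde.T            # L x L covariance matrix
    U, eigvals, _ = np.linalg.svd(cov)   # eigenvectors, eigenvalues (sorted)
    if n_eig is not None:
        U = U[:, :n_eig]                 # keep only the leading eigenvectors
    S = U.T @ T_tilde                    # specular weights s_{x,i} per column
    return U, S
```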
Given an arbitrary viewpoint $\theta$, we selected the three input viewpoints $i$, $j$, and $k$ closest to it in Euclidean distance and computed their barycentric weights $\omega_i$, $\omega_j$, and $\omega_k$ [29]. The specular vector of voxel $\mathbf{x}$ at the viewpoint $\theta$ is then approximated using the barycentric weights as follows:

$$\tilde{\mathbf{t}}_{x,\theta} = U_X \left(\omega_i \mathbf{s}_{x,i} + \omega_j \mathbf{s}_{x,j} + \omega_k \mathbf{s}_{x,k}\right). \qquad (13)$$

Finally, the resulting specular vector $\tilde{\mathbf{t}}_{x,\theta}$ is reshaped to obtain the estimated specular map $\tilde{t}_{x,\theta}$.
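The final estimation step, blending the three nearest views' specular weights barycentrically, lifting the result back through the eigenbasis, and reshaping it into an S × S × 3 map, is only a few lines; the function name and argument layout are illustrative:

```python
import numpy as np

def estimate_specular_map(U, s_views, weights, shape):
    """Blend source-view specular weights with barycentric weights,
    map back to texel space via the eigenbasis, and reshape."""
    s_theta = sum(w * s for w, s in zip(weights, s_views))  # blended weights
    t_theta = U @ s_theta                                   # back to texel space
    return t_theta.reshape(shape)
```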
By assigning the diffuse map and set of specular weights to each nonempty voxel and storing the eigenvectors of each voxel block, we estimated local specular maps and generated local texture maps for an arbitrary viewpoint at rendering time.

| EXPERIMENTAL EVALUATION
Datasets. We created four multiview RGB-D datasets of static indoor scenes by capturing two synthetic models twice, as shown in Figure 5. A virtual 9 × 9 RGB-D camera array with a fixed baseline (6 cm) captured each scene near and far from the wall. Each dataset consists of 81 8-bit RGB images and 81 16-bit depth maps, all with a resolution of 3840 × 2160.
Truncated signed distance field and texture generation. Two TSDF models with voxel resolutions of 4 mm and 8 mm were created for each dataset using the TSDF fusion described in Section 3.1, yielding eight TSDF scene models in total. Corresponding color volumes were also integrated for comparison with our texture representation. For local texture map generation and specular map modeling, we set the size of the local texture map to 8 × 8 and the voxel block size to 8 × 8 × 8.
Source view reconstruction. To quantitatively evaluate the texture quality, we measured the PSNR of the reconstructed source views using the color volume, our view-independent texture of (7), and our view-dependent texture of (9). We did not include conventional UV texture mapping in this comparison for the following reason: when LSCM parameterization [15], a conventional UV parameterization method, was performed on the test scene models using Blender [35], it generated overlapping UVs that produced severe rendering artifacts and significantly reduced PSNR (Figure 6). PSNR was measured on the reconstructed views of the 5 × 5 viewpoints at the center of the 9 × 9 source viewpoints of each dataset. Table 1 shows the average PSNR for each texture representation. The view-dependent texture produced the best results, followed by the view-independent texture, which outperformed the color volume even at a lower voxel resolution. Increasing the voxel resolution did not significantly improve the source view reconstruction performance of the color volume or the view-independent texture, whereas our view-dependent texture achieved a large improvement.
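The PSNR used in this evaluation is the standard definition; for completeness, a reference implementation (the peak value of 255 assumes 8-bit images, matching the datasets):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio between a reference view and its
    reconstruction, in decibels."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```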
In Figure 7, the results of rendering highly specular surfaces using color volume and view-independent texture were heavily blurred and did not render specular effects. However, our view-dependent textures rendered both specular effects and texture details well.
Arbitrary view synthesis. We rendered images of the test scene models at arbitrary viewpoints using our view-dependent texture synthesis method. Figure 8 shows the results of rendering 5 × 5 arbitrary views while moving in 1 cm steps in the XY direction from a location not among the source viewpoints. The changes in the texture of specular surfaces, such as a metal object and glass, with the observation viewpoint were represented smoothly. The synthesized arbitrary view videos are available on YouTube: https://youtu.be/wNN5e3ip3hQ and https://youtu.be/fK51tZ-gO0U.
Compression usability. Similar to the experiment of Volino and others [29], we measured how the source view reconstruction performance changes with the number of eigenvectors used to estimate the specular maps (Figure 9). For each voxel block, our test setup produced 8 × 8 × 3 = 192 eigenvectors from the matrix U_X in (11). When the number of eigenvectors used to estimate the specular map was reduced to 32, approximately 1-dB PSNR loss occurred in the source view reconstruction. By properly reducing the dimension of the eigenvectors and specular weights, our method can be used for lossy texture compression or texture data reduction.

Waste-rate of texture space. The texture space wasted by conventional UV texture maps and by our texture maps was measured. The conventional texture maps were produced using UV maps calculated by Blender (Figure 6A), and our texture maps were created as described in Section 4.3 (Figure 6C). On average, 12.81% and 1.94% of the texture space were unused for the conventional UV texture maps and our proposed texture maps, respectively. Therefore, compared with a conventional UV map, our texture representation uses texture space more efficiently.
Limitation. The synthesized views showed some distracting ghosting artifacts in the boundary regions of triangles that are far apart in 3D space but adjacent in the image plane. These artifacts are due to the inaccurate geometry of the mesh extracted from the TSDF [36]. This problem can be alleviated by increasing the voxel resolution, which increases the geometric accuracy of the mesh; however, the amount of texture data then increases significantly. This tradeoff could be resolved by using a hierarchical TSDF structure [37] instead of a regular voxel grid structure.

| CONCLUSION
In this paper, we presented a view-dependent texture synthesis method for a TSDF scene model using voxel-wise UV parameterization. The proposed voxel-wise UV parameterization assigns a precomputed UV map to each voxel according to its MC configuration by referring to the UV map lookup table. Thus, no processing is required to calculate the UV map, and no storage is required to record UV coordinates. Furthermore, using the proposed voxel-wise UV parameterization, we estimated textures for arbitrary views simply and efficiently, with visually satisfactory results. In the future, we will extend the proposed method into an efficient representation for the compression and immersive rendering of real-world RGB-D datasets.