Temporal Upsampling of Point Cloud Sequences by Optimal Transport for Plant Growth Visualization

Plant growth visualization from a series of 3D scanner measurements is a challenging task. Time intervals between successive measurements are typically too large to allow a smooth animation of the growth process. Therefore, obtaining a smooth animation of the plant growth process requires a temporal upsampling of the point cloud sequence in order to obtain approximations of the intermediate states between successive measurements. Additionally, there are suddenly arising structural changes due to the occurrence of new plant parts such as new branches or leaves. We present a novel method that addresses these challenges via semantic segmentation and the generation of a segment hierarchy per scan, the matching of the hierarchical representations of successive scans and the segment‐wise computation of optimal transport. The transport problems' solutions yield the information required for a realistic temporal upsampling, which is generated in real time. Thereby, our method does not require shape templates, good correspondences or huge databases of examples. Newly grown and decayed parts of the plant are detected as unmatched segments and are handled by identifying corresponding bifurcation points and introducing virtual segments in the previous, respectively successive time step. Our method allows the generation of realistic upsampled growth animations with moderate computational effort.


Introduction
Modelling, visualizing and analysing plant growth is of great relevance in various domains including biology, agriculture and computer graphics with applications in movie and game productions. Therefore, the coupling of respective plant models to measurements of growing plants allows a better understanding regarding the underlying growth processes for individual plant species. While a conventional video recording has shown great potential for plant monitoring, its limitations regarding the flexibility of inspecting plants from arbitrary views and performing three-dimensional (3D) measurements can only be overcome by directly considering a complete 3D scan of the plant of interest. However, capturing the plant growth with all its details relies on a (ideally) temporally continuous scanning of the plant state which is hindered by the fact that capturing the plant state at a certain time step usually involves a time-consuming manual acquisition. This induces larger time intervals between successive measurements. As a plant often has grown significantly between successive scans, a smooth animation of the growth process from coarsely taken discrete measurements is not easily possible. Additionally, suddenly arising structural changes induced by bifurcation or the evolution of leaves have to be taken into account when aiming for a smooth completion of the intermediate states between successive measurements. Such topological changes of the underlying ground truth model and the fact that the measurement process inherently does not guarantee perfectly corresponding points or point numbers for successive point cloud scans cannot be handled by trajectory-based deformation and morphing approaches. These require exact point correspondences that can only be reliably obtained for small deformations and motions and, hence, are by design not suitable for analysing growth processes with changing topology [XSWL15]. Further limitations of previous work on deformations based on exact point correspondences regarding their applicability on growth processes include the assumption of piecewise rigid motion as used for object tracking [BIZ18], the requirement of a 3D object template that is deformed and fitted to the point clouds in adjacent time steps using respective priors (without enforcing temporal coherence) [ZFG*17], the involvement of a visual hull prior that biases the optimization in the context of mesh-based approaches [LLV*12] or the need for large databases required by learning-based methods [WLX*18] that are hard to acquire due to the time-consuming nature of the scanning process and the growth of the plants themselves.
In this paper, we address the aforementioned limitations by a novel method that allows temporal upsampling of coarsely acquired point cloud sequences of growth processes including topological changes without requiring exact point correspondences, shape priors or huge databases. For this purpose, our approach involves an initial point cloud segmentation and the generation of a hierarchy on the segments for each of the individual scans, the matching of the segment hierarchies of successive scans and the final computation of a segment-wise regularized optimal transport to morph the point sets associated with matched segments. Newly grown and decayed parts of the plant, that are detected as unmatched segments, are handled by identifying corresponding bifurcation points and introducing virtual segments in the previous, resp. successive time step. Furthermore, we present a heuristic segmentation approach that is particularly suitable for thin structures as occurring for maize plants or tree structures. By using a memory-efficient formulation of the optimal transport problem, our approach is also capable of handling large point clouds. As demonstrated in the evaluation, our approach yields smooth temporal upsamplings, allowing for realistic growth animations with moderate computational effort.

Related Work
Temporal upsampling for modelling and visualizing growth and deformation processes can be approached based on several strategies such as procedural modelling, template-based approaches, correspondence-based approaches and data-driven approaches. In the following, we will discuss respective developments in the context of plant growth simulation. In addition, we provide a review of applications of optimal transport between data distributions in the context of graphics applications as the use of optimal transport allows our method to overcome the need for perfect one-to-one correspondences or templates.
Procedural modelling for plant growth. Modelling structures and phenomena from sets of rules has received a lot of attention in the contexts of facade modelling and plant modelling. However, the underlying rules are often not directly known and have to be derived, for example, using inverse procedural modelling. In the context of modelling plant growth, L-systems have been extended by growth rules such as relational growth grammars [KKBS05] or tree-species-specific growth rules [LKMH05], component-wise time functions of logistic growth [CAJBS05], guiding vectors to provide local control over branch orientation during tree growth [XM15], generic rules to provide sequences of symbols and branching [SBM*10], the localization and tracking of topological events like budding and bifurcation [LPY*12], growth models [SPK*14] or plant modules for capturing speciesspecific branching pattern characteristics that can be combined to model whole individual plants [MHS*19]. While individual instances can be derived via the use of meaningful parameters and rules, a smooth interpolation between individual objects or process instances is challenging as smooth deformations of plants cannot simply be achieved by interpolating within the parameter space. In addition, inverse procedural growth models may fail to capture the nature of input examples that do not capture all of the relevant growth phenomena and may be limited to certain species.
Template-based and mesh-based interpolation. Early work on template-based growth analysis and visualization include the modelling of plant growth in terms of flexible spatio-temporal templates to represent non-rigid motions [LXZ*97]. Regarding mesh morphing, the morphing based on local transformations between source and target meshes such as As-Rigid-As-Possible Shape Interpolation [ACOL00] and affine transformations [SZGP05] is not capable of handling large deformations. The latter also applies to morphing based on intrinsic shape information such as differential coordinates [Ale03], linear rotation-invariant (LRI) coordinates [LSLCO05], Laplacian coordinates with invariance to rotation and isotropic scaling [SCOL*04], mean curvature flow Laplacian coordinates [HLW07], gradient fields [XZWB06] or the combination of near-rigid priors with a local gradient field interpolation [CL09]. Further work used spatio-temporal skeleton-based models [ZLLZ12]. Bansal etal. [BT18] introduced a mesh morphing and interpolation framework based on the Lie Bodies representation. For large deformations, the authors perform mesh segmentation and apply the Lie Bodies framework on the respective components. Furthermore, Chen etal. [CFB16] presented a method for the simultaneous morphing between two mesh sequences for blending motions and interpolating shapes. Zhao and Barbic [ZB13] segment plants from the geometric information to establish a structural hierarchy that is connected with an Finite Element Methods (FEM) simulation model to allow deformations. However, template-based approaches rely on the availability of 3D object templates that are deformed and fitted to the observations in adjacent time steps as well as adequate priors and larger deformations cannot be handledeasily.
Correspondence-based interpolation. Instead of exploring templates and shape priors given by the mesh representation, several approaches focused on unstructured representations such as point clouds, where the missing topological information complicates the morphing. This has been addressed based on improving the robustness of correspondences based on local mappings between the projections of the input point clouds onto a common parameter space which can be performed in a segment-wise manner for complex-shaped objects [XZPF04]. Alternatives include the guidance of the morphing by registration and warp functions [THCF06], exploiting differential geometry and continuum mechanics to get physically meaningful transitions [BGQ05] or the guidance of the interpolation based on gradient fields [TZZ09]. Wang et al. [WZH12] presented a method for morphing point-sampled geometry based on parameterizing the point clouds onto spheres and then comparing features. Solenthaler et al. [SZP07] refine irregularly sampled, dynamic points in the context of particle-based fluid simulation while preserving edges and avoiding point collisions. Further work on temporal upsampling by interpolating between temporally consecutive point cloud measurements includes the use of 4D spatio-temporal models as proposed in the context of large dynamic urban scenes [JBB13]. Lie et al. [LLV*12] incorporate the visual hull as a shape topology prior and surface bending energy for shape completion. Subsequently, inter-frame correspondences are used to obtain temporally coherent dynamics. The approach can handle larger deformations and certain topological changes, however, the topology of the output is tied to the possibly incorrect topology of the visual hull. Li et al. [LFM*13] exploit a forward-backward analysis for spatial and temporal analysis of 4D point clouds for tracking plant parts over time. The co-segmentation framework proposed by Yuan et al. [YLX*16] decomposes articulated point cloud sequences into near-rigid moving parts. Segments derived using trajectory analysis are progressively propagated to neighbouring frames and merged through all frames using space-time segment grouping. This results in robustness regarding noise, occlusions as well as variations of pose and view. Similarly, Bertholet et al. [BIZ18] address motion segmentation from RGB-D videos by representing motion as a sequence of rigid transformations through all input frames in an energy optimization framework. Vlachos et al. [VLSM18] focus on the improvement of animation reconstruction based on rank minimization while imposing spatial coherence between successive frame clusters. Furthermore, Xu et al. [XSWL15] approach trajectory-based shape interpolation by solving Poisson equations defined on a domain mesh. Interpolated shapes are reconstructed from interpolated gradient fields that exploit both point coordinates and surface orientations instead of directly using point coordinates. The technique shows stable area and volume changes and overcomes the shrinkage problem of linear shape interpolation. Zheng et al. [ZFG*17] used a template-based dynamic tracking algorithm for the temporal reconstruction of geometry and deformation in point cloud sequences, tailored to the reconstruction of flower petals. The use of specific priors to assist the tracking allows handling occlusions and collision-based interactions at the cost of requiring a suitable template. However, trajectory-based deformation/morphing approaches require exact point correspondences that can usually only be reliably obtained for small deformations/motions and, hence, are by design not suitable for analysing growth processes with changing topology due to newly added branches or leaves [XSWL15]. Furthermore, the points in adjacent point clouds may not exactly represent corresponding surface parts as the point distribution is determined by the scanning process. Our approach overcomes these limitations by considering the point distributions instead of the point cloud itself. As a result, we can circumvent the establishment of exact point correspondences that are hard to establish for growing objects with possibly changing topology.
Data-driven and learning-based interpolation. Data-driven approaches exploit the availability of large collections of related exemplars to describe the characteristic deviations between objects and, hence, to guide the interpolation between object states, which allows handling large deformations. This has been used in the context of mesh interpolation based on patch-based LRI coordinates [BVGP09], using an iterative Gauss-Newton method for shape interpolation [FB11], solving a shortest path problem in the local linear subspaces derived from a given model database [GLHH13], using low-dimensional shape spaces that exploit the specific structure of the interpolation problem [VTSSH15] and using rotation/translation-invariant representations that define plausible deformations in a global continuous space [GCLX17]. Furthermore, the approach by Wang et al. [WLX*18] explores a shape space technique that relies on the correspondence computation between trees of different structure and using geodesics to obtain the continuous blending between trees. However, such collections of examples that represent the complete range of possible variations are often not available due to the impractically time-consuming capture processes.
Optimal transport in graphics and vision. Optimal transport has been used successfully in various computer graphics and computer vision applications. Early applications include its use for the measurement of the similarity of images by defining the earth mover's distance between colour distributions and between texture feature distributions as used by Rubner et al. [RTG98,RTG00]. Furthermore, optimal transport has been successfully used for colour transfer in 2D images based on the regularization with the preservation of the gradient of the original picture [PKD07], an approximate variational formulation using an approximate Wasserstein constraint on colour statistics and a generic geometric-based regularization [RP11], regularized discrete optimal transport within a unified convex variational framework [FPPA14], relaxed optimal transport using a regularization that considers the spatial distribution of colours [RFP14], by using separate illuminant matching and optimal transfer of dominant colours regularized by thin plate splines [FSDH14] and extending standard entropy regularization based optimal transport to unbalanced problems [CPSV16].
In addition, optimal transport has been applied for interpolation. Respective examples include the interpolation between textures [RPDB11], reflectance models such as bidirectional reflectance distribution functions (BRDFs) [BVDPPH11, SDGP*15] or bidirectional texture functions (BTFs) [GK18] based on statistical representations [GK17] as well as geometric models in terms of volumetric representations [SDGP*15]. Digne et al. [DCSA*14] used optimal transport for surface reconstruction and simplification from point clouds and Merigot et al. [MMT18] investigated optimal transport between a simplex group and a point cloud.
Plant segmentation. The numerous approaches for plant segmentation include the combination of colour information and 3D models [ADT11, MBW*18], adapted surface feature-based techniques [PDMK13], model-based approaches based on cylinder representations of stems and separate segmentation tailored to leaves [GDHB17], skeleton-based stem and leaf point recognition approaches 2019, facet region growing approaches after an initial oversegmentation [LCT*18] as well as automated, data-driven approaches for plant structure segmentation based on geometric features and Random Forests classifiers (e.g. [DNC*18]) or geometric features and clustering 2015.

Temporal Upsampling of Point Cloud Sequences Using Optimal Transport for Plant Growth Visualization
Given a time series of point cloud scans obtained by a 3D scanner, our goal is to produce a temporal upsampling of the plant growth that happened between the individual measurements that results in a realistic, smooth plant growth visualization. As illustrated in Figure 2, our method consists of the following steps: Initially, the scans are coarsely aligned to each other (Section 3.1). While it is possible to set up and solve a single optimal transport problem for the whole dataset, the result is not satisfactory, as artefacts like unnatural-looking breaking up of plant parts occur (see Figure 3). In order to alleviate this, we perform the following steps: First, a hierarchical segmentation is derived for each scan individually (Section 3.2). Afterward, the corresponding segments of successive scans are matched (Section 3.3), which is facilitated by the initial coarse alignment. Finally, for each pair of corresponding segments, a regularized optimal transport problem is set up (Section 3.4). Its solution delivers the information required for a temporal upsampling, which is obtained by interpolation.

Coarse alignment
Continuously monitoring plant growth with a fixed arrangement of scanners is complicated due to self-occlusions induced by growing plant parts and results in incomplete 3D scans. As a consequence, detailed plant scans require a manual scanning process, ideally with a movable scanner platform that can be guided by a user to obtain complete scans. As this may require the relocation of plants to a certain scanner platform, successive scans have to be aligned in a common coordinate system in order to register the individual plant segments and to allow the consideration of changes induced by the growth process. For this purpose, we align each scan in the time series coarsely via the iterative closest point algorithm [BM92] to its predecessor. We assume that scans in a time series have a consistent scaling.

Hierarchical segmentation
If used without any further constraints, the later described OT solution will lead to an unrealistic breaking up of the dataset in the upsampling as shown in Figure 3. To avoid this problem, we segment the respective point cloud P i of each time step t i into segments s ∈ S P i , onto which we restrict the optimal transport problem setup. These segments are not necessarily matched. Additionally, new segments might appear due to growth or might disappear due to cropping. In order to be able to better match the segments and handle growth and decay events, we introduce a hierarchy H P i on the segments S P i of each point cloud P i .

Creating a hierarchy on existing segmentations
In principle, different segmentation techniques may be used to derive the segmentation S P i . However, they should be suitable for the expected plant structures. For a given segmentation of a plant into its semantic entities like stem, branches, leaves or blossom petals, we first identify the root segment of the hierarchy H P i as the segment containing the point with the lowest vertical coordinate. Alternatively, the respective segment may also be directly chosen by the user.
Then, successive nearest neighbour searches are performed to populate the hierarchy H P i . All segments containing points that are closer than a user-specified threshold to points of the currently considered segment are added as children of this segment in the hierarchy. This is iterated until all segments have been processed. An exemplary decomposition into a segment hierarchy derived for a synthetic tree model is illustrated in Figure 4. Here, the ground truth segmentation of the tree into the stem and individual branches was directly obtained from the synthesis and we chose the trunk as the root segment, which is also the segment with the lowest vertical coordinate. In the case of a given segmentation with no meaningful root segments, such as in the Komatsuna dataset [USM*17], we introduce a virtual, empty root segment and add the other segments as direct successors in the hierarchy H P i . An example is shown in Figure 5.

Simultaneous segmentation and hierarchy generation for thin structures
For plants with thin structures, we can derive the segment hierarchy H P i together with the corresponding segmentation S P i without the requirement of an initial segmentation of a plant into its parts.
First, we build a k-nearest neighbour graph on the point cloud, where k is a user-defined parameter. We found k = 15 to deliver good results in our experiments. On this graph, we compute a minimum spanning tree (MST). We assume the point with the smallest value along the vertical axis to be the root point of the MST, thus inducing a direction on the graph. When operating on single plants, this is a reasonable assumption as this point corresponds to the location where the stem exits the ground.
For each node v in the MST, we compute the longest possible path length m v to a leaf node in the MST subtree rooted at v. Starting with the root, we traverse the MST. If we encounter a node v in the tree where the number of successors u ∈ succ(v) with longest path length m u > c is more than one, for a user-defined threshold c, we found a bifurcation. All of v's successor nodes u with m u > c are then considered to be roots of trees corresponding to new biological branches. The edges from v to these successors are inserted into a set E of edges to be removed from the MST, which also define the connectivity between created segments. After visiting all nodes, the edges e ∈ E are removed from the MST and the connected components of the resulting graph, respectively the corresponding 3D points, define the segmentation S P i on the point cloud P i . At the same time, a separate graph representing the hierarchy H P i of the segments in S P i is created. The root segment is chosen such that it contains the MST root node. Two segments s 1 , s 2 ∈ S P i are connected in H P i if an edge (u, v) ∈ E exists with u ∈ s 1 and v ∈ s 2 . An example of the resulting segmentation and the corresponding hierarchy graph is depicted in Figure 6.

Segment matching
Given segmentations of two point clouds P and Q for successive time steps, we match the segments of P and Q using the hierarchies H P and H Q built in the previous step. We differentiate the following cases: 1. The labels are consistent and the number of segments is identical. This is usually the case with manually segmented data, when no new segments have grown. An example is shown in Figure 5(a). 2. The labels are consistent and the number of segments varies. This is usually the case with manually segmented data, when new structures represented by new segments have grown or parts have decayed. An example is shown in Figure 5(b). For growth scenarios, small new virtual segments are added to the first point cloud P and matched to the otherwise unmatched segments of the second point cloud Q. These virtual segments are used to generate a growth animation. For decay scenarios, we add a point set that represents the decayed leaf. As a heuristic, we generate this point set by moving the segment's points towards the smallest vertical coordinate, which is perceived as the floor. Alternatively, this segment may be simply faded out. We emphasize that the respective choice depends on the preferences of the user. 3. The labels are inconsistent. This can occur if the labelling has been generated automatically with no additional information. In this case, we perform a matching algorithm, which is described in Section 3.3.1. Growth or decay may also occur and is handled analogously to the scenario with a consistent labelling.

Segment matching with inconsistent labelling
If different IDs are assigned to corresponding segments s P , s Q in consecutive scans, we use the Wasserstein distance d w (s P , s Q ) (see Equation (1)) as a similarity measure between the two segments s P , s Q , that is, the cost of warping the point distributions onto each other. Initially, only the root segments of the hierarchy are considered to be matched. We then alternate between merging and matching segments in the hierarchy, which is explained in detail in the following paragraphs.
Merging. For each newly created match of the segments (s P ∈ H P , s Q ∈ H Q ), we try to merge the segment in the second point cloud s Q with one of its children c Q ∈ succ(s Q ), as long as this reduces Figure 2: Overview of our method: First, a hierarchical segmentation is generated for each scan, which is followed by a matching of the segments of point clouds obtained for successive measurements at the time steps t i and t i+1 and the computation of segment-wise optimal transport solutions used for temporal upsampling.
the Wasserstein distance d w (s P , s Q ) between the associated point clouds. The rationale behind this step is to adapt to different segment hierarchies induced by growth or decay processes. For example, a new biological branch might have appeared in the later states of the plant growth, which leads to the splitting of the segment of the previous scan at the location where the new part starts growing. This process is repeated until no further improvement is possible.
Matching. When no further merging steps are possible, we create a matching of the remaining child segments. Since the scans are segmented individually, the association of corresponding parts between the segmentations has still to be computed in order to be able to set up an optimal transport problem between segment pairs. An  example of the segmentations and the associated graphs of two consecutive scans is shown in Figure 6.
In order to resolve this ambiguity, we perform the following matching step: We compute the pairwise Wasserstein distances d w (c P , c Q ) between the point clouds associated with the child segments c P ∈ succ(s P ) and c Q ∈ succ(s Q ). The pairwise costs are stored in a cost matrix. The optimal association of the child segments is computed with optimal transport. Segment growth and cropping/decay. A plant's growth may yield new parts like new branches or leaves. Analogously, there may also be a loss of parts such as the falling of leaves or the breaking of branches. This is reflected in the segment hierarchy. If the current target segment s Q has more children than s P , a new part has grown. If s Q has fewer segments, plant parts have been cropped or decayed and fallen off the plant. In order to handle these cases, we create virtual matching segments. The child segment with the largest Wasserstein distance to any child segment of the other point cloud is considered unmatched and a virtual segment is generated in the other scan.
For growth scenarios, the point cloud associated with the virtual segment is generated by searching l points in the point cloud of s P , that are closest to the points of the unmatched segment of point cloud Q. l has to be chosen by the user, depending on the density of the point cloud. In our experiments, we found values between 10 and 100 to deliver good results. These points are duplicated, labelled as a new segment and added to the segment hierarchy. The newly generated virtual segment is matched with the previously unmatched segment. This procedure is repeated for all unmatched segments.
For cropping or decaying scenarios, our goal is to generate an animation of the cropped part falling to the ground. Note that other solutions like the fading out of the respective points or their immediate removal are also viable and can be generated analogously. We duplicate the points of the cropped segment, but set their vertical axis coordinate to the point cloud's minimum. These points are  added as a new segment to Q, which is then matched to the previously unmatched segment of P.

Optimal Transport plan for two point clouds
So far, we have outlined how point clouds are segmented and how the segments are associated between different time steps. Based on this information, we perform temporal upsampling by warping, respectively, matching the point distributions of previously matched segments. Let P and Q be their respective point clouds, which we want to match onto each other using optimal transport. Solving the optimal transport problem for given point clouds P and Q involves the computation of the Wasserstein distanced w (P, Q), which is essentially the cost of the optimal transport solution, that is, the cost of warping one distribution onto the other, and the temporal upsampling is performed by using the solution for the interpolation of the point clouds.
In the (discretized) optimal transport problem, each point p a ∈ P, 1 ≤ a ≤ |P| and q b ∈ Q, 1 ≤ b ≤ |Q| is assigned a mass m a and m b , respectively, such that 1≤a≤|P| m a = 1 = 1≤b≤|Q| m b . The goal is to find an optimal transport plan, which transports all mass from P to Q, minimizing the cost, which is given by a cost matrix C ∈ R |P|×|Q| , with entries C ab defining the cost of transporting a piece of mass from p a to q b , where one common choice is given by the euclidean distance. The optimal solution is a transport matrix T ∈ R |P|×|Q| with entries T ab denoting the amount of mass transported from p a ∈ P to q b ∈ Q, chosen such that the total transport cost is minimized. The optimal transport problem can be written as a linear program: which minimizes the (discretized) squared 2-Wasserstein distance d w (P, Q) between the point clouds P and Q. For small point clouds, this problem can be solved by standard algorithms.
Transport problem setup. In order to compute the optimal transport from point cloud P to Q, we have to define the point masses m a and m b , and the entries C ab of the cost matrix C. Since we do not have any particular preference for certain points, we define their masses uniformly as m a = 1/|P| and m b = 1/|Q| for each point p a ∈ P and q b ∈ Q, respectively.
For each point p a ∈ P, we collect the k nearest neighbours q b ∈ Q. This graph is represented by a sparse matrix C P , where entry C P,ab = d(p a , q b ) is the distance between p a and q b if q b is one of the nearest neighbours. If q b is not among the k nearest neighbours of p a , we define C P,ab = ∞. The matrix C P is the cost matrix of transport from the source cloud's points to the target cloud's points. This is a reasonable approximation to the true dense cost matrix, since points in Q that are far away from a point p a are unlikely to be used as targets for transporting mass from p a in the solution of the full optimal transport problem. The cost for transporting from p a to non-neighbour points is thus infinite.
It is possible that some points of the target point cloud are not among the nearest neighbours. Since we want to compute a solution to the optimal transport problem that transports from each point and to each point, this is not desirable. In order to alleviate this, we also compute the k nearest neighbours p a ∈ P for points q b ∈ Q. This is also represented by a sparse matrix C Q , where entry C Q,ab = d(p a , q b ) represents the distance between p a and q b if p a is among the nearest neighbours of q b . The resulting cost matrix C is then defined as the coefficient-wise minimum of C P and C Q . k determines the sparsity of the matrix and is user-defined. We found k = 15 to deliver good results in our experiments. Thus, in the resulting graph, there are at least k connections from each point in the source cloud to points in the target cloud and also at least k connections from each point in the target cloud to points in the source cloud.
The Sinkhorn algorithm for regularized optimal transport. The optimal transport problem is approximated by the regularized optimal transport problem, which we solve using the Sinkhorn algorithm [Cut13]. In the regularized transport problem, the objective function in Equation (1) is modified to where H(T ) is the entropy of the matrix T, defined as T ab log T ab . (3)

Inserting Equation (3) into Equation (2) yields
It can be shown [Cut13] that minimizing Equation (4) leads to the following condition for the elements of T: where K ab = exp(−C ab /γ ) are the entries of a matrix K and u = (u a ) a , v = (v b ) b are unknown vectors. K is also a sparse matrix and the infinite distances in C become 0 in K. This property is important in order to be able to handle large point clouds. A full cost matrix for large point clouds would not fit into the computer's memory.
The Sinkhorn algorithm then finds an approximate solution by iterating the following steps: where denotes an element-wise division. The elements of the approximate solution matrix T are then T ab := u a K ab v b .

Using the solution of the transport problem for interpolation.
The solution of the transport problem is a transport matrix T of the same size as C, where each entry T ab denotes the amount of mass that has to be transported from p a to q b . Since we are interested in an interpolation animation that is visually appealing, we set up the following requirements: • No point can disappear, that is, each point of P has to move to at least one point of Q. • No point can appear out of nowhere, that is, for each point of Q there must be at least one point of P moving to it. • For efficiency, as few as possible point pairs should be used for the transport.
A solution to the first two requirements is to use each entry T ab > 0 of the transport matrix and interpolate p a to q b . If there are several T ab > 0 for fixed a, we create copies of p a , which follow different animation paths to their respective q b . Since the cost matrix C allows up to k targets for each p a , the number of transported points is at most k · (|P| + |Q|) instead of |P| · |Q|. Yet, this may be an unnecessarily huge amount of transported points, which is inefficient for real-time animations as the number of point copies may still become very big. In order to alleviate this, we perform the following optimization: We create a matrix T , which contains a 1 only at the maximum entry of each row of T at their respective position. All other entries are set to 0. We create another matrix T , which is 1 only at the column maxima of T and 0 otherwise. We then generate a matrixT = 1 2 (T + T ) which is used for the transport instead of T. Figure 7: Several stages of a growing maize plant, visualized with our method. Real measurements are highlighted with a grey background, while temporally upsampled data generated with our method are shown with a white background. Figure 8: Several stages of a growing maize plant. Images with a grey background are input data. Images with a white background were generated with our method. Between the two successive scans, the plant has grown regarding its size as well as a newly evolved leaf. The budding process was generated automatically. In the upper row, the points were colourized according to their label. Note that the labels and their respective colours in the input data (grey background) do not match, because the input scans are segmented individually. Our matching method automatically produces correct assignments. The label colour of matched segments is interpolated over time (white background) in order to visualize this.
Having this transport plan, we can visualize the result as an animation, where each point p i and its copies are transported linearly to the q j 's positions in real time.

Results
In this section, we show results obtained with our method. All experiments were performed on a PC with an Intel Core i7-8700K CPU and 64 GB RAM. We implemented our approach in Python. An overview of the computation times, separated into segmentation (where applicable) and segment matching and solving of the optimal transport problem is given in Table 1. The upsampling using the optimal transport solution and the visualization run in real time, thus no computation times are given for this step.
Maize plant time series. We generated a smooth animation from eight scans of a maize plant. These were acquired with a hand-held laser scanner, in a fashion comparable to the one described by Paulus et al. [PSKL14]. Exemplary input scans and upsampled data are shown in Figure 7 and a full animation is shown in the accompanying video. Figure 8 shows two input scans from this dataset and their upsampling in more detail. The segmentation of these scans is shown by the colouring of the plant. The labels and their respective colours do not match in the input scans, because the point clouds were segmented individually. Our matching method automatically assigns the correct segments to each other. This is visualized by interpolating the segment colour in the upsampled data. The scans in this dataset have between 1700 and 5000 points, which took 12.1 s. The point cloud matching and computation of optimal transport took 50.0 s. The total computation time was 62.2 s. The visualization of the interpolation runs in real time.
We also computed an animation without computing a segmentation, using the described interpolation method on the whole point clouds, as shown in Figure 3. Here, parts of one leaf get transported to a different leaf, because it exhibits a stronger growth than before.  Employing the segmentation prohibits this artefact because the transport is restricted to take place only between matched segments.
Cropped maize. In order to test the handling of cropped plant segments, we manually removed a leaf of the maize plant. As a first test, we used the cropped plant as the source and the complete plant as the target. The result is an animation of a growing leaf, shown in Figure 9. For the second test, we used the full plant as the source and the cropped plant as the target. Our method is able to generate an animation of the cropped leaf falling to the ground, shown in Figure 10 and the accompanying video. The full plant had 5080 points, the cropped one 4100. The total time required for segmentation was 6 s. The time for computing the upsampling was 3 s in both cases.

Synthetic Tree 1 time series.
We generated a dataset of four virtual scans of a simulated growing tree with 778 to 2253 points. The segmentation took 4.2 s. Matching and optimal transport took 42.0 s. Our visualization runs in real-time. The dataset and its upsampling are shown in Figure 11 and the accompanying video.
Synthetic Tree 2 dataset. We generated a synthetic tree dataset consisting of two point clouds, representing tree states for different times with a considerable growth in-between including the growth of many new branches, where the first has 77,448 points and the sec-ond one 326,032 points. A huge amount of growth with many new branches occurred. The segmentation was given due to the synthesis, as shown in Figure 4. We created a hierarchy on the segments as described in Section 3.2.1. The computation of the matching and optimal transport plan took 88 s. Our algorithm was able to generate a smooth and realistic animation, which is depicted in Figure 1 and shown in the accompanying video.
Komatsuna dataset for instance segmentation, tracking and reconstruction. The Komatsuna RGB-D dataset described by Uchiyama et al. [USM*17] consists of a time series of RGB and depth images of five plants acquired with an RGB-D camera, as well as a manually created labelling of the plant parts. All plants were recorded six times a day over a period of 10 days. For our experiments, we used plant 0 and constructed coloured 3D point clouds from the RGB-D images for each plant part. Each label in the dataset defines a segment in our context and is interpolated separately. In order to obtain a hierarchical representation, we create an empty virtual root node and added child nodes representing the plant's leaves.
Due to noise in the original data, the depth-discontinuities at leaf boundaries are not precisely aligned with the RGB images and label data, and the depth of thin branches is not well-captured. Therefore, we apply a semi-automatic pre-processing step before creating the 3D point clouds: First, we remove all depth values which we consider to be incorrect in a manual process and mark them as 'unknown', which usually occurs at the boundaries of leaves. In a second step, we fill in the 'unknown' depth values based on the known valid depth values, similar to the approach of Desbrun et al. [DMSB99]. This is carried out for each label of each plant separately.
This dataset shows partial occlusions and in some cases slightly misplaced manual labels. In Figure 12 and in the accompanying video, it can be seen, that, while the general animation is smooth, these two aspects lead to some visual artefacts. The temporal occlusion leads to leaves seemingly shrinking away from the centre of the plant and growing back to the centre shortly later. The misplaced labelling leads to some points moving between leaves. Both are limitations of our method and may be addressed either in preprocessing or in future work. The computation time was 105 s.
As an additional experiment, we used only each 10-th scan of the Komatsuna plant 0 dataset. The resulting upsampling looks convincing and shows fewer visual artefacts since by skipping input data, we essentially removed high-frequency temporal noise as also demonstrated in the accompanying video. The computation time was 10 s.

Verification: Reconstruction of the Komatsuna dataset by skipping original data.
To analyse the quality of the upsampling, we omit every second frame of the original dataset and compare it with a reconstruction generated with our method. As an example, we show scan 3 of day 9 of plant 0 of the dataset. The whole dataset was rescaled to fit into the [0, 1] 3 unit volume. Comparing the reconstruction to the original data resulted in a mean distance of 0.001728 unit lengths, a standard deviation of 0.001130 unit lengths and a maximum distance of 0.00787517 unit lengths. A visual comparison is shown in Figure 13.

Limitations and Future Work
The segmentation algorithm described in Section 3.2.2 is designed for thin structures as occurring in the plant data considered, whereas the matching and optimal transport components of our method do not have such limitations. In the future, we intend to investigate the combination of our method with other segmentation techniques, such as PointNet [QSMG17]. Furthermore, our method is tailored to data that can be represented as a hierarchy of segments, and the results of our method depend on the quality of the input data. If the input data seemingly move back and forward, as in the Komatsuna dataset, this is also mirrored in our upsampling. The plant movement can be due to the natural movement of the plant or due to scanner noise. Varying sampling in the input data can also lead to counterintuitive movement of the individual points in the upsampled dataset in some cases. This can either be resolved by resampling the data in a pre-processing step or by introducing additional constraints to the optimal transport problem, which would be an interesting direction for future work. Noisy points are not smoothed out by the method itself. In this case, we rely on pre-processing. Furthermore, our method does not fill in temporarily occluded regions of the point cloud data. This can be resolved by preprocessing the data with existing methods like 3D shape inpainting.
Our method matches the segments traversing the hierarchy in a greedy fashion. While this strongly reduces computational complexity, it can also lead to errors that could be prevented by comparing all possible successor combinations. Segments that were pruned completely can be recognized and handled in a user-defined fashion. Partially pruned segments are currently not recognized and thus result in a shrinking animation. In the future, we would like to explore possibilities for handling these cases, for example, by comparing the volume and only allowing volume increases for the animation. Volume decreases could be either faded out or filled.

Conclusion
We presented a novel, efficient approach for the temporal upsampling of point cloud sequences that, by design, circumvents limitations of previous approaches that rely on exact point correspondences, shape priors or the availability of huge datasets and allows handling topological changes to be expected for several growth processes. For this purpose, our approach involves an initial automatic alignment of successive scans, the generation of a segment hierarchy for each individual scan, the matching of the segment hierarchies obtained for successive scans and the final computation of segment-wise regularized optimal transport in a memory-efficient way that allows morphing the distributions of the point sets associated with matched segments onto each other. As demonstrated by our evaluation, our approach allows the generation of a smooth temporal upsampling of the input point cloud sequence and, hence, a realistic growth animation. We believe that our method will be useful for biologists and researchers in the field of agriculture, as well as in the entertainment industry.