Extracting individual trees from lidar point clouds using treeseg

Recent studies have demonstrated the potential of lidar‐derived methods in plant ecology and forestry. One limitation to these methods is accessing the information content of point clouds, from which tree‐scale metrics can be retrieved. This is currently undertaken through laborious and time‐consuming manual segmentation of tree‐level point clouds from larger‐area point clouds, an effort that is impracticable across thousands of stems. Here, we present treeseg, an open‐source software to automate this task. This method utilises generic point cloud processing techniques including Euclidean clustering, principal component analysis, region‐based segmentation, shape fitting and connectivity testing. This data‐driven approach uses few a priori assumptions of tree architecture, and transferability across lidar instruments is constrained only by data quality requirements. We demonstrate the treeseg algorithm here on data acquired from both a structurally simple open forest and a complex tropical forest. Across these data, we successfully automatically extract 96% and 70% of trees, respectively, with the remainder requiring some straightforward manual segmentation. treeseg allows ready and quick access to tree‐scale information contained in lidar point clouds. treeseg should help contribute to more wide‐scale uptake of lidar‐derived methods to applications ranging from the estimation of carbon stocks through to descriptions of plant form and function.

Limiting the adoption of these new lidar-derived methods is the difficulty associated with accessing the information content of point clouds. To retrieve tree-scale metrics from these data, individual tree-level point clouds must be extracted from the larger-area point cloud. This generally involves laborious and time-consuming manual segmentation, a task that is impracticable for many trees or plots.
The more complex the scene, the more difficult and time-consuming this becomes.
Several methods have looked to automate this process using a variety of techniques; the approaches of Raumonen et al. (2015) and Trochta, Krůček, Vrška and Král (2017) organised the larger-area point clouds into clusters, from which tree-level point clouds were grown through fixed inter-cluster assumptions of distance and orientation to infer connectivity. A variant of this approach by Zhong et al. (2017) employed graph theory for this organisation, while the methods of Tao et al. (2015) introduced ecological theory via metabolic scaling theory (West, Brown & Enquist, 1997), to determine belonging on a point-by-point basis. Limiting these methods were their exclusive application to TLS data from structurally simple and sparse forest types, with large distances and minimal interaction between crowns, such that transferability of these methods and their underlying assumptions of tree architecture and/or data quality to more complex forest types, is likely difficult.
Here, we present treeseg, an open-source software for the near-automatic extraction of tree-level point clouds from larger-area point clouds. The method has been designed around the principles of being both independent of forest type and instrument, and is demonstrated here through application to lidar data acquired from both simple open forest and complex tropical forest. The method is driven by the data themselves, free from fixed assumptions of tree architecture and data quality, and unique in its attempt to organise the larger-area point cloud into each separate underlying surface comprising the scene.

| OVERVIE W OF TREE SEG
The treeseg method has been developed to near-automatically extract tree-level point clouds from larger-area point clouds.
The source code, released under the MIT license, is hosted at https://github.com/apburt/treeseg/. The method is implemented in C++, and makes extensive use of the Point Cloud Library (Rusu & Cousins, 2011). Figure 1 illustrates treeseg, where it has been applied to a small section of TLS data acquired in tropical forest. Broadly, the method consists of: (i) identification of individual stems, (ii) segmentation of each stem up to first branching, and (iii) isolation of each crown from the canopy. Here, first, we outline the generic point cloud processing techniques underpinning treeseg; and the subsequent sections then describe these three main steps of the method.

| P OINT CLOUD PRO CE SS ING
This section describes the generic point cloud processing techniques used in the treeseg method.

| Nearest neighbour distance
The Euclidean distance, d, between two points, p 1 and p 2 , denoted in the usual Cartesian coordinate system, is defined as: The distance between the point p 1 and its nearest neighbour, d NN (p 1 ), in the point cloud, P, p i ∈ P, is defined as: Across P, the mean nearest neighbour distance, d NN , is defined as:

| Downsampling
Point cloud downsampling can manage the distribution of d NN (z) across P, or reduce computational complexity. This is typically undertaken by partitioning the bounding volume of P into voxels with an edge-length of l V . The points inside each voxel, p i ∈ V, are aggregated into a single point, p c .

| Euclidean clustering
Clustering is a tool used to order an unorganised point cloud (e.g., P) into an organised set of point clouds, {C}, {C} ⊂ P, based on Euclidean distances. The cluster C is formed when enough constituent points (≥ N min ) have an intra-d NN less than or equal to d max . p i ∈ C are then segmented from P and the set is added to {C}, and iterated until remaining p i ∈ P are exhausted.

| Principal component analysis
Principal component analysis is used to describe the arrangement of points in a point cloud (e.g., p i ∈ P) through eigen decomposition: where, λ and v are the eigenvectors and eigenvalues respectively; Σ, the covariance matrix, is defined as: where, for example, cov(x,y), is defined as: Obtained from the resultant Σ, the cubic polynomial can be solved for λ through factorisation, from which v can be solved from each respective resultant linear equation. The principal axes of P are then described by λ (eigenvectors), alongside their associated magnitudes v (eigenvalues).

| Surface normals
One extension to PCA is surface normal estimation. Across an organised point cloud, C, representing a common underlying surface such as a leaf, the normal to this surface is the vector perpendicular to a plane fitted through C. By definition, 1 across C, derived via PCA, is the vector of this plane. As each eigenvector is orthogonal to the next, 3 defines the normal of this plane (i.e., an estimate of the surface normal). Across unorganised point clouds, each point is attributed a surface normal estimate by fitting such a plane to either the nearest N neighbours, or those neighbours whose distance is less than d max .

| Region-based segmentation
A further extension to PCA and surface normal estimation is region growing segmentation. This partitions P into a set of regional point clouds, {R}, {R} ⊂ P, based on neighbourhood point commonality, such that it is inferred p i ∈ R share some common underlying surface.
Here, the metric used to determine commonality between points is the angle between surface normals: Vertically resolved nearest neighbour distance across the NOU-11 point cloud both pre-and post-downsampling. The dashed line represents the assumed pulse footprint at the respective height (i.e., assuming no lateral travel) where A and B represent 3 of two arbitrary points. Each region cloud, R, is grown from a random seed point in P by incorporating neighbours provided the angle between surface normals is below some threshold, max , until exhaustion of the neighbourhood. R is then segmented from P and added to {R}, and the process is iterated until all seeds are considered.

| Shape fitting
The final method used in treeseg is shape fitting via random consensus model fitting (RANSAC). The only geometric primitive considered here is a cylinder, defined by: a centreline with the vector, v, a point on the centreline, p 0 , and a radius, r. If R is an organised point cloud whose underlying surface belongs to a cylindrical section of stem, v can be equated to 1 . That is, a plane described as: where p 0 can be described as the centroid point, p c , of R. Subsequently, r is defined as the mean distance between p ∈ R and the plane. This shortest distance between some arbitrarily selected point, p 1 , and the plane, d, is defined as: For the cylinder to be finite, then the length, l, must be characterised. To determine l from R using Euclidean distance, it is necessary to transform 1 such that the centreline z-axis will be rotated into (0,0,1): where θ, the angle between 1 and (0,0,1) is calculated through Equation 8. This reduces the estimate of l down to: This cylinder fitting can then be applied in a RANSAC framework, a brute force approach permitting the removal of outliers from noisy data.

| THE L ARG ER-ARE A P OINT CLOUD
treeseg ingests a point cloud (with each point attributed an x, y, zcoordinate) such as the one illustrated in Figure 1. The two expectations of these data are: first, the point cloud provides a contiguous sample of the scene from which structural elements of individual trees, down to the highest-order branching, are clearly discernible.
That is, the sampling protocol employed to acquire the larger-area point should minimise occlusion effects, with no significant gaps Identifying stems in the NOU-11 larger-area point cloud: (i) a slice in the z-axis is segmented from the plot-level point cloud, as driven by the underlying DTM, (ii) the slice is organised via Euclidean clustering, (iii) each of these clusters are further organised into their underlying surfaces via region-based segmentation, and (iv) stems are identified from each region through RANSAC cylinder fitting present in the data. Second, for data requiring co-registration (e.g., multiple scans from various locations using a tripod-mounted TLS), error in this registration should be similar to the ranging accuracy of the instrument.
Due to beam divergence, increasing angular distance between sequentially fired pulses at a given range, and general occlusion, there is likely to be strong variation in d NN  values across the height has reduced by an order of magnitude.

| TREE IDENTIFI C ATI ON
The initial stage of treeseg is to identify individual trees from the larger-area point cloud, P. Illustrated in Figure 3, this stage comprises: 1. The generation of a digital terrain model (DTM) across P, from which a point cloud slice, S, in the z-axis, is generated from P ( Figure 3i).

| S TEM S EG MENTATI ON
The second stage of treeseg is, for each identified tree, to extract the stem up to the position of first branching. The approach, consisting of the four following steps, is illustrated in Figure 4: The cylinder belonging to each identified tree acts as a passthrough filter across the larger-area point cloud to extract the point cloud, V, V ⊂ P, containing: the stem of the tree in question, neighbouring vegetation, and returns from the ground ( Figure 4i).

2.
A plane is fitted through the lower section of V via the RANSAC framework, with the inliers, considered to represent the ground, segmented from V (Figure 4ii).

Euclidean clustering and region-based segmentation is applied
across V to order the cloud into its underlying surfaces, from which neighbouring vegetation is removed (Figure 4,iii,iv,v).
F I G U R E 4 Segmenting the stem: (i) the cylinder fit for each identified tree (red) acts as a pass-through filter (black), (ii) points from the ground are removed through RANSAC plane fitting, (iii), (iv), (v) Euclidean clustering and regionbased segmentation are used to remove neighbouring vegetation, and (vi) RANSAC cylinder fitting is used to determine the location of first branching 4. From the base of the stem, variation in the goodness of RANSAC cylinder fits through the stem are assessed to determine the position of first branching (Figure 4vi).

| CROWN S EG MENTATI ON
The final stage of treeseg is to extract the crown associated with each segmented stem, as illustrated in Figure 5.
1. The volume of canopy containing the crown of each stem is determined using allometric relationships relating stem diameter to tree height and crown extent. These dimensions, further extended by a user-modifiable fixed distance to ensure complete capture of the crown, are used to segment the point cloud, V, V ⊂ P (Figure 5i).

2.
Region-based segmentation is used to order V into the set of point clouds {R}, {R} ⊂ V, based on common underlying surfaces ( Figure   5ii).

3.
Commencing from the isolated stem, the crown is grown through appendage of R i via connectivity testing, provided that: (i) the distance between each R is approaching d NN (z) and (ii) that R is smaller than its parent ( Figure 5iii); where the length of R is determined through transformation about the principal eigenvector.

4.
Finally, due to the conservative nature of 3) tending towards commission error, if necessary, manual segmentation is used to remove any erroneously segmented neighbouring vegetation ( Figure 5iii). This can be undertaken using software such as CloudCompare (2018).

| CON CLUS IONS
In this applications paper, we have presented a method for the nearautomatic extraction of tree-level point clouds from larger-area point clouds. We have applied the method to the KARA-001 and NOU-11 downsampled point clouds of 17 million points (197 MB) and 338 million points (3:8 GB), to segment 28 and 155 trees with DBH greater than 0:2 m (this is an arbitrary user-defined threshold, and not a limitation of the method). Automatic segmentation took 2 days and 1 week respectively using a 24-core 2.  Chave and Blaise Tymen for help with this.

AUTH O R S' CO NTR I B UTI O N S
All authors conceived and designed the methods. A.B. wrote the software and manuscript, with contributions from M.D. and K.C.

DATA ACCE SS I B I LIT Y
treeseg is available at https://github.com/apburt/treeseg. The version described in this paper, v0.