Detection of Structural Components in Point Clouds of Existing RC Bridges

The cost and effort of modeling existing bridges from point clouds currently outweigh the perceived benefits of the resulting model. There is a pressing need to automate this process. Previous research has achieved the automatic generation of surface primitives combined with rule-based classification to create labeled cuboids and cylinders from point clouds. Although these methods work well in synthetic data sets or idealized cases, they encounter major challenges when dealing with real-world bridge point clouds, which are often unevenly distributed and suffer from occlusions. In addition, real bridge geometries are complicated. In this article, we propose a novel top-down method to tackle these challenges for detecting slab, pier, pier cap, and girder components in reinforced concrete bridges. This method uses a slicing algorithm to separate the deck assembly from pier assemblies. It then detects and segments pier caps using their surface normals, and girders using oriented bounding boxes and density histograms. Finally, our method merges oversegments into individually labeled point clusters. The results of 10 real-world bridge point cloud experiments indicate that our method achieves very high detection performance. This is the first method of its kind to achieve robust detection performance for the four component types in reinforced concrete bridges and to directly produce labeled point clusters. Our work provides a solid foundation for future work in generating rich Industry Foundation Classes models from the labeled point clusters.


INTRODUCTION
The global infrastructure market is poised for an explosive adoption of bridge information modeling (BrIM), which provides a shared knowledge resource for information exchange to support a reliable basis for decision making during a bridge's life cycle (Fanning et al., 2014). The adoption of BrIM in the United States and the United Kingdom increased by 30% from 2015 to 2017 (Dodge Data & Analytics, 2017). However, the produced models are mainly as-designed models of new structures. The generation of as-is BrIM models for existing bridges is very limited, despite the widespread adoption of laser scanning for faster and better data collection (Park et al., 2007; Park et al., 2015). This is because the automatic generation of as-is models from point cloud data (PCD) remains an unsolved problem. The time required to manually create an as-is three-dimensional (3D) solid model from a point cloud using cutting-edge modeling software is typically of the order of 10 times greater than that required to obtain the point cloud (Trimble, 2014; Lu and Brilakis, 2017).
There is a pressing need to lower the cost and effort required for modeling existing bridges. This is particularly true for the highway infrastructure sector. There are more than 600,000 highway bridges in the United States and more than one in nine is classified as structurally deficient (ASCE, 2017). According to an in-house report (Network Rail, 2015), Network Rail and other bridge owners manage more than 30,000 bridges on the United Kingdom's motorways and major A-roads. Based on a 2-year inspection cycle, there is a need for at least 315,000 bridge inspections per annum across the United States and the United Kingdom. This explains why there is a huge market demand for less labor-intensive bridge documentation techniques that can efficiently boost bridge management productivity.
In general, the from-PCD-to-BrIM model process consists of two steps: (1) detecting bridge components in point clouds in the form of labeled point clusters and (2) generating a geometric model by fitting Industry Foundation Classes (IFC) entities and spatial relationships in labeled point clusters. This study intends to automate Step 1, that is, bridge-component detection in point clouds. This step is currently largely achieved manually using modeling software.
Major vendors such as Autodesk, Bentley, and ClearEdge3D provide the most advanced software solutions for Building Information Modeling. ClearEdge3D can semiautomatically (a few clicks and adjustments are required) fit geometric shapes embedded in the subsets of a point cloud without segmenting beforehand. However, ClearEdge3D is tailored for building and industrial environments and only standardized shapes, such as rectangular walls, pipes, steel beams, and so forth, can be recognized (ClearEdge3D, 2017). For most of the other commercial applications, the shape fitting is largely assisted by manually segmenting the point cloud in advance. These solutions demand a significant amount of attention when segmenting the target objects. Modelers need to repeatedly rotate the point cloud to various views and try to select regions of interest using clipping polygons. Lu and Brilakis (2017) report that, on average, 1.52 hours is needed to complete just the bridge-component detection task for processing a typical reinforced concrete (RC) highway-bridge point cloud.
In this article, we propose a novel top-down method for the abovementioned object detection problem. The novelty of this method lies in the fact that it directly extracts the key components of RC bridges without generating low-level shape primitives.
We discuss the current state of research in Section 2 and we outline the proposed method in Section 3. We then elaborate on the research methodology and experimental results in Section 4. Finally, we conclude in Section 5.

BACKGROUND
Existing software does not provide a fully automatic solution for concrete object detection in point clouds. Much effort has been devoted to automating this process. We define "detection" in this context as the combination of clustering (point cloud-to-point clusters) and classification (labeling the clusters). Current methods of point cloud clustering generally follow a "bottom-up" approach, which goes from points to surfaces or patches followed by semantic labeling to derive objects. Most point cloud classification methods follow a "top-down" approach, which employs human visual perception such as relationships, contexts, and so forth to detect specific instances embedded in a point cloud or to infer the semantics of components in a geometric model.

Bottom-up detection
The bottom-up approach pieces together primitive features like points to generate complex systems in successively higher levels until a top-level system is formed (Borenstein and Ullman, 2008). The higher level features are typically the surface normal (Sampath, 2010), meshes (Marton et al., 2009), patches (Zhang et al., 2015), and nonuniform B-Spline surfaces (Dimitrov et al., 2016).
A large body of literature has been devoted to generating surface-based primitives, especially planar surfaces (Pȃtrȃucean et al., 2015). Zhang et al. (2015) present a sparsity-inducing optimization-based method to detect parametric planar patches and define their boundaries from noisy bridge point clouds. However, this method can work only with planar-surface objects and cannot detect pier patches when point densities of these regions are low. Walsh et al. (2013) present a region growing (RG) algorithm to detect objects in a point cloud. RG starts with an initial set of small iteratively merged areas, followed by choosing a specific seed and adding in neighboring points based on similarity until an edge is reached. However, this method cannot detect the edge between a pier cap and a pier in a small portion of a bridge point cloud as shown in their experiment. The segmentation was finally achieved after manually choosing key points. Likewise, Dimitrov and Golparvar-Fard (2015) suggest an upgraded RG method through which the seed is found adaptively. This method can deal with curved surfaces and excels when the input point cloud does not suffer from substantive occlusions. However, it oversegments objects when occlusions are present. The persistent occlusion problem in real point clouds was addressed by Xiong et al. (2013) through a learning-paradigm that detects occluded planar surfaces in building point clouds. However, their method cannot be applied in bridge settings because the occluded surfaces in a bridge point cloud do not follow a specific pattern as in a building point cloud. Specifically, their algorithm detects rectangularly shaped openings, such as windows and doorways, assuming there are many identical openings on a wall and the rectangle is the predominant shape in most buildings. Similarly, Laefer and Truong-Hong (2017) develop a kernel density estimation-based method for modeling steel members by simulating several possible occlusions. 
In contrast, the occluded regions do not have such a repeated pattern in bridge point clouds. Most of the occlusions are due to on-site vegetation and long-distance scanning. Thus, the occlusions are in arbitrary locations and shapes. Schnabel et al. (2007) detect basic shapes (e.g., spheres, cylinders) using Random Sample Consensus (RANSAC) by random sampling of minimal sets in a point cloud. Yet, given the computationally expensive nature of RANSAC, it is unrealistic to use it to detect complex geometries. Hence, these methods tend to perform well in relatively simplified scenarios and with synthetic data, but are not ready to tackle the real bridge components whose as-constructed and as-weathered shapes further increase the as-designed complexity. To reduce computational time, Xu et al. (2018) suggest an octree-based probabilistic segmentation model for construction sites. The authors partitioned the scene into voxels. However, the segmentation accuracy of this method is quite sensitive to the voxel size. This problem is discussed by Vo et al. (2015), who propose an octree RG-based algorithm for surface patch segmentation in urban environments. Although their method can automatically adjust the voxel size through an adaptive octree, it faces the difficulty of patch generation for low point density regions.

Top-down detection
We contend that bottom-up detection is rarely suitable for point cloud classification. Classification through surfaces is insufficient; local surfaces or patches can be labeled as such, but it is difficult to determine whether they belong to the same instance. The intervention of object-level information is required to overcome such challenges (Pinheiro et al., 2016). The top-down approach is typically a heuristic approach for object detection, which begins with a broad-picture view, and then is broken into compositional sub-problems that are easier to solve (Kokkinos et al., 2006). It usually combines a set of engineering criteria and classifies objects in the point cloud that meet the criteria. Prior studies show that knowledge-based classification methods are robust, as domain-specific information such as object classes (Dore and Murphy, 2014), topological relationships (Koppula et al., 2011), and known parameters (e.g., diameter) or constraints (e.g., direction) (Ahmed et al., 2014) are invariant to factors such as pose and appearance. Recent research relies on existing as-designed documents to inform the top-down modeling strategy. This can simplify point cloud clustering and classification tasks (Liu et al., 2012). Likewise, Belsky et al. (2016) encapsulate domain expert knowledge in the form of rule sets to enrich semantics for a building model.
However, the methods developed in these studies are tailored for buildings and indoor and industrial objects, and not tailored for use in bridge settings, as the geometric properties of bridge components are quite different than those objects. What is more, there are few as-built or as-is models for existing bridges. Recently, some studies have started to employ top-down strategies to detect bridge components in point clouds. Riveiro et al. (2016) use specific constraints to segment masonry bridge point clouds into surfaces. However, their algorithm is based on histograms that largely depend on data quality. It is difficult to generalize this algorithm to large RC bridges, as the real point clouds usually suffer from occlusions and nonuniformly distributed points. Ma et al. (2017) leverage relationship knowledge and shape features to classify bridge 3D solid objects. First, the input of this method needs to be a solid bridge model (i.e., not a bridge point cloud). Second, the method assumes that the 3D solid model is developed in a grid system, in ideal geometries and that the pairwise relationship between two 3D solid objects is well defined. These assumptions are quite restrictive and make the method less feasible in real cases, as bridges usually possess various curved horizontal and vertical alignments and cross sections.

Other detection methods
Data-driven, learning-based methods have been widely applied to predict unknown instance labels based on training feature sets and manually added labels that facilitate supervised learning. Xiong et al. (2013) propose a probabilistic graphical model to label the extracted planar surfaces for buildings. Zhang et al. (2014) use surface features to train a multiclass classifier, which assigns bridge-component labels to surface primitives. As in the previous cases, this method is designed for simplified bridge designs that do not contain skews, irregularities, and complicated objects, which are often the cases of real bridge components.
Numerous volumetric convolutional neural network and deep learning frameworks have been proposed for object detection by transforming 3D points into voxel grids (Qi et al., 2017; Tatarchenko et al., 2017). The major obstacles to applying these data-driven machine learning schemes to bridge-component detection tasks include (1) the lack of a sufficient number of labeled large-scale real bridge point clouds to train a good classifier and (2) the high computational burden.

Gaps in knowledge
BrIM models are mainly represented using volumetric primitives (Tang et al., 2010), whereas most existing works focus on clustering point clouds through generating surfaces. These methods are not robust with regard to occlusions and sparseness and are difficult to transfer to the problem of object detection in a real bridge point cloud because bridge components are often skewed. In addition, none of the existing bottom-up methods can directly output labeled bridge point clusters. The few bridge-related studies that work well have restrictive constraints. We therefore contend that the problem of detecting bridge objects in the form of labeled point clusters from real bridge point clouds has yet to be solved.

Objectives and research questions
We aim to (1) automatically segment an RC bridge point cloud into mutually disjoint point clusters corresponding to the components making up the bridge and (2) automatically assign correct semantic labels to these point clusters. This will be done by answering the following research questions: (1) how to effectively segment a real RC bridge point cloud and (2) how to efficiently and robustly classify major RC bridge-component point clusters.

Hypothesis
A simple bottom-up method that generates low-level primitives has not, to date, managed to solve the abovementioned research questions. In contrast, human expertise can facilitate the search for a specific object in a scene because our guesses for the embedded objects are best when we know what to expect in the point cloud. Hence, we will investigate a brand-new top-down method which directly extracts key components of RC bridges without generating low-level shape primitives.
The hypothesis of this work is that the top-down bridge-component detection approach is efficient and reliable and there is no significant difference in detection performance for different RC bridges. We elaborate this hypothesis in the following.

Scope
This research only focuses on typical RC slab and beam-slab bridges on motorways and major A-roads in the United Kingdom, because 73% of the existing highway bridges and 86% of the planned future bridges in England are RC slab and beam-slab bridges (Kim et al., 2016). These two types of bridges reflect the fundamental bridge-design rules that allow for natural introduction of geometric constraints, describing the relationships that should hold between various components of a bridge. The key components of RC slab bridges include slabs, piers, and pier caps, and for RC beam-slab bridges, there are slabs, girders, piers, and pier caps (Kim et al., 2016). Thus, this work deals with four very important and highly detectable structural components of a highway bridge, that is, slabs, piers, pier caps, and girders, in line with Kedar's (2016) first element identification evaluation category.

Overview
The general thrust behind our top-down approach is to use the fundamental bridge design rules such as bridge topological constraints, given the low level of variance of the topological layout of RC slab and beam-slab bridges.
Our method bypasses the stage of surface generation altogether and directly obtains segmented and labeled point clusters. It breaks down a large bridge point cloud into sub-data sets through a recursive slicing algorithm. That is, the method chops the point cloud by means of a "virtual parallel scalpel" with a specified equal thickness. This algorithm is repeatedly used with sub-data sets until target objects are found and all small detection problems are solved. The key insight behind this method is to formulate the geometric feature search to explore shortcuts so that the target components can be quickly located in the point cloud.
The workflow of the proposed method is illustrated in Figure 1. Dashed frames refer to ambiguous components that may or may not exist in a bridge point cloud. Acronyms are used to present inputs, intermediate outputs, and final outputs of each step. For example, $\{B \mid A\}$ denotes that $B$ is a subset of $A$, where $A$ is derived from a previous step and is the superset of $B$. $\{B \mid A\}$ may also denote that $B$ is a property set of $A$ (e.g., $\{x_i \mid A\}$ means the $x$ values of $A$). More precisely, "deck assembly" refers to the areas that contain the slab and girders (if they exist); "pier assembly" refers to the areas that contain a transverse strip of slab, pier caps (if they exist), and piers; and "pier areas" refers to the subsets of a pier assembly that contain a small part of the slab strip, part of the pier cap, and an individual pier.
The first two steps in our detection method are recursive. The first step segments a whole aligned bridge point cloud (i.e., $D_N$) into pier assemblies (denoted $\alpha_M$) and deck assemblies (denoted $\alpha_M^C$). The second and third steps detect pier areas (denoted $\beta_{mp}$) and pier caps (denoted $Pc$) in the pier assembly and deck assembly. The last step detects girders (denoted girders) and the slab (denoted slab) in a merged deck assembly. Note that pier caps and girders may not exist in some bridge point clouds.

Step 1-Pier assembly and deck assembly detection
A bridge point cloud is given at an arbitrary position and orientation. The pose of a bridge should be normalized in advance, as all features extracted in further steps are in a canonical coordinate frame. We use principal component analysis (PCA) to align a bridge such that the horizontal alignment of the bridge is positioned roughly parallel to the global X-axis (Figure 2). Approximate alignment at an early stage makes it possible to reformulate features employed for recursive segmentation in the following steps. Therefore, the input of Step 1 is a roughly aligned bridge point cloud $D_N = \{p_i : i = 1, 2, \ldots, N\}$. We aim to classify all the bridge points into two groups: the pier assembly group $\alpha_M = \{\alpha_m : m = 1, 2, \ldots, M\}$, where $m$ is the number (index) of the pier assembly, and the deck assembly group $\alpha_M^C$.
We chop $D_N$ into multiple slices along the X-axis (Figure 3a). Let $J$ be the number of slices; we then obtain slices $S_X = \{S_j^x : j = 1, 2, \ldots, J\}$, where $x$ refers to the axis of slicing. Define $D_j = \{p_{ji}\} = \{p_i \mid S_j^x\}$ to be the point set in slice $S_j^x$, so that we must have $D_N = \bigcup_{j=1}^{J} D_j$, where $p_{ji}$ is the $i$th point in the $j$th slice.
Note that slicing might produce an empty slice where local points are missing, or a slice with just a single point. In such cases, the geometric features cannot be computed, and our slicing method handles them as follows. Assuming the slice thickness $\delta$ is constant, the initial number of slices $J$ is proportional to the length of the bridge (i.e., $J \propto |\max\{x_i \mid D_N\} - \min\{x_i \mid D_N\}|$). Here, $\delta$ is set to 0.5 m. When the "virtual scalpel" encounters an empty or a single-point slice, the method infers the geometric feature from the nearest "sound" slice: if $D_j = \emptyset$ or $|D_j| = 1$, then $S_j^x \cong S_{j-\phi}^x$, where $\phi$ is the count of slices between $S_j^x$ and the closest nonempty and non-single-point slice. This approximation is not perfect but provides robustness to locally incomplete data.
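The alignment and slicing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names `align_to_x` and `slice_heights` are hypothetical, the PCA is reduced to the plan view (a 2D principal axis in the XY-plane) rather than a full 3D decomposition, and points are assumed to be `(x, y, z)` tuples in meters.

```python
import math


def align_to_x(points):
    """Center the cloud in plan and rotate it about Z so that the dominant
    horizontal direction lies along the X-axis (plan-view PCA, an assumption)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Angle of the first principal axis of the 2x2 XY scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate by -theta so that the principal axis maps onto +X.
    return [((p[0] - mx) * c + (p[1] - my) * s,
             -(p[0] - mx) * s + (p[1] - my) * c,
             p[2]) for p in points]


def slice_heights(points, delta=0.5, axis=0):
    """Cut the cloud into slices of thickness delta along `axis` and return
    the Z-extent of every slice. Empty or single-point slices inherit the
    value of the nearest slice holding at least two points."""
    lo = min(p[axis] for p in points)
    hi = max(p[axis] for p in points)
    n_slices = max(1, math.ceil((hi - lo) / delta))
    slices = [[] for _ in range(n_slices)]
    for p in points:
        j = min(int((p[axis] - lo) / delta), n_slices - 1)
        slices[j].append(p)
    heights = [None] * n_slices
    for j, pts in enumerate(slices):
        if len(pts) >= 2:
            zs = [p[2] for p in pts]
            heights[j] = max(zs) - min(zs)
    sound = [j for j, h in enumerate(heights) if h is not None]
    if not sound:
        raise ValueError("no slice contains at least two points")
    for j in range(n_slices):
        if heights[j] is None:
            nearest = min(sound, key=lambda k: abs(k - j))
            heights[j] = heights[nearest]
    return heights
```

The default `delta=0.5` follows the slice thickness stated in the text; passing `axis=1` gives the Y-axis slicing used in Step 2.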
Then, we need a feature detector that can distinguish the pier assembly from the deck assembly. Each slice $S_j^x$ is bounded by a 3D axis-aligned bounding box, and a 2D skeleton $sk_j^x$ is drawn for each slice using the midplane of its bounding box (Figure 3b). According to bridge engineering knowledge, piers support the deck against gravity, so they should start from the ground. Therefore, the height of a pier assembly slice should be much larger than that of a deck assembly slice (Figure 3b). Thus, in each slice $S_j^x$, we extract the geometric feature $range_{jz}$, which is the height of $S_j^x$. We classify $S_j^x$ as a pier assembly slice if Equation (1) is satisfied; otherwise, $S_j^x$ is considered a deck assembly slice:
$$\text{if } range_{jz} > \rho_1 \left| \max\{z_i \mid D_N\} - \min\{z_i \mid D_N\} \right|, \text{ then } S_j^x \leftarrow \text{pier assembly}, \tag{1}$$
where $\rho_1$ is a discrimination parameter that refers to the thickness ratio of the deck assembly relative to the height of the bridge, which should not be affected by the varying elevation (Figure 3c). This assumption (see Section 4.1, A1) will be experimentally justified in Section 4.3.2. The adjacent slices with the same assembly property are merged into a cluster (Figure 3c). Finally, pier assembly $\alpha_M$ and deck assembly $\alpha_M^C$ (Figure 4a) are acquired.
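The slice classification and merging can be illustrated with a short sketch. It assumes that the test in Equation (1) compares the slice height with $\rho_1$ times the overall height of the cloud, by analogy with the threshold stated explicitly for Step 3.3; the function names and the value of `rho1` in the usage below are hypothetical.

```python
def classify_slices(heights, total_height, rho1):
    """Label a slice 'pier' when its Z-extent exceeds rho1 times the overall
    height of the cloud, and 'deck' otherwise (assumed form of the test)."""
    return ["pier" if h > rho1 * total_height else "deck" for h in heights]


def merge_runs(labels):
    """Merge adjacent slices that share a label into
    (label, first_index, last_index) clusters."""
    runs = []
    for j, label in enumerate(labels):
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1], j)
        else:
            runs.append((label, j, j))
    return runs


# Six slices of a 10 m high cloud; only the fifth slice reaches the ground.
labels = classify_slices([1.0, 1.0, 1.0, 1.0, 10.0, 1.0], 10.0, rho1=0.3)
clusters = merge_runs(labels)
```

In this toy input the merged clusters are one deck run, one pier run, and a second deck run, which mirrors how pier assemblies interrupt the deck assembly along the X-axis.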

Step 2-Pier area detection in pier assembly
The inputs of Step 2 are the pier assemblies $\alpha_M$ output from Step 1. Step 2 follows the same procedure as Step 1, except that the slicing is performed along the Y-axis of $\alpha_m$ to obtain slices $S_Y = \{S_j^y \mid \alpha_m\}$. Again, the value of $range_{jz}$ for each slice $S_j^y$ is extracted. The method classifies $S_j^y$ as a pier area slice if Equation (2) is satisfied; otherwise, $S_j^y$ is considered a deck assembly slice:
$$\text{if } range_{jz} > \rho_2 \left| \max\{z_i \mid \alpha_m\} - \min\{z_i \mid \alpha_m\} \right|, \text{ then } S_j^y \leftarrow \text{pier area}, \tag{2}$$
where $\rho_2$ is another discrimination parameter that is used to separate the pier area from the rest in $\alpha_m$. For a pier assembly without a pier cap, $\rho_2 = \rho_1$; otherwise, $\rho_2 > \rho_1$. This assumption (see Section 4.1, A2) will be experimentally justified in Section 4.3.3.

Step 3-Pier cap detection
We illustrate the workflow of this step in Figure 5. We attempt to detect pier caps using surface normals obtained through triangulation in the upper part of the pier area. Triangulation can be computed efficiently for a relatively small region without noticeably affecting either computation time or memory overhead.

Step 3.1-Remove upper deck slab surface.
In this step, we aim to remove the upper slab surface points from the pier area(s) $\{\beta_{mp}\}$ output from Step 2. The void space between the upper and bottom slab surfaces is used as a discriminator. This void is consistently present because the laser scanner can project laser beams only onto an object's external surface. According to the Design Manual for Roads and Bridges (Highways England, 2017), the general maximum transverse gradient is defined to be 5% (1/20), so that the lower bound of the upper slab points is $\lambda_{\min} = 5\%\,W_{\beta_{mp}}$, where $W_{\beta_{mp}}$ is the width of $\beta_{mp}$ (Figure 6), and the upper bound is $\lambda_{\max} = \rho_{3a} H_{\beta_{mp}}$, with $H_{\beta_{mp}}$ the height of $\beta_{mp}$. We define $\lambda$ to be the range where upper slab surface points are located. There should be $5\%\,W_{\beta_{mp}} < \lambda < \rho_{3a} H_{\beta_{mp}} < \rho_1 H_{\beta_{mp}}$, where $\rho_{3a}$ is the slab thickness ratio estimation (see Section 4.3.3). The points in $\lambda$ are then removed and the remaining points in the pier area(s) are denoted as $\{Pd_{mp}\}$ (Figure 6, upper right).

Step 3.2-Pier cap detection at top of piers.
This step aims to detect pier caps $\{Pc \mid Pd_{mp}\}$. The input is the refined pier areas $\{Pd_{mp}\}$ output from Step 3.1. Pier caps are underneath the slab, playing an important role in [...] (Figure 4b). Then, we consider $\alpha_m$ to be a wall-type-pier assembly, as the single wall-type pier supports the whole deck assembly above. As a result, a pier cap does not exist (see Section 4.1, A3a).

Scenario 2.
For a pier assembly with cap-and-column piers (i.e., $\beta_M > 1$), or for a single detected pier area $\beta_{mp}$ with $W_{\beta_{mp}} \ll W_{\alpha_m}$, the pier assembly $\alpha_m$ may or may not have a pier cap (e.g., multiple columns without a cap). This scenario is more complex and requires further detection.
Given that pier caps are located on the top of the pier, the upper part of $Pd_{mp}$ (i.e., the top $\rho_2$ portion) is used to detect the pier cap (Figure 7a). We denote this part as upper $Pd_{mp}$, which contains the deck assembly's bottom surface, the pier cap (if it exists), and a small part of the pier (Figure 7b). Then, we generate the mesh for upper $Pd_{mp}$ and compute the normal of each triangular surface. The normal of a given triangle is estimated by the cross product of two vectors on this triangle. Define $\{\mathbf{n}_t(n_t^x, n_t^y, n_t^z) : t = 1, 2, \ldots, T\}$ to be the normal vectors of the triangles. A normal indicates the surface orientation. If a cluster of surface normals is revealed in upper $Pd_{mp}$ whose orientations are quasi-parallel to the Z-axis (i.e., downward- or upward-oriented normals), and if those normals are found around the level $\rho_1(\max\{z_i \mid \beta_{mp}\} - \min\{z_i \mid \beta_{mp}\})$ (Figure 7c, red), then the points (i.e., feature points) constituting these surfaces, together with the points in upper $Pd_{mp}$ that lie above the feature points, are classified as deck assembly. Otherwise, the pier cap feature points are detected if a cluster of downward- or upward-oriented normals is found around the level $\rho_2(\max\{z_i \mid \beta_{mp}\} - \min\{z_i \mid \beta_{mp}\})$ (i.e., A3b, Section 4.1) (Figure 7c, green). The method iterates through $\{Pd_{mp}\}$ using the same procedure, and the pier caps $\{Pc \mid Pd_{mp}\}$ and the piers $\{pier \mid Pd_{mp}\}$ are acquired.
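The per-triangle normal computation and the quasi-vertical test can be sketched as below. The cross-product step follows the text; the angular tolerance `max_angle_deg` is an assumed illustrative threshold, since the exact criterion for "quasi-parallel to the Z-axis" is not given here, and both function names are hypothetical.

```python
import math


def triangle_normal(a, b, c):
    """Unit normal of triangle (a, b, c) from the cross product of two edges."""
    u = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    v = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(x * x for x in n))
    if length == 0.0:
        raise ValueError("degenerate triangle")
    return tuple(x / length for x in n)


def is_quasi_vertical(normal, max_angle_deg=10.0):
    """True when the unit normal points up or down within max_angle_deg of
    the Z-axis (the tolerance is an assumption for illustration)."""
    return abs(normal[2]) >= math.cos(math.radians(max_angle_deg))


# A horizontal facet (soffit-like) and a vertical facet (pier-face-like).
soffit = triangle_normal((0, 0, 5), (1, 0, 5), (0, 1, 5))
face = triangle_normal((0, 0, 0), (1, 0, 0), (0, 0, 1))
```

Triangles flagged by `is_quasi_vertical` would then be grouped by elevation to decide whether they sit near the deck soffit level or the pier cap level.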

Step 3.3-Pier cap extraction from deck assembly.
The detected pier caps in Step 3.2 imply that they should also be present in $D^{PC}_M$, which is the deck assembly output from Step 2. The pier cap parts are extracted from $D^{PC}_M = \{D^{PC}_m\}$ in the following way (Figure 8a). First, the points of $D^{PC}_m$ are projected onto the YZ-plane, and density histograms are generated along the Y-axis by tallying the number of points within equal-width bins. The bin width is determined using the square-root choice:
$$\#bin = \left\lceil \sqrt{n} \right\rceil, \qquad w_{bin} = \frac{range(Cluster)}{\left\lceil \sqrt{n} \right\rceil}, \tag{6}$$
where $Cluster$ represents $D^{PC}_m$, $n$ is its point count, and $range(Cluster)$ is its extent along the histogram axis. Then, the bins are clustered using the gaps between them (Figure 8b), so that $D^{PC}_m$ is segmented (Figure 8c). Denote the segments as $\{\gamma_{m(p+1)}\}$; a slicing procedure along the X-axis of $\{\gamma_{m(p+1)}\}$ is then performed. For $\gamma_{m(p+1)}$, the pier cap area is detected if $range_{jz} > \rho_{3b} \left| \max\{z_i \mid \gamma_{m(p+1)}\} - \min\{z_i \mid \gamma_{m(p+1)}\} \right|$ (Figure 8d), where $\rho_{3b} = \rho_1 / \rho_2$, with $\rho_1$ and $\rho_2$ being determined in Steps 1 and 2, respectively (see Section 4.1, A3c). Next, the procedure is similar to Step 3.1 and Step 3.2. The extracted pier cap area is considered a smaller-scale version of $\beta_{mp}$. The upper slab surface points are removed and classified as deck assembly, followed by triangulation to detect and classify the deck assembly's bottom surface points. The pier cap parts $\{Pc \mid D^{PC}_m\}$ are finally acquired (Figure 8f). In the end, the pier cap parts output from Step 3.2 and Step 3.3 are merged (Figure 9).
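The density histogram with the square-root choice, and the clustering of bins by the gaps between them, can be sketched as follows. The square-root choice is taken in its usual form (the number of bins is the ceiling of $\sqrt{n}$); treating any empty bin as a gap is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def sqrt_choice_histogram(values):
    """Equal-width histogram whose bin count is ceil(sqrt(n)).
    Returns (counts, lower_bound, bin_width)."""
    n = len(values)
    k = math.ceil(math.sqrt(n))
    lo, hi = min(values), max(values)
    if hi == lo:
        return [n], lo, 0.0
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        j = min(int((v - lo) / width), k - 1)
        counts[j] += 1
    return counts, lo, width


def cluster_by_gaps(counts):
    """Group consecutive non-empty bins; an empty bin is treated as a gap.
    Returns a list of (first_bin, last_bin) index pairs."""
    clusters, start = [], None
    for j, c in enumerate(counts):
        if c > 0 and start is None:
            start = j
        elif c == 0 and start is not None:
            clusters.append((start, j - 1))
            start = None
    if start is not None:
        clusters.append((start, len(counts) - 1))
    return clusters


# Y-coordinates of two groups of points separated by a wide gap.
ys = [i * 0.1 for i in range(8)] + [10.0 - i * 0.1 for i in range(8)]
counts, lo, width = sqrt_choice_histogram(ys)
segments = cluster_by_gaps(counts)
```

With 16 values the histogram has four bins, the two middle bins are empty, and two segments are returned, one per group of points.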

Step 4-Girder detection
Step 4 aims to detect girders in the deck assembly. This is achieved in two substeps: (1) segment the deck assembly into several segments $\{deck_\omega\}$ and (2) detect girders in each segment $\{girders \mid deck_\omega\}$.

Step 4.1-Segment the whole deck assembly into several segments.
To begin with, we conduct a merging process to build up a whole deck assembly cluster, which is composed of slab and girders (if they exist). This involves piecing together all point clusters classified as deck assembly in the previous steps (Figure 10).
For a beam-slab bridge, the length of the girder (i.e., beam) depends on the span, which is the distance between the two intermediate supports (Wai-Fah and Lian, 2014). We need to split the whole merged deck assembly into several segments to find the appropriate length of span. The best cutting planes are not necessarily orthogonal to the horizontal alignment of the bridge (i.e., X-axis of the deck assembly), but rather depend on the orientation of the expansion joints. This is because two adjacent deck assembly segments must be interconnected by the expansion joints as per the Highway Agency's BD 33 design code (Highways England, 1994). The choice of joint depends on many factors, including imposed loadings, anticipated movement, temperature range, deck shortening, and deck rotation. Pier clusters and pier caps are then oriented based on the joints. Hence, the problem of finding the best cutting planes is transformed into orientation determination of the pier clusters or pier caps (Figure 11). We employ a 3D oriented bounding box (OBB) to capture the orientation property. The OBB is the tightest oriented box enclosing a given 3D point set. All bridges have piers, but not all of them have pier caps. So, a pier cluster [...] This process is recursively performed until the entire deck assembly is segmented into multiple segments $\{deck_\omega : \omega = 1, 2, \ldots, (m + 1)\}$, where $m$ is the number of pier clusters (equal to the number of pier assemblies) (Figure 13).

Step 4.2-Girder detection in the deck assembly segment.
We now detect girders in each deck assembly segment. We start by rotating $deck_\omega$ around its Y-axis until $deck_\omega$ reaches the best projection view, because the original projection results of $deck_\omega$ might be "muddy" due to a curved bridge elevation. Rotation is conducted through a grid search in a range of angles $\{\xi\}$, where $\{\xi\} = [-3.4°, 3.4°]$, deduced from the general maximum longitudinal gradient (6%) (Highways England, 2017). A density histogram $H_Z$ along the Z-axis is employed for evaluating whether the best rotation is reached. The best rotation angle is determined using:
$$\hat{\xi} = \arg\max_{\xi \in \{\xi\}} \; std\,H_Z(deck_{\omega(\xi)}),$$
where $std\,H_Z(deck_\omega)$ is the standard deviation of the point counts in the bins. Empirical studies revealed that the best projection determination is not sensitive to the bin size (#bin varied from 10 to 1000; $\mu = 2.70°$, $\sigma = 0.05°$) (Figure 14). The bin size is then derived using Equation (6). $std\,H_Z(deck_\omega)$ is a stronger indicator than simply the maximum point count bin, because the elevation of the girder depends on that of the slab. The best projection may not necessarily be given by the bin with the maximum point count, which may result from a concentration of unevenly distributed points. The best projection view is found once the standard deviation of the histogram bins on the Z-axis reaches its maximum. Figure 15 demonstrates an example where the best rotation ($\hat{\xi} = 2.7°$) is obtained when the largest $std\,H_Z(deck_\omega)$ (1,178) is returned. Next, only the bottom $\rho_4$ (%) points of $deck_{\omega(\hat{\xi})}$ (denoted $bdeck_{\omega(\hat{\xi})}$) are used for girder detection, where $\rho_4 = \frac{\rho_1 - \rho_{3a}}{\rho_1}$, with the thickness of the deck assembly and that of the slab being estimated to be roughly $\rho_1$ (obtained from Step 1) and $\rho_{3a}$ (obtained from Step 3.1), respectively.
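The grid search for the best projection view can be sketched as follows: for each candidate angle, the segment is rotated about the Y-axis, a Z-axis histogram is built, and the angle that maximizes the standard deviation of the bin counts is kept. The angular step, the fixed bin count, and the sign convention of the rotation are assumptions for illustration (the text derives the bin size from Equation (6)); the function names are hypothetical. The search range follows from $\arctan(0.06) \approx 3.4°$.

```python
import math
import statistics


def rotate_about_y(points, angle_deg):
    """Rotate (x, y, z) points about the Y-axis by angle_deg."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(x * c + z * s, y, -x * s + z * c) for x, y, z in points]


def z_histogram_std(points, n_bins):
    """Standard deviation of the point counts of an equal-width Z histogram."""
    zs = [p[2] for p in points]
    lo, hi = min(zs), max(zs)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero Z-extent
    counts = [0] * n_bins
    for z in zs:
        counts[min(int((z - lo) / width), n_bins - 1)] += 1
    return statistics.pstdev(counts)


def best_rotation(points, n_bins=50, step_deg=0.1, limit_deg=3.4):
    """Grid search over [-limit_deg, limit_deg] for the rotation angle that
    maximizes the spread of the Z-histogram bin counts."""
    steps = round(limit_deg / step_deg)
    angles = [i * step_deg for i in range(-steps, steps + 1)]
    return max(angles,
               key=lambda a: z_histogram_std(rotate_about_y(points, a), n_bins))


# Two parallel layers, 1 m apart, on a 2-degree longitudinal slope.
slope = 0.0349207695  # approximately tan(2 degrees)
layers = [(float(x), 0.0, x * slope + h) for x in range(31) for h in (0.0, 1.0)]
angle = best_rotation(layers)
```

For this synthetic input the search settles at about 2°, where each layer collapses into a single histogram bin and the bin-count spread peaks.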
The removal of the deck assembly's upper part (the top $(1 - \rho_4)$) is crucial, as many more points are captured from the deck's external surface, overpowering the girder points and making the geometric features uninformative. The extremities of $deck_{\omega(\hat{\xi})}$ are also excluded to avoid noise from bridge accessory components. Density histograms $H_Y$ are drawn along the Y-axis of $bdeck_{\omega(\hat{\xi})}$ using Equation (6) ($Cluster \leftarrow bdeck_{\omega(\hat{\xi})}$), followed by generating the normalized probability of the point density (Figure 16). The density probability is roughly uniform, with significantly lower variance, when there is no girder (i.e., slab bridge), whereas significant peaks with non-trivial variance can be observed in the distribution when girders exist (i.e., beam-slab bridge). For the latter, a binary list (0, 1) is created after thresholding out all small counts (small-count bin ← 0, big-count bin ← 1). This list is further denoised through a simple k-NN filter, which works as a voting scheme: it checks the labels of neighboring bins and then assigns a candidate label to the investigated bin. This process is iterated until optimal clusters are returned, meaning that the "1" chunks have similar lengths, because the girder section type is identical within a specific span (see Section 4.1, A4). The bottom flange can suggest a collection of possible girder section types (e.g., Y, U, or SY beams). We then use the web depth (i.e., the girder's height) extracted along the best projection view to decide on a specific girder type, so that the girders can be separated accordingly from the slab.
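The thresholding and voting steps can be sketched as follows. The count threshold, the window half-width `k`, and the stopping rule are assumptions: this sketch iterates a majority vote until the labels stop changing (with an iteration cap, since a plain majority vote can oscillate), whereas the text stops when the "1" chunks have similar lengths. All function names are hypothetical.

```python
def binarize(counts, threshold):
    """Label a bin 1 when its count reaches the threshold, else 0."""
    return [1 if c >= threshold else 0 for c in counts]


def majority_filter(labels, k=1):
    """Relabel every bin by a majority vote over itself and its k
    neighbors on each side (ties resolve to 0)."""
    out = []
    for j in range(len(labels)):
        window = labels[max(0, j - k): j + k + 1]
        out.append(1 if 2 * sum(window) > len(window) else 0)
    return out


def smooth_until_stable(labels, k=1, max_iter=100):
    """Repeat the vote until the labels stop changing or max_iter is hit."""
    for _ in range(max_iter):
        new = majority_filter(labels, k)
        if new == labels:
            break
        labels = new
    return labels


def runs_of_ones(labels):
    """Return (first_bin, last_bin) for every chunk of consecutive 1 labels."""
    runs, start = [], None
    for j, v in enumerate(labels):
        if v == 1 and start is None:
            start = j
        elif v == 0 and start is not None:
            runs.append((start, j - 1))
            start = None
    if start is not None:
        runs.append((start, len(labels) - 1))
    return runs


# A girder peak with a one-bin dropout, an isolated spike, and a clean peak.
noisy = [0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0]
girder_bins = runs_of_ones(smooth_until_stable(noisy))
```

On this toy list the dropout is filled, the isolated spike is removed, and two candidate girder chunks remain.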
All the over-segments from Step 1 to Step 4 are merged as per their class labels. The 4-step top-down recursive detection method then terminates.

Assumptions
According to national standards (Highways England, 2017), the proposed method is feasible in the context of RC bridge modeling under the following conditions, which are also confirmed in our experiments.
A1. Pier assembly and deck assembly can be separated using the ratio $\rho_1$.
A2. Pier area and deck assembly can be separated using the ratio $\rho_2$.
A3a. A pier assembly does not contain a pier cap if a single pier area is detected in the pier assembly and the width of the pier is almost the same as that of the pier assembly.
A3b. Surface normals are distinct features that can be used to distinguish a pier cap from a pier.
A3c. Pier cap parts can be extracted from the deck assembly using the ratio $\rho_{3b} = \rho_1 / \rho_2$.
A4. The density histograms along the best view can be used to segment the girders in the deck assembly segment.
In particular, A1 and A2 are validated experimentally in Section 4.3.3, whereas A3a-c and A4 are validated in Section 4.4. We also assume that an RC bridge satisfies the following conditions:
A5. The piers are quasi-vertical.
A6. On-site clutter and irrelevant points have been properly removed manually.

Data and methods
To test our hypothesis, we used a FARO Focus 3D X330 Terrestrial Laser Scanner (ranging error ±2 mm at 10 m; self-leveling accuracy 0.015° over a range of ±5°) to collect PCD of 10 RC highway bridges around Cambridgeshire, United Kingdom (Figure 17). The locations (GPS), vertical clearance (denoted VC), and other bridge data are given in Table 1. The large size of the excluded data (i.e., non-data) is mainly due to the manual removal of on-site traffic noise, trees, large ground surfaces, ramps, and abutments. The samples we used to test our detection method are bridge components. The sample size depends on the standard normal deviation, error limit, and category proportion. As the sample size decreases, the margin of error grows, so theoretically, more bridge data (20) is needed to achieve a good confidence level (CL = 90%) with a relatively small error limit (EL = 0.2). However, the cost and risk of data collection are extremely high, as researchers must operate a laser scanner next to a live carriageway and face significant traffic hazards. We therefore consider this proof-of-concept study validated if we achieve a low performance variance over the 10 bridge data sets (CL = 90%, EL = 0.3). To the best of our knowledge, this study has the largest collection of real-world RC bridge point clouds.
We took an average of 17 scans per bridge. The distance between the captured scan points was set to 7.67 mm at a scan distance of 10 m (except for inaccessible standpoints). We registered all raw scans using the FARO Scene software. On average, the registration time was 10.6 hours per bridge. Note that several factors may affect the measuring accuracy, such as low temperatures, which may cause condensation inside the scanner. As the data collection was conducted during the cold winter season, we warmed the scanner up before every task until its internal temperature stabilized. We also used the built-in inclinometer to store the inclination of the leveled scanner.
The relative accuracy achieved (denoted acc in Table 1) was estimated as the average ratio between the distance separating a pair of reference points in the registered scan data and the corresponding on-site distance measured with a measuring tape, independent of other error sources. The spatial completeness of the data sets was computed based on a rough estimation of the occlusion-data ratio (Table 1).
Our analyses consist of two parts. The first part is to experimentally define the optimal values of the two parameters (ρ 1 and ρ 2 ) at the level of individual point clusters in Steps 1 and 2, respectively. Then, we derived the optimal values of the other three hyperparameters (ρ 3a , ρ 3b , and ρ 4 ). The second part is to assess the proposed method at the level of bridge structural components using both bounding-box-wise (inspired from the outline evaluation suggested in Truong-Hong and  and point-wise performance metrics.

Data preparation.
We developed a user-defined bounding-box functionality to manually delete irrelevant points such as on-site traffic, vegetation, ground surface, and so on. This is the only manual work required; the proposed four-step object-detection method itself is fully automatic and needs no human intervention. We kept this step manual because it is easier for a human modeler than for a computer to delete irrelevant points: the latter requires a sophisticated algorithm to identify which points are irrelevant or are noise. Conversely, it is difficult for a human modeler to precisely segment a point cloud of complex geometries on a 2D computer monitor.
After random downsampling, an RC bridge point cloud with the key components contains less than 1 million points. We downsampled because the original size could not be used for manual detection, as commercial software has difficulty handling large data sets. Next, we aligned the cropped bridge point cloud using PCA so that the major axes of the bridge are positioned roughly parallel to the global axes X-Y-Z (none of the bridges can be positioned exactly parallel to the axes due to their real-world skewed geometry).
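A minimal sketch of the two preparation steps is given below. It assumes uniform random sampling without replacement and assumes that the principal directions, ordered by decreasing variance, correspond to the X, Y, and Z axes; both are illustrative assumptions, and the function names and the `max_points` default are hypothetical. PCA leaves the sign of each axis undetermined, so an extra check would be needed to make sure Z points upward.

```python
import numpy as np


def random_downsample(points, max_points=1_000_000, seed=0):
    """Keep at most max_points rows, chosen uniformly at random."""
    if len(points) <= max_points:
        return points
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=max_points, replace=False)
    return points[idx]


def pca_align(points):
    """Center the cloud and express it in its principal-axis frame."""
    centered = points - points.mean(axis=0)
    # Eigen-decomposition of the symmetric 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    order = np.argsort(eigvals)[::-1]  # largest variance first: X, Y, Z
    axes = eigvecs[:, order]
    if np.linalg.det(axes) < 0:  # keep a right-handed coordinate frame
        axes[:, 2] *= -1
    return centered @ axes
```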
We created three ground-truth (GT) data sets (optimal desired output to compare against) by manually conducting Step 1 (i.e., GT A), Step 2 (i.e., GT B), and the entire solution (i.e., GT C):
GT A: For a given bridge point cloud input, we segmented it into two clusters, namely, deck assembly and pier assembly, and assigned them corresponding point-wise labels.
GT B: For a pier assembly point cloud input, we segmented it into two clusters, namely, deck assembly and pier area, and assigned them corresponding point-wise labels.
GT C: For a given bridge point cloud input, we segmented it into individual point clusters as per their semantic class, that is, structural components, including slab, piers, pier caps (if they exist), and girders (if they exist). Each of these point clusters was bounded by an oriented bounding box (hereafter GTBBox). We also assigned the points in each cluster their corresponding point-wise labels. Both the GTBBoxes and the manually labeled points served as references for comparison.

Implementation.
We implemented the solution in a robust software prototype, Gygax (https://github.com/ph463/Gygax), as a proof of concept on a desktop computer (Intel Core i7-4790K 4.00GHz CPU, 32 GB RAM, Samsung 500GB SSD). Gygax is an open research platform written in C#. It uses a sparse wrapper to allow the use of PCL and other open libraries.

Estimation of hyperparameters.
We estimated the two hyperparameters $\rho_1$ and $\rho_2$ and compared the results against GT A and GT B, respectively. Denote "S" as a specific point cluster, where S ∈ {α M , α M C } in Step 1 and S ∈ {D PC M , β M P } in Step 2. We defined the following point-wise performance metrics Precision (Pr), Recall (R), and F1-score (F1) as:

$$\mathrm{Pr}_S = \frac{\#\text{ of correctly labeled points in cluster } S}{\text{total } \#\text{ of points in cluster } S} \tag{8}$$

$$R_S = \frac{\#\text{ of correctly labeled points in cluster } S}{\text{total } \#\text{ of points in ground truth cluster } S} \tag{9}$$

$$F1_S = \frac{2\,\mathrm{Pr}_S\,R_S}{\mathrm{Pr}_S + R_S} \tag{10}$$

The values of $\rho_1$ and $\rho_2$ vary for different bridge configurations. To learn how sensitive the outputs of the first and second steps of our proposed method are to $\rho_1$ and $\rho_2$, we conducted a grid search over the entire range of values of $\rho_1$ and $\rho_2$ within the value space (0, 1) and computed the empirical receiver operating characteristic (ROC), which depicts the tradeoff between the true positive rate (TPR) and the false positive rate (FPR). The TPR (Equation (11)) is known as the sensitivity of detection, that is, the equivalent of the Recall, whereas the FPR (Equation (12)) is known as the probability of false alarm (1 − specificity), where:

$$\mathrm{TPR} = \frac{TP}{TP + FN} \tag{11}$$

$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{12}$$

A $\rho_1$ that is too small or too large (close to 1) may lead the method to consider a whole RC bridge as either pier assembly or deck assembly. Intuitively, a relatively large $\rho_1$, for example 0.5, could be used to extract pier assemblies in Step 1 because the height of a pier is normally much larger than that of the deck. Yet, to maintain the necessary vertical clearance, the deck is very unlikely to be as thick as 0.5 times the height of the pier assembly. We should therefore find a reasonable value of $\rho_1$ that is both theoretically and realistically sound. We identified the optimal $\rho_1^*$ and $\rho_2^*$ as the values that minimize the distance to the perfect classification in the ROC (i.e., FPR = 0, TPR = 1):

$$\rho_k^* = \arg\min_{\rho_k \in (0, 1)} d_{\rho_k} \tag{13}$$

where $d_{\rho_k} = \sqrt{(1 - \text{sensitivity})^2 + (1 - \text{specificity})^2}$ for $k \in \{1, 2\}$, and $k$ is the number of the step.
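The grid search over the ROC can be sketched as follows. The sketch assumes a binary point-wise labeling in which both classes are present, and it takes the segmentation step as a caller-supplied function `predict(rho)` that returns a boolean mask; that callable and the function names are hypothetical placeholders for Step 1 or Step 2, not part of the original implementation.

```python
import numpy as np


def roc_point(pred, truth):
    """Return (TPR, FPR) for boolean prediction and ground-truth masks."""
    tp = np.sum(pred & truth)
    fn = np.sum(~pred & truth)
    fp = np.sum(pred & ~truth)
    tn = np.sum(~pred & ~truth)
    return tp / (tp + fn), fp / (fp + tn)


def best_threshold(predict, truth, grid):
    """Pick the ratio on the grid closest to the ideal ROC corner (0, 1)."""
    best_rho, best_d = None, np.inf
    for rho in grid:
        tpr, fpr = roc_point(predict(rho), truth)
        # 1 - sensitivity = 1 - TPR and 1 - specificity = FPR
        d = np.hypot(1.0 - tpr, fpr)
        if d < best_d:
            best_rho, best_d = float(rho), float(d)
    return best_rho, best_d
```

As a toy usage, with normalized point heights `z` and a deck occupying the top 30% of the height, `best_threshold(lambda rho: z > 1 - rho, z > 0.7, np.arange(0.05, 1.0, 0.05))` selects the grid value nearest 0.30.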
The optimal thickness ratio $\rho_1$ for each bridge was then found using Equation (13). Figure 18 shows an example of the ROC curve of Bridge 1. Assuming the samples follow a t-distribution with 9 degrees of freedom (i.e., $n - 1$, where $n = 10$), the 95% confidence interval (CI) critical value was taken from the t-table as 2.262 for calculating the true sampling distribution mean $\mu_{\rho_1^*}$. The optimal $\rho_1$ was then computed as 0.27 ± 0.03, that is, $\bar{\rho}_1^* \pm t\,(s/\sqrt{n})$, where $\bar{\rho}_1^*$ is the sample mean and $s$ is the sample standard deviation (Table 2). We also computed $\mu_{\rho_1^*}$ using the bootstrapping technique, which resamples the data with replacement at the same sample size of 10 and repeats this 1,000 times. The 95% CL upper bound was estimated to be 0.29, which is in line with that of the t-statistic. We chose the upper bound of the t-statistic to set $\rho_1^*$ as 0.30 rather than its lower bound because all indicators, such as Pr, R, F1, FPR, and $d_{\rho_1}$, had a good balance when $\rho_1 = 0.30$ (e.g., $F1_{0.30} = 0.84$, $F1_{0.24} = 0.74$). More importantly, setting a larger $\rho_1^*$ can avoid extracting too many false positives (e.g., $\mathrm{FPR}_{0.30} = 0$, $\mathrm{FPR}_{0.24} = 0.06$).
Then, we grid searched the optimal $\rho_2$ in Equation (2) in the same way, followed by plotting the ROC (Figure 18) at various threshold settings of $\rho_2$ for a pier assembly sample. The 95% CI of the t-statistic for $\rho_2$ was derived to be 0.36 ± 0.03. Likewise, we chose the upper bound to set $\rho_2^*$ as 0.39. Once we obtained $\rho_1^*$ and $\rho_2^*$, we calculated $\rho_{3b} = \rho_1^* / \rho_2^* \approx 0.80$. Then, we estimated the value of $\rho_{3a}$, which is the slab thickness ratio used to remove the upper slab surface points from the pier area(s) $\{\beta_{mp}\}$ (see Step 3.1).
The pier areas $\{\beta_{mp}\}$ are relatively small regions compared with an entire bridge, so it is reasonable to consider the vertical elevation of $\beta_{mp}$ constant and to present the probability of point density for $\{\beta_{mp}\}$. Suppose the distribution of the points along the Z-axis of $\{\beta_{mp}\}$ is a collection of probabilities of the locations on the pier area surface where the points lie. The density estimations of pier area samples from slab bridges showed an obvious void between the top surface and the bottom surface of a slab. The 95% CI of the normalized slab bottom level was 0.84 ± 0.06. Likewise, for all the pier area samples from beam-slab bridges, the 95% CI of the normalized slab bottom level was estimated to be 0.76 ± 0.04. These statistics suggested that, for both slab and beam-slab bridges, the top 10% of the $\beta_{mp}$ points include the upper slab surface, so we can remove them and classify them into the deck assembly. The slab thickness ratio $\rho_{3a}$ is used in particular for beam-slab bridges in Step 4 for girder segmentation. We used the slab bottom level estimation of beam-slab bridges to estimate $\rho_{3a}$ as between 0.20 and 0.28 (i.e., 1 − (0.76 ± 0.04)). Taking into account shallow girders and the effect of the transverse gradient, we chose $\rho_{3a} = 0.2$ so that, from the definition of $\rho_4$ in Step 4.2,

$$\rho_4 = \frac{\rho_1^* - \rho_{3a}}{\rho_1^*} = \frac{0.30 - 0.20}{0.30} \approx 0.33$$
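The two interval estimates can be sketched as follows. The critical value 2.262 is taken from the text; the percentile form of the bootstrap interval is an assumption (the text does not say which variant was used), and the per-bridge values in the usage lines are hypothetical placeholders, not the values of Table 2.

```python
import numpy as np

T_CRIT_95_DF9 = 2.262  # two-sided 95% critical value, 9 degrees of freedom


def t_interval(samples, t_crit=T_CRIT_95_DF9):
    """Return (mean, half_width) of the t-based confidence interval."""
    x = np.asarray(samples, dtype=float)
    half_width = t_crit * x.std(ddof=1) / np.sqrt(len(x))
    return float(x.mean()), float(half_width)


def bootstrap_mean_ci(samples, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean (resampling with replacement)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(samples, dtype=float)
    idx = rng.integers(0, len(x), size=(n_resamples, len(x)))
    means = x[idx].mean(axis=1)
    lo, hi = np.percentile(means, [100 * (1 - level) / 2, 100 * (1 + level) / 2])
    return float(lo), float(hi)


# Hypothetical per-bridge optimal ratios, for illustration only
rho_samples = [0.22, 0.25, 0.31, 0.27, 0.24, 0.33, 0.26, 0.29, 0.23, 0.30]
mean, half_width = t_interval(rho_samples)
lower, upper = bootstrap_mean_ci(rho_samples)
```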

System validation and results
We evaluated the whole proposed method of the prototype at the level of structural components with the optimal hyperparameters identified in Section 4.3.3, and compared the results against GT C. The method generated an oriented bounding box for each segmented point cluster (hereafter AutoBBox) and assigned a semantic instance label to each point. The run-time for each bridge point cloud (less than one million points) was on average 8.02 ± 3.02 minutes, including all four major steps of the proposed method. This represents a 90% reduction in time compared with the manual creation of GT C.
We first compared AutoBBoxes against GTBBoxes and evaluated the proposed method's performance using the following conditions. For a specific point cluster generated from the proposed method, let $C_{auto}$ and $C_{gt}$ be the centers of its AutoBBox and its GTBBox (if it exists), respectively, and $d(C_{auto}, C_{gt})$ be the Euclidean distance between $C_{auto}$ and $C_{gt}$.
C1. The GTBBox of the specific point cluster exists;
C2. $C_{auto}$ is inside the corresponding GTBBox;
C3. $\varepsilon = \dfrac{d(C_{auto}, C_{gt})}{\min(l_{gt}, w_{gt}, h_{gt})} < 50\%$, where $l_{gt}$, $w_{gt}$, and $h_{gt}$ are the length, width, and height of the GTBBox of the point cluster, respectively.
The point cluster is correctly detected by the AutoBBox, and we counted one true positive (TP), if all three conditions above are satisfied; one false positive (FP) if C1 is false but an AutoBBox is generated; and one false negative (FN) if C1 is true but at least one of C2 and C3 is not satisfied. The bounding-box-wise Pr, R, and F1 for each bridge were computed using formulas similar to Equations (8)-(10). Specifically, the Pr is the number of correctly detected point clusters out of the total number of AutoBBoxes of a bridge data set, the R is the number of correctly detected point clusters out of the total number of GTBBoxes of a bridge data set, and the F1 is the harmonic mean of the Pr and R. Figure 19 illustrates the point-wise and bounding-box-wise detection results. As shown, the slab of Bridge 9 is obviously skewed. The average Pr, R, and F1 of the bounding-box-wise performance of the 10 bridges were 100%, 98.5%, and 99.2% (Table 3), respectively. All of the components were correctly detected except pierCap1 (ε = 71.9%) and pierCap2 (ε = 81.9%) of Bridge 1. Yet, few points in these clusters were detected as FP ($\mathrm{FDR}_{pierCap1} = 4.4\%$, $\mathrm{FDR}_{pierCap2} = 8.6\%$), where the false discovery rate (FDR) for each point cluster is

$$\mathrm{FDR} = \frac{FP}{FP + TP}$$

Therefore, although bounding-box-wise metrics can give us a general picture of the performance, they are too sensitive to the locations of misclassified points, which largely affected the values of $d(C_{auto}, C_{gt})$. We repeated the system evaluation with point-wise metrics, that is, Equations (8)-(10). Herein, the "S" in Equations (8)-(10) refers to any specific final point cluster generated from our proposed solution.
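Conditions C1-C3 and the resulting TP/FP/FN counts can be sketched as follows. The `OBB` container, its field layout, and the function names are hypothetical; the sketch uses the standard TP/FP/FN forms of precision and recall, and it does not cover ground-truth boxes for which no cluster was generated at all.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class OBB:
    """Oriented bounding box (hypothetical layout used for illustration)."""
    center: np.ndarray  # shape (3,)
    axes: np.ndarray    # shape (3, 3), columns are the unit box axes
    size: np.ndarray    # shape (3,), full length, width, height


def classify_detection(auto_box, gt_box, max_eps=0.5):
    """Return 'TP', 'FP' or 'FN' for one generated cluster, following C1-C3."""
    if gt_box is None:  # C1 is false but an AutoBBox was generated
        return "FP"
    offset = auto_box.center - gt_box.center
    local = gt_box.axes.T @ offset  # offset expressed in the GT box frame
    inside = bool(np.all(np.abs(local) <= gt_box.size / 2))  # C2
    eps = np.linalg.norm(offset) / gt_box.size.min()         # C3
    return "TP" if inside and eps < max_eps else "FN"


def precision_recall_f1(outcomes):
    """Compute Pr, R and F1 from a list of 'TP' / 'FP' / 'FN' outcomes."""
    tp, fp, fn = (outcomes.count(k) for k in ("TP", "FP", "FN"))
    pr = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pr * r / (pr + r) if pr + r else 0.0
    return pr, r, f1
```

Because ε is normalized by the smallest box dimension, a thin component such as a pier cap can fail C3 even when only a small share of its points is mislabeled, which is the sensitivity discussed in the text.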
For a specific bridge point cloud, we computed the micro-average scores. In micro-averaging, we summed up the individual TP, FP, and FN from all point clusters to obtain the statistics:

$$\mathrm{Pr}_{micro} = \frac{\sum_{s=1}^{|S|} TP_s}{\sum_{s=1}^{|S|} (TP_s + FP_s)}, \qquad R_{micro} = \frac{\sum_{s=1}^{|S|} TP_s}{\sum_{s=1}^{|S|} (TP_s + FN_s)}$$

where $|S|$ is the number of generated point clusters in the given bridge point cloud. The micro-average F1-score is simply the harmonic mean of $\mathrm{Pr}_{micro}$ and $R_{micro}$.
Assumptions A3a-A3c were justified, as the method recognized that there was no pier cap in the pier assemblies of Bridges 2, 3, 5, and 7 (wall-type piers) and Bridges 4, 6, 8, 9, and 10 (multiple columns without caps), and it correctly identified the pier caps in Bridge 1. Table 3 shows that the proposed method achieved remarkable performance: the highest micro-average of Pr/R/F1 was rounded up to 100% and the lowest was 89.1% (in the multiclass case, micro-averaging yields mathematically equivalent definitions of precision and recall, and thus an identical F1-score).
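The micro-averaged scores can be computed directly from a point-wise confusion matrix, as in the sketch below. The function name and the matrix convention (rows are ground-truth clusters, columns are predicted clusters) are assumptions for illustration.

```python
import numpy as np


def micro_scores(confusion):
    """Micro-averaged Pr, R and F1 from a square confusion matrix.

    confusion[i, j] is the number of points of ground-truth cluster i
    that were labeled as cluster j.
    """
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp  # labeled as the cluster but belonging elsewhere
    fn = confusion.sum(axis=1) - tp  # belonging to the cluster but labeled elsewhere
    pr = tp.sum() / (tp.sum() + fp.sum())
    r = tp.sum() / (tp.sum() + fn.sum())
    f1 = 2 * pr * r / (pr + r)
    return float(pr), float(r), float(f1)
```

When every point receives exactly one label, each misclassified point is an FP for one cluster and an FN for another, so the summed FP and FN are equal and the three micro-averaged scores coincide.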

Discussion
An example of a confusion matrix, for Bridge 2, is given in Table 4. Although we achieved high detection rates with the PCD of all 10 RC bridges in both the bounding-box-wise and the micro-average assessments (Table 3 and Table 6), the FDR of some point clusters revealed that the proportion of FP points is not insignificant. This is especially true for Bridge 7. A few components reached very high detection precision, such as the pier (97.2%) and many girders, such as girder14 (100%), girder15 (100%), girder22 (100%), and girder23 (100%), among others. However, some points were not properly classified. Normally, a slab cluster should contain the most populated label in a bridge point cloud (i.e., it has the most points). For Bridge 7 (Table 6), the slab point cluster was partly misclassified ($FP_{slab} = 37764$, $\mathrm{FDR}_{slab} = 17.7\%$). The FDR was also not trivial for girder11 (21.1%), girder19 (27.9%), girder21 (23.3%), and girder29 (20.6%). The micro-average metrics adequately captured class imbalance issues and brought the overall precision average down to 89.1%.
There are two main reasons for the reduced classification performance in Bridge 7. First, the significant parabolic vertical alignment of the roadway in each deck segment of this bridge made the segmentation less accurate. Future work should develop a further deck-segment slicing procedure in Step 4 to alleviate the impact of parabolic curves. Second, the girders were placed so close to each other that the gaps between adjacent girders were difficult for a scan sensor to see. Around 8% of the surface of Bridge 7 was occluded (Table 1), mainly because the points on the webs of the girders were missing. As a result, these regions were quite ambiguous, making it difficult for the proposed method to detect an individual girder. The points between adjacent girders were misclassified as slab. To learn how much occlusion is acceptable, we repeated the experiments using Bridge 1 by creating arbitrary occlusions in the slab, pier caps, and piers, respectively, while the others remained unchanged. Then we combined all these occlusions. The occlusion level was estimated to be 30-40%. Table 5 shows that our method achieved high detection performance. However, we do not encourage processing data with such a high occlusion level in real applications. The best projection search experiment for deck 2 of Bridge 8 also showed that our method is robust to very unevenly distributed points between the upper and lower slab (Figure 20). Indeed, there could be some scenarios with extremely nonuniformly distributed points that our method may not accommodate. Yet, we believe that a carefully planned and elaborately designed scanning process could eliminate these cases. Specially targeted laser-scanning techniques or settings are required for these challenging regions. The method of Laefer and Truong-Hong (2017) can be considered in this endeavor.
It is worth noting that our method is effective for certain types of RC bridge, namely typical RC slab bridges and beam-slab bridges. Although it is too soon to claim that the proposed method will accommodate all needs, the experiments showed that our method fills some gaps in knowledge and is capable of dealing with some very common and important types of highway bridges.
This method could likely be scaled up for more complicated bridges. Additional procedures can be integrated into our method to detect other elements, such as abutments, bearings, handrails, and so forth. Future work can also be built on Step 4 to detect and segment delicate components in bridges with more complex superstructure geometries (e.g., grid-beams, cross-beams). The method developed for reconstructing gridded steel structures (Gyetvai et al., 2018;Laefer and Truong-Hong, 2017) can be integrated.

CONCLUSIONS
Object detection in bridge point clouds remains unsolved. In this work, we presented a novel top-down method for major bridge-component detection in point clouds and tested it on 10 sets of real RC highway-bridge PCD. The validation metrics showed that the method is very reliable, which supports our hypothesis. Given the high performance of our method on real-world bridge point clouds containing defects such as occlusions and sparseness, we contend that there is virtually no human intervention needed to check whether the points are correctly classified in each step.
The contributions of this research are the following:
1. Our method can deal with complex real-world bridge topologies, such as varying elevation and slightly curved horizontal alignment. The method also excelled on Bridges 8 and 9, which contain obviously curved horizontal alignments.
2. Our method can handle challenging scenarios, such as occlusions and locally variable point densities. Although some input is very sparse (e.g., Bridges 4, 6, and 8) due to long scan distances, our method still achieved quite good performance on these point clouds.
3. Our method dramatically reduces computational costs by breaking down a large point cloud into subsets. In this way, large-scale object detection efficiency can be significantly improved without sacrificing precision.
However, the proposed method is not intended to be a cure-all. More bridge data with various configurations, especially with different girder and pier cap types, should be included and investigated in future studies. This can enhance the statistical power of the method with an increased confidence level and a reduced margin of error. In addition, the method is not suitable for diaphragm bridges whose upstand diaphragms lie at the same level as the integrated lateral beams. It also cannot deal with concrete bridges with complex geometries or with steel bridges (e.g., truss bridges). Furthermore, small girder spacing and severely unevenly distributed points may decrease the detection performance of the proposed method for beam-slab bridges. These regions require additional very-high-resolution scanning. In short, this research indicated that the most important typical RC bridges can be supported using the proposed solution, which can significantly reduce the modeling cost and will accelerate the adoption of BrIM for existing RC bridges. Future planned work will focus on (1) overcoming the above-mentioned limitations and addressing some of the assumptions; (2) upgrading the algorithm to scale up to more complex bridge systems and to detect more components; and (3) fitting IFC objects to the generated labeled point clusters.

ACKNOWLEDGMENTS
This work is funded by EPSRC, EU Infravation SeeBridge project under Grant No. 31109806.0007 and Trimble Research Fund. We thank them for their support.

DATA AVAILABILITY STATEMENT
The raw bridge PCD can be accessed at https://doi.org/10.5281/zenodo.1233844.