Looking for a needle in a haystack: Probability density based classification and reconstruction of dormers from 3D point clouds

Accurate reconstruction of roofs with dormers is challenging. Without careful separation of the dormer points from the points on the roof surface, the estimation of the roof areas is distorted. The characteristic distortion of the density distribution in comparison to the expected normal distribution is the starting point of our method. We propose a hierarchical method which improves roof reconstruction from LiDAR point clouds in a model‐based manner, separating dormer points from roof points using classification methods. The key idea is to exploit probability density functions to reveal roof properties and to skilfully design the features for a supervised learning method using support vector machines. The approach is tested based on real data as well as simulated point clouds.

Due to the high relevance of roof models in many fields of applications, the 3D reconstruction of rooftops from aerial data (e.g., LiDAR or photogrammetric point clouds) is a well-studied subject of research. In this context, top-down, bottom-up and even hybrid methods are employed. Most of these methods focus on roof models without taking their superstructures (i.e., dormers and chimneys) into consideration. Top-down methods suffer from big structures which smear the estimated models, whereas bottom-up methods have difficulty identifying small structures from the data. Roof structures such as dormers and chimneys represent outliers with regard to a simplistic roof model which does not take dormers into account. The occurrence of such outliers complicates the robust modeling of roofs. Hence, our key idea is to examine the impact of such structures on roof determination in order to derive roof models and to identify the presence or absence of superstructures. From an urban planning point of view and with a view to a more precise classification of the type of use, dormers are of great importance because they indicate the value of the building and the type and intensity of its use.
The main contribution of this article is a novel approach which classifies and reconstructs roofs and their structures based on designed statistical features derived from probability density functions (PDFs). The characteristic distortion of the density distribution of the defects caused by dormers in comparison to the expected normal distribution is the starting point of our method. In particular, the method is able to detect and differentiate between gabled (Figure 1a) and shed dormers (Figure 1b). We implemented a simulation toolbox which generates point clouds for various roof structure types with different parameters and thus enables us to analyze roof dormer parameters in a systematic way. The simulated data, with different resolutions and point densities, are a good experimental basis for assessing the influence of these parameters on the reconstruction of roof models. Furthermore, roof models with and without outliers can be intensively investigated and compared before being classified from real data using a priori knowledge derived from the PDFs. The key idea of our article is the exploitation of latent knowledge encoded in PDFs in order to distinguish between roof models and the otherwise challenging structures which have been neglected up to now in most approaches. Our approach expects a 3D point cloud of a roof as input, as depicted in Figure 2. From this point cloud, measures such as the inclination or the residuals with respect to a predefined roof model are calculated. From these measures, PDFs are derived in order to build a feature vector based on PDF properties such as skewness. The feature vector is the prerequisite for the subsequent classification and reconstruction of roof and dormer models (cf. Figure 2).
Our approach consists of a hierarchical pipeline which comprises the detection and classification of roof structures in 3D point clouds, followed by a semantic segmentation using a clustering method, and finally the reconstruction of the identified objects. In this context, a variety of methods such as support vector machines (SVMs) and RandomForest for the classification, DBSCAN for the density based clustering, and RanSaC for the roof reconstruction are applied.
This article is not limited to simulated data. Instead, we apply, adapt and transfer the method developed for simulated data to real data. We demonstrate that the approach described for simulated use cases can be successfully applied to real data. Roof dormers represent only a small part of the underlying point clouds. It is easy to mistake them for white noise, and it is hard to identify them. Nevertheless, we were able to demonstrate that with our approach it is possible to identify even rather small dormers. The novelty of our approach lies in two points.
The first is the explicit identification of dormers during the reconstruction of the underlying roof model. As yet, such structures have been reconstructed as a by-product of roof reconstructions (e.g., with RanSaC). In this article, we demonstrate that the models reconstructed in such a way are distorted due to the presence of dormers. In this sense, models smeared by the presence of superstructures represent the starting point of our method. The second is the use of PDFs (e.g., of residuals) and their distortions for the identification and reconstruction of dormers. To the best of our knowledge, this is used for the first time in the context of city models and building reconstruction.
The remainder of this article is structured as follows. Section 2 gives insights into related research. Section 3 elaborates our approach and consists of four main parts: the simulation of roof structures (Section 3.1), their classification (Section 3.2), the segmentation of roof point clouds (Section 3.3), and the reconstruction of the objects identified (Section 3.4). Section 4 introduces our experiments and discusses the results achieved and the limitations of our approach. Section 5 summarizes and concludes the article.

| RELATED WORK
Pu and Vosselman (2009) presented a knowledge-based reconstruction method of building models from terrestrial laser scanning data. The general problem, particularly for data-driven approaches, is the challenging reconstruction of dormers, since they represent a small part of the observations. Kada and Wichmann (2013) presented a feature-driven approach to the modeling of 3D buildings. Their method detects low-level roof structures (e.g., dormers) using subsurface segmentation of the input point cloud. Henn, Gröger, Stroh, and Plümer (2013) proposed an enhanced version of RanSaC combined with a hierarchical classification using SVMs, leading to robust roof model estimation even from sparse LiDAR data. These authors did not explicitly consider roof superstructures such as dormers, however. In the sense of RanSaC, huge dormers on the roof represent outliers which will be closely investigated in our article.
Our approach draws upon PDFs which are used to design suitable features for the classification of roof structures. To this end, a possible issue to address is to compare the PDFs with each other. In this context, Sakurai, Li, Chong, and Faloutsos (2008) introduced a method which compares two distributions using symmetric and asymmetric Kullback-Leibler divergence.
There has been intensive work on the analysis and interpretation of point clouds. Ioannou, Taati, Harrap, and Greenspan (2012) introduced an operator based on normal vectors for object recognition in point clouds. They propose the difference of normals (DoN) as a distinct measure for the description of point cloud characteristics depending on a parametrized neighborhood. Jones and Aoun (2009) used histograms based on the angular relationships between a subset of normals for the identification of object classes in a 3D point cloud.
As mentioned, to study the effect of variable parameters of roofs and their structures, our article is based on our own simulation environment which generates building models with synthetic point clouds. A related paper for the generation of LiDAR point clouds is that of Lohani and Mishra (2007).

| METHODOLOGY
This section presents an overview of our approach and its theoretical background. The main components of our method are the simulation of data, the classification of roof structures, the clustering of the points of these structures, and finally their reconstruction.

| Simulation of roof structures
In order to examine and analyze the impact of probability density based features on the identification and reconstruction of roofs and their structures, we implemented a building and laser scanning simulation toolbox. On the one hand, the simulation environment consists of a component which allows us to generate virtual building models, in particular roofs and roof structures, as depicted in Figure 3. The possibility of generating several roof polygons comprising gabled and shed dormers simultaneously allows for a fast acquisition of virtual city models.
The distribution of the dormer polygons is based on information indicating their positions on the rooftops, their inclination and a range of ridge lengths, among others. Since the dormer widths and the ridge lengths are randomly sampled, different roofs with various dormer models can be created automatically (cf., Figure 3, right). In order to ensure realistic object generation, the choice of the location and shape parameters of dormer models is based on recommendations taken from the building regulations for the state of North Rhine-Westphalia in Germany. These instructions constrain the dormer dimensions with regard to the calculation of safety distance areas, the granting of building design authorizations, the use of roof superstructures as escape routes, and the arrangement of roof superstructures, taking fire protection regulations into consideration.
On the other hand, the second component consists of a laser scanner simulator which allows for the generation of aerial LiDAR point clouds with different settings (e.g., resolution), as presented in Figure 4. To reflect real use cases, point clouds with various densities can be produced by varying the motion resolution as well as noise information. A bounding box is used to specify the range where the density specification must hold.

FIGURE 3 Front-end of the simulation toolbox for the generation of polygonal buildings and their parts (left), taking several parameters into account. In particular, several roof objects including superstructures can be generated simultaneously (right)

FIGURE 4 Front-end of the laser scanning simulator
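As an illustration of the kind of data such a simulator produces, the following minimal sketch samples a noisy point cloud on a single inclined roof plane with a prescribed point density. The function name `sample_roof_plane`, the uniform sampling scheme and the purely vertical Gaussian noise are assumptions made for this example; they are not taken from the toolbox described here.

```python
import numpy as np

def sample_roof_plane(width, depth, slope_deg, density, noise_std, rng):
    """Sample noisy points on a plane rising along y with the given slope.

    width, depth: footprint extent in metres (x and y direction)
    density: points per square metre of footprint (assumed uniform)
    noise_std: standard deviation of vertical Gaussian noise in metres
    """
    n = int(round(width * depth * density))
    x = rng.uniform(0.0, width, n)
    y = rng.uniform(0.0, depth, n)
    z = np.tan(np.radians(slope_deg)) * y + rng.normal(0.0, noise_std, n)
    return np.column_stack([x, y, z])

# Hypothetical settings: 12 m x 4 m roof side, 35 degree slope,
# 11 points per square metre, 7 cm noise.
rng = np.random.default_rng(0)
points = sample_roof_plane(12.0, 4.0, 35.0, density=11, noise_std=0.07, rng=rng)
```

Varying `density` and `noise_std` in such a sketch mimics the density and noise settings discussed in the experiments.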

| Classification of roof structures
An important step in our approach is the classification of rooftops and their superstructures (e.g., dormers). To this end, the right choice of discriminant features and predictors is of high importance. The key idea of our method is to design and derive suitable features based on PDFs. The behavior of the PDFs characterizes different roof objects, which contributes to their discrimination. To this end, we considered different 1D measures (e.g. roof inclination), derived from the 3D input point cloud. Based on these measures, each PDF is estimated using a nonparametric kernel density estimation (KDE; Wand & Jones, 1994).
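A nonparametric estimate of the PDF of a 1D measure can be sketched as follows. The example uses `scipy.stats.gaussian_kde` with its default bandwidth (Scott's rule); the kernel, the bandwidth and the helper name `estimate_pdf` are assumptions for illustration, since the settings used in the article are not stated.

```python
import numpy as np
from scipy.stats import gaussian_kde

def estimate_pdf(values, grid_size=256):
    """Return a regular grid and the KDE-estimated density of a 1D measure."""
    values = np.asarray(values, dtype=float)
    kde = gaussian_kde(values)  # Gaussian kernel, default bandwidth
    pad = 3.0 * values.std()    # extend the grid so the tails are covered
    grid = np.linspace(values.min() - pad, values.max() + pad, grid_size)
    return grid, kde(grid)

# Made-up residuals of a dormer-free roof with 7 cm noise.
rng = np.random.default_rng(1)
residuals = rng.normal(0.0, 0.07, 1000)
grid, density = estimate_pdf(residuals)
```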

| Roof model residuals
For the classification of dormers, the first measure which we consider is the probability density of the roof model residuals. This assumes the availability of predefined roof models. Accurate roof models allow for an accurate determination of the residuals. Since roof structures represent white noise in the 3D point cloud, the PDF of the residuals is a good tool which reflects their properties. In this context, the residuals are determined using RanSaC (Fischler & Bolles, 1981) as part of the model-driven roof reconstruction method of Henn et al. (2013). Since the accuracy of the underlying roof model is correlated with the geometry of the superstructure, we investigated the impact of the residuals on the learning performance by including and excluding them as predictors.

| Inclination
Likewise, the PDF of the inclination of each point of the point cloud is calculated. To this end, a surflet (Wahl, Hillenbrand, & Hirzinger, 2003) is considered, consisting of a point from the point cloud and its normal vector, which represents an approximated plane of its k-neighborhood. We chose k = 5 in order to take small structures into account. Based on a singular value decomposition (Förstner & Wrobel, 2016), the five points with coordinates $[x_i, y_i, z_i]$ are used to fit a plane whose normal vector is the eigenvector $v_i$ associated with the smallest eigenvalue $\lambda_i$ of a matrix $A$ built from these coordinates. The inclination is then calculated based on the z-axis of the coordinate system and the surflet for each point.
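The pointwise inclination can be sketched as follows. The example assumes that the plane is fitted to the centered coordinates of the k nearest neighbors (including the point itself) and that the normal is taken as the right singular vector of the smallest singular value; the exact matrix used in the article may differ. The function name `point_inclinations` is hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def point_inclinations(points, k=5):
    """Inclination (degrees) of each local plane normal against the z-axis."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)  # k nearest neighbours, the point included
    angles = np.empty(len(points))
    for i, nbrs in enumerate(idx):
        centered = points[nbrs] - points[nbrs].mean(axis=0)
        # The right singular vector of the smallest singular value is the normal.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        normal = vt[-1]
        cos_angle = abs(normal[2])  # the sign of the normal is arbitrary
        angles[i] = np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))
    return angles
```

With a small k, the estimate reacts to small structures but is also more sensitive to noise, which matches the trade-off mentioned above.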

| Mean squared error
We also determined the mean squared error (MSE) of several measures, such as the abovementioned point inclination.

| Angles between normal vector pairs
Following the method of Jones and Aoun (2009), we considered surflet pairs and derived bilateral angles between them. Apart from the angle γ between the z-axis and a normal vector which is already incorporated into the calculation of the inclination, two further angles, α and β, are considered as illustrated in Figure 6.

| Difference of normals
We make use of the method of Ioannou et al. (2012) who developed an operator to deal with huge unstructured point clouds. This operator calculates the difference of surflets of a given point in two different-sized neighborhoods. This enables the modeling of the sensitivity with regard to small structures. Figure 7 summarizes the idea of the DoN depending on a parametrized radius. In order to acquire a 1D measure, we computed the angle between the resulting difference vector and the z-axis.
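A rough sketch of a DoN-based 1D measure is given below. It assumes the common definition of the DoN as half the difference of the unit normals estimated in a small and a large radius neighborhood, with both normals oriented upward; the radii, the orientation rule and the function names are assumptions for illustration. Where both normals coincide, the difference vector vanishes and the angle is left undefined (NaN).

```python
import numpy as np
from scipy.spatial import cKDTree

def radius_normal(points, tree, center, radius):
    """Upward-oriented unit normal of the plane fitted to points within radius."""
    nbrs = points[tree.query_ball_point(center, radius)]
    if len(nbrs) < 3:
        return None
    centered = nbrs - nbrs.mean(axis=0)
    normal = np.linalg.svd(centered, full_matrices=False)[2][-1]
    return normal if normal[2] >= 0 else -normal

def don_angles(points, r_small, r_large):
    """Angle (degrees) between the DoN vector and the z-axis, NaN if undefined."""
    tree = cKDTree(points)
    out = np.full(len(points), np.nan)
    for i, p in enumerate(points):
        n_s = radius_normal(points, tree, p, r_small)
        n_l = radius_normal(points, tree, p, r_large)
        if n_s is None or n_l is None:
            continue
        don = 0.5 * (n_s - n_l)
        length = np.linalg.norm(don)
        if length > 1e-9:
            out[i] = np.degrees(np.arccos(np.clip(don[2] / length, -1.0, 1.0)))
    return out
```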
For the classification task, we designed a set of features in order to discriminate between roof points and dormer points in a first step and between different dormer types in a subsequent step. As stated, we focus in this article on features based on PDFs. In this sense, we derive PDFs based on the five one-dimensional measures mentioned above, which are needed to perform the classification for the discrimination between roof structures. The PDF derived for each measure using KDE is closely investigated with regard to the following properties.

| Skewness
One important property of a PDF is its skewness, which influences its shape and in particular its symmetry. The skewness of the PDF of a data set of n measures $x_i$ is defined as follows (von der Lippe, 2018):

$$S = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^{3} \qquad (2)$$

where $\bar{x}$ is the mean of the $x_i$ and $s$ their standard deviation. Depending on the value of S, we distinguish between left (or negatively) skewed and right (or positively) skewed distributions. A value of zero indicates a symmetric distribution.

| The excess kurtosis
A further property of a PDF is the kurtosis, which describes its steepness. This measure is calculated based on the fourth moment of the underlying data $x_i$:

$$K = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^{4} - 3 \qquad (3)$$

The sign of K is an indicator of the steepness. A positive (negative) value characterizes fat-tailed (thinner-tailed) distributions. Normal distributions have an excess kurtosis of zero.
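Both shape measures can be computed from the standardized third and fourth moments, as in the following sketch. It uses the population standard deviation (division by n); whether the article uses this or the sample version is an assumption.

```python
import numpy as np

def skewness(x):
    """Standardized third moment of the data."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()  # population standard deviation (ddof=0)
    return np.mean(z ** 3)

def excess_kurtosis(x):
    """Standardized fourth moment minus 3, so that a normal distribution gives 0."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0
```

For instance, the symmetric sample `[-2, -1, 0, 1, 2]` has zero skewness and a negative excess kurtosis, whereas `[0, 0, 0, 1]` is right skewed.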

| Kullback-Leibler divergence
The Kullback-Leibler divergence (Kullback & Leibler, 1951) is a suitable discrimination measure which quantifies how one PDF p differs from another PDF q (Shlens, 2014) based on the information gain as follows:

$$D_{KL}(p \,\|\, q) = \int p(x)\,\log_2 \frac{p(x)}{q(x)}\,dx \qquad (4)$$

Equation (4) specifies the information lost in bits when q is used to approximate p. A symmetric Kullback-Leibler divergence is defined further as follows (Sakurai et al., 2008):

$$D_{SKL}(p, q) = D_{KL}(p \,\|\, q) + D_{KL}(q \,\|\, p) \qquad (5)$$

FIGURE 7 Difference of normals (Ioannou et al., 2012)
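For two densities evaluated on a common regular grid, the divergence in bits and a symmetric variant can be approximated numerically as sketched below. The symmetric form is taken here as the sum of both directed divergences, and q is assumed to be strictly positive wherever p is; both are assumptions of this example.

```python
import numpy as np

def kl_divergence(p, q, dx):
    """Approximate D(p || q) in bits for densities sampled with grid spacing dx."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # grid cells with p(x) = 0 contribute nothing
    return np.sum(p[mask] * np.log2(p[mask] / q[mask])) * dx

def symmetric_kl(p, q, dx):
    """Sum of both directed divergences."""
    return kl_divergence(p, q, dx) + kl_divergence(q, p, dx)

# Two unit-variance normal densities whose means differ by 1.
x = np.linspace(-10.0, 11.0, 2001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
q = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2 * np.pi)
d_pq = kl_divergence(p, q, dx)
d_sym = symmetric_kl(p, q, dx)
```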

| Parameters of the PDF
Fitting a PDF leads to a parametric PDF approximation of the estimated PDF acquired from the nonparametric KDE. In this sense, if a normal distribution can be assumed, the parameters μ and σ are exploited and taken as features. In the spirit of Dehbi and Plümer (2011), an arbitrary distribution which fits the data best can be determined after performing statistical tests (e.g., chi-squared).

| Quantiles
Further properties of a PDF are reflected by its quantiles, which divide the range of the PDF into equally probable contiguous intervals. An important quantile is the well-known median. In this context, we partitioned the distribution range into five equally probable parts and considered the following quantiles: 0.2, 0.4, 0.5, 0.6 and 0.8 (i.e., the four quintiles and the median), as illustrated for a normal distribution in Figure 8. These values enter the feature space for the classification.
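The quantile features can be sketched with `numpy.quantile`. Computing them directly from the samples rather than from the estimated PDF is an assumption of this example, as is the helper name `quantile_features`.

```python
import numpy as np

QUANTILE_LEVELS = (0.2, 0.4, 0.5, 0.6, 0.8)

def quantile_features(values):
    """Return the five quantile values of a 1D measure as a feature vector."""
    return np.quantile(np.asarray(values, dtype=float), QUANTILE_LEVELS)
```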

| Minimum of the difference function
This measure is the global minimum of the difference of two distributions. In this case, a normal distribution is assumed. In this sense, a normal distribution is fitted representing a reference distribution of the PDF estimated by KDE. The difference between these two distributions is then calculated, providing its global minimum as an additional feature. Figure 9 illustrates a PDF and its approximated normal distribution together with the resulting difference function and global minimum.
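This feature can be sketched as follows. The example fits the reference normal distribution by maximum likelihood and takes the difference as KDE minus normal; both the fitting method and the sign convention are assumptions, and `min_difference_feature` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def min_difference_feature(values, grid_size=512):
    """Location and value of the global minimum of (KDE - fitted normal)."""
    values = np.asarray(values, dtype=float)
    mu, sigma = norm.fit(values)  # maximum likelihood estimates
    grid = np.linspace(values.min(), values.max(), grid_size)
    diff = gaussian_kde(values)(grid) - norm.pdf(grid, mu, sigma)
    i = np.argmin(diff)
    return grid[i], diff[i]
```

For a roof surface disturbed by a dormer, the residuals form a second, smaller mode, so the fitted normal overestimates the density between the two modes and the minimum becomes clearly negative.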
Based on the predesigned set of features mentioned, we followed a hierarchical approach to the classification of the roof dormers, as depicted in Figure 10. First, we differentiate between roofs with and without roof structures in a supervised manner using a binary classification. For this classification, we used SVMs (Vapnik, 1998) as a robust classifier. To this end, we labeled point regions belonging to different roof objects in order to train the classifier. Learning is performed using 10-fold cross-validation based on simulated data acquired from our simulation toolbox as well as on real data from aerial LiDAR point clouds. In order to assess the quality of the probability density based features, we performed a feature selection using the ReliefF algorithm, which calculates a predictor weighting (Kononenko, Simec, & Robnik-Šikonja, 1997; Robnik-Šikonja & Kononenko, 2003), allowing the selection of the best features with regard to the given class. In this way, redundant and, in particular, unsuitable features are eliminated a priori.
In the second step of our hierarchical classification, only roofs with superstructures are considered. At this stage, we focus on the discrimination between different dormer types (in particular, gabled and shed dormers).
Having noticed the existence of roofs with both types, we undertook a multi-class classification. We considered three classes: "roof with gabled dormer," "roof with shed dormer," and "roof with gabled and shed dormer." Among others, features related to residuals acquired from an a priori model based roof estimation are used in this supervised learning step.

FIGURE 9 Characteristic deformation of the residual distribution of a roof surface caused by a dormer. An expected normal distribution is taken as reference: (left) Difference function of both distributions, showing the global minimum; and (right) PDF resulting from KDE (blue) and its approximated normal distribution (red)

FIGURE 10 Hierarchical classification and reconstruction of roof structures
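The first, binary classification step with 10-fold cross-validation can be sketched with scikit-learn, whose `SVC` is built on LIBSVM. The feature matrix below is synthetic and purely illustrative (two made-up PDF features per roof), the hyperparameters are library defaults, and the feature selection step is omitted.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for PDF features such as skewness and excess kurtosis
# of the residual distribution; the values are made up.
X_plain = rng.normal([0.0, 0.0], 0.3, size=(100, 2))   # roofs without dormers
X_dormer = rng.normal([1.5, 2.0], 0.3, size=(100, 2))  # roofs with dormers
X = np.vstack([X_plain, X_dormer])
y = np.array([0] * 100 + [1] * 100)

# RBF-kernel SVM on standardized features, evaluated by 10-fold cross-validation.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(clf, X, y, cv=10)
mean_accuracy = scores.mean()
```

The second step would reuse the same pattern with three class labels, since `SVC` handles multi-class problems internally.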

| Segmentation of the point cloud
The previous section provides classified roofs, including the dormer types. For the reconstruction of these objects, we perform a semantic segmentation of the point cloud with regard to the dormer types. To this end, we conduct clustering using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm (Ester, Kriegel, Sander, & Xu, 1996). This corresponds to the third step of our hierarchical method (cf. Figure 10).
Unlike many other clustering methods, DBSCAN does not require the number of clusters to be known a priori. Furthermore, no assumptions on the cluster shapes have to be made. In this context, we again clustered based on 1D measures, combined with location information for the later reconstruction.
Best results are achieved by the incorporation of the residuals information. However, those residuals require the roof model as prior knowledge. Hence, we investigated whether a residual-free clustering is possible, omitting the need for predefined roof models. In this context, the inclination angles and their MSE are applied instead. In the example of shed dormers, Figure 11 shows a roof point cloud colored according to the point inclinations on the right. The investigation of the PDF of these inclinations reveals that this 1D measure turns out to be, among others, a good prior to distinguish between roof and dormer points. The lower (higher) peak characterizes dormer (roof) points. Since we calculated an angle for each point, the clustering results can be associated with the spatial information of the point cloud which is the basis for the reconstruction step.
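A residual-free, two-step clustering of this kind can be sketched with scikit-learn's `DBSCAN`: points are first grouped by their inclination, and each group is then split spatially. The default ε values follow those reported in the experiments (5 and 2); the units, the `min_samples` value and the function name are assumptions of this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def two_step_clustering(points, inclinations, eps_angle=5.0, eps_space=2.0,
                        min_samples=5):
    """Cluster by inclination first, then split each angle cluster spatially.

    Returns one integer label per point; -1 marks noise.
    """
    angle_labels = DBSCAN(eps=eps_angle, min_samples=min_samples).fit_predict(
        inclinations.reshape(-1, 1))
    labels = np.full(len(points), -1)
    next_label = 0
    for a in np.unique(angle_labels):
        if a == -1:
            continue
        members = np.flatnonzero(angle_labels == a)
        spatial = DBSCAN(eps=eps_space, min_samples=min_samples).fit_predict(
            points[members])
        for s in np.unique(spatial):
            if s == -1:
                continue
            labels[members[spatial == s]] = next_label
            next_label += 1
    return labels
```

Two dormers with similar inclination end up in the same angle cluster and are only separated in the spatial step, which is why their mutual distance matters.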

FIGURE 11 Three-dimensional point cloud of a roof colored according to the inclination of the points: roof points (green) and shed dormer points (red). The PDF of the inclination gives insight into the object type

| Reconstruction of dormers
The point clusters acquired from the previous step can be used to reconstruct the roof and dormer surfaces. The roof reconstruction is performed using RanSaC (Fischler & Bolles, 1981), which is extended in order to consider different roof models based on a catalog of roofs in the spirit of Henn et al. (2013). This derived model can be compared with the roof model estimated from the whole point cloud. In this manner, the influence of white noise on roof detection and reconstruction can be investigated. This enables us to compare models stemming from outlier-free point clouds with those derived from point clouds including roof structures. In particular, the simulated point clouds allow for the comparison of these two models, providing a true model as ground truth.
The reconstruction of shed dormers is mainly performed using the Gauss-Helmert model (Förstner & Wrobel, 2016),
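To illustrate the RanSaC principle for a single roof plane, the following sketch fits a plane to a point cloud and reports the inliers. It is a plain single-plane RANSAC and not the extended, catalog-based variant used in the article; the threshold, the iteration count and the function name are assumptions.

```python
import numpy as np

def ransac_plane(points, threshold=0.1, iterations=200, rng=None):
    """Fit a plane n . x = d with RANSAC; returns (normal, d, inlier_mask)."""
    rng = np.random.default_rng() if rng is None else rng
    best_mask, best_model = None, None
    for _ in range(iterations):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        length = np.linalg.norm(normal)
        if length < 1e-12:  # degenerate (collinear) sample
            continue
        normal = normal / length
        d = normal @ sample[0]
        # Points closer to the candidate plane than the threshold are inliers.
        mask = np.abs(points @ normal - d) < threshold
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (normal, d)
    return best_model[0], best_model[1], best_mask
```

Points of a dormer that lie well off the roof plane are rejected as outliers, and the signed distances `points @ normal - d` are the residuals whose distribution is analyzed in the classification step.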

| EXPERIMENTAL RESULTS
This section describes our experiments, including the data settings and the results achieved.

| Data settings
For our experiments, we generated training data consisting of 1,700 objects, including roofs with and without structures. The inclinations of the dormers vary between 15° and 45°. Their widths and depths range from 0.5 to 4 m, corresponding to detached houses as building type. The point clouds have an average density between 6 and 11 points per square meter, which corresponds to the resolution of the point cloud from the ISPRS benchmark from Vaihingen (Rottensteiner et al., 2012) and of an annotated point cloud from three districts in Dortmund, Germany, respectively. In order to quantify the noise impact, we generated point clouds with a noise of 7 and 14 cm. To ensure a smooth transition from simulated to real data, and hence to reflect the real data from Vaihingen and Dortmund, we generated point clouds following the information from the manufacturer of our LiDAR system (Leica ALS50, with a vertical accuracy of 14-36 cm). In contrast to simulated point clouds, the real data contain noise stemming from façade elements, chimneys or even from other neighboring roof parts which have been inaccurately separated. A comparison between real and simulated data is shown in Figure 12. For the annotation of our real data, we make use of orthophotos of the roofs and their footprints for the identification of roofs and dormers (see Figure 13). The orthophotos were sourced from the Geobasis NRW (https://www.bezreg-koeln.nrw.de/brk_internet/geobasis/index.html) portal, which provides basic open digital geodata from official surveys. The resolution of orthophotos is not always sufficient to identify the roof structures, in particular due to shadows and occlusions. In such cases the corresponding point cloud has been checked to ensure correct labeling, in particular to distinguish between roofs and their superstructures. However, in this context we faced great difficulties in labeling a sufficient number of dormers of different types.
A reliable differentiation between roof structures, roof windows or similar objects is still a challenging pre-learning task. For our experiments, the LibSVM toolbox was used (Chang & Lin, 2011) with a radial basis function kernel. For the sake of comparison, we also trained a RandomForest classifier (Breiman, 2001) using Python scikit-learn (Pedregosa et al., 2011).

| Simulated data
The classification of synthetic data into roof objects with and without dormers turns out to be very successful using SVMs. Based on ReliefF, the weighted ranking of the features selects those based on the residual (R) distribution and inclination (I) distribution as top predictors. Table 1 shows an excerpt of the best features according to the feature selection. Both residual- and inclination-based PDF features turn out to be good for this classification step.
These features have been used to train and test the SVM classifier, leading to the results summarized in Table 2. The results of a 10-fold cross-validation as well as the test results on unseen data led to an overall accuracy of more than 98% for a noise of 7 cm. Comparable results have been achieved for a point density of 5.5 points/m².
For a noise of 14 cm, the "roof with dormer" class is perfectly classified. However, poor results have been achieved for the "roof without dormers" class due to a high number of false positives. By varying the point density, we noticed that the importance of the inclination-related features decreases for smaller point densities with more noise, leaving the classification to rely on residual features, which require an a priori roof model as background knowledge. In order to analyze the impact and importance of residual information, the residual-related features  Table 3. Both classifiers used almost the same features for the distinction between roofs and dormers, as depicted in Table 4.
In our sampled training objects, we took care to consider dormers and roofs with various parameters. In this context, we varied the width, depth, inclination, and position of dormers on roofs. Figure 14 shows the impact of dormer width on the PDF of the residuals. For this example, we considered a gabled roof with the dimensions 8 m × 12 m × 4 m, where the ridge is aligned along the longest side. Small differences can be noticed around the maxima with regard to skewness and kurtosis. The most important distinction, however, lies in the peak height. For wider dormers, the PDF of the residuals is characterized by a lower and broader peak. The width of the peak of each PDF is in general correlated with the noise of the point cloud. The inflection in the PDF of the roof without dormer characterizes the roof ridge. As mentioned, we followed the rec-

TABLE 1 Feature selection for simulated data for the classification of roofs with and without structures using SVMs: I, inclination distribution; R, residual distribution

| Real data
In order to evaluate the transferability of the learned model from the simulation to a real use case, we applied the classifier trained on the simulated point cloud with a density of 11 points/m² to 58 buildings from the real data from Dortmund's Kreuzviertel district. The results were poor. In this context, all roofs were classified as roofs with superstructures, with about 50% false positives. Due to the inclination angles during the survey flights, the presence of further points such as those from façades or balconies in the point cloud was unavoidable, leading to such results. Besides the presence of such points in the underlying point cloud, this can possibly be attributed to the different nature of the real and simulated regions (urban vs. rural). The roof shapes modeled during the simulation are single-family houses in a rural residential area. They comprise gabled and hip roofs with rectangular footprints.
The real data from Dortmund originate, however, from urban areas with bigger footprints. Furthermore, the three districts are characterized by different building types. As mentioned, the point clouds of individual roof objects represent a further difficulty related to real data. These regions are not always correctly separated from each other. Hence, the quality of the real data is worsened by the fact that, in some data sets, points of roof parts of one object are assigned to another roof object. Moreover, the presence of other superstructures such as chimneys in the real data set influenced the results. However, this influence turned out to be minimal after performing a classification including such objects as white noise. In this context, we trained a new classifier by adding chimneys as white noise to the simulated data. An application to the real data showed no substantial improvements, however. For this reason, we trained a new model on the real data using the same features and applying a new feature selection.
The classifier was trained on the data from Hombruch and Kreuzviertel together and subsequently applied to Dorstfeld as test region. Due to the different characteristics of the three regions, the results were not fully satisfactory. To improve these results, we randomly selected 80% of the mixed labeled data from the three districts for training and tested on the remaining 20%. Good results with SVMs were achieved, amounting to 85% overall accuracy based on the features listed in Table 5. This result can be improved by 15% in terms of accuracy, precision and recall by using the first six features only, as depicted in Table 6. The results achieved using RandomForest are shown in Table 7. It can be stated that the two classifiers used partly different features. In both cases, however, the inclination-related features are of great relevance (see Table 8).

FIGURE 14 Impact of varying dormer widths (left) on the PDF of residuals (right)

TABLE 4 Feature selection for simulated data for the RandomForest classification of roofs with and without structures: I, inclination distribution; R, residual distribution

| Classification of dormer types
Since dormer types are hardly distinguishable in orthophotos due to shadows and insufficient resolution, we used only simulated data for the classification of dormer types. Our aim is to decide whether a given roof contains a gabled dormer, a shed dormer or both. The feature selection revealed that residuals and inclination based features are also important for this task, as shown in Table 9. Omitting the residual distributions leads to the same effect on the results as in the previous classification task. The classification results using SVMs are shown in Tables   10 and 11 for different point densities (11 and 7.5 points/m 2 , respectively). Table 12 shows the results from the RandomForest classification. Here, a great overlapping of the features needed for SVM and RandomForest can be seen from a comparison of Tables 9 and 13. We also performed a binary classification for the discrimination between the two dormer types. For this task the kurtosis of the residuals and the α-values turned out to be the most important features. In particularly, we found that the inclination is discriminatory for shed dormers rather than gabled dormers. Features from the distribution of DoN, α, and β angles also turned out to be important, as listed in Table 14. In a further step, we incorporated chimneys into the simulated data for the assessment of their influence on the classification. The classification results revealed that these structures did not have a significant impact on the classifier performance.  for a density of 11 points/m 2 good clustering results are achieved, subsequently enabling a reconstruction of dormers. A further clustering based on the other pointwise calculated features (i.e., DoN, α and β angles) did not led to better discrimination of dormer regions. In particular, clusters based on the DoN contained a high proportion of noise and ridge points. 
For our experiments, we used ε = 5 as neighborhood parameter in the first clustering step, whereas ε = 2 was subsequently set for the spatial clustering.
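The two clustering steps can be illustrated with a minimal sketch. Several points are assumptions made only for this example: the first step is taken to cluster a scalar pointwise feature (here an inclination in degrees), the units of ε are not stated in the text, and `min_pts`, `dbscan` and `two_stage_clusters` are hypothetical names and values. The DBSCAN below is a plain quadratic-time version, not the implementation used in the experiments.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = not visited yet

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # only core points expand the cluster
                queue.extend(reachable)
    return labels


def two_stage_clusters(xy, feature, eps_feature=5.0, eps_spatial=2.0, min_pts=3):
    """Cluster on a scalar feature first, then split each feature cluster spatially."""
    feat_labels = dbscan([(f,) for f in feature], eps_feature, min_pts)
    result = [-1] * len(xy)
    next_id = 0
    for c in sorted(set(feat_labels) - {-1}):
        idx = [i for i, lab in enumerate(feat_labels) if lab == c]
        sub = dbscan([xy[i] for i in idx], eps_spatial, min_pts)
        for i, s in zip(idx, sub):
            if s != -1:
                result[i] = next_id + s
        next_id += max(sub) + 1
    return result


# Toy roof: 12 roof points (inclination 40) and two separate dormers (inclination 10).
roof = [(float(x), 0.0) for x in range(12)]
dormer_a = [(2.0, 2.0), (2.5, 2.0), (2.0, 2.5), (2.5, 2.5)]
dormer_b = [(10.0, 2.0), (10.5, 2.0), (10.0, 2.5), (10.5, 2.5)]
labels = two_stage_clusters(roof + dormer_a + dormer_b, [40.0] * 12 + [10.0] * 8)
```

In this toy case the feature step separates roof from dormer points, and the spatial step then splits the dormer points into two individual dormers because they lie more than ε = 2 apart.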

| Clustering and reconstruction of roof structures
TABLE 8 Feature selection for real data for the classification of roofs with and without structures using RandomForest: I, inclination distribution; R, residual distribution

The reconstruction of the dormers depends on the clustering step, which in turn depends on the dormer positions and the point density. This influences the determination of the minimum bounding rectangles enclosing the dormer points. In this context, outliers lead to oversized dormers. Figure 16 illustrates the reconstruction results of shed dormers for different point densities. Green points were acquired after performing the second DBSCAN clustering based on spatial information of the points. On the one hand, the reconstruction depends on the cluster results. Clusters that are too close to each other may not be separable. That means that the clustering guarantees the identification of dormers only if their mutual distance is sufficiently large. On the other hand, the reconstructed dormer models are strongly influenced by the quality of the point cloud. In our experiments, a point density of 11 points/m² leads to the best reconstruction results. For thinner point clouds, success depends on the homogeneity of the point cloud. With lower density, ridge points are mostly erroneously assigned to dormer clusters.
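The effect of outliers on the enclosing rectangle can be made concrete with a small sketch. It assumes an axis-aligned rectangle in 2D roof-plane coordinates, which is a simplification: the minimum bounding rectangles used for the reconstruction may be oriented differently. The coordinates and the names `bounding_rectangle` and `rectangle_area` are invented for the example.

```python
def bounding_rectangle(points):
    """Axis-aligned rectangle (xmin, ymin, xmax, ymax) enclosing 2D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def rectangle_area(rect):
    """Area of a rectangle given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    return (xmax - xmin) * (ymax - ymin)


# Hypothetical dormer cluster of 2.0 m x 1.5 m, in roof-plane coordinates.
dormer = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.5), (0.0, 1.5), (1.0, 0.7)]
clean = bounding_rectangle(dormer)
# One stray point (for instance a misassigned ridge point) joins the cluster.
with_outlier = bounding_rectangle(dormer + [(3.5, 0.5)])
```

A single misassigned point enlarges the rectangle from 3.0 m² to 5.25 m² here, which is the oversizing effect described above.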

TABLE 11 Multi-class classification results of dormer types using a 10-fold cross validation: point density,

In order to assess the impact of dormers and roof superstructures in general on the quality of roof models estimated by RanSaC, we compared the deviation of inclinations between such roof models and reference models. The results are summarized in Table 15. Roof models without dormers are identified more accurately than those with dormers, confirming the influence of dormers on the estimation. Higher noise does not have a negative impact on the resulting models, which is consistent with the robustness of RanSaC against noise.
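A comparison of this kind can be sketched as follows. The example fits a single plane with a basic RanSaC loop and reports the deviation of its inclination from a known reference. It is a simplified stand-in for the model-based roof estimation: the threshold, the iteration count, the synthetic roof (slope 0.5) and the dormer points are all assumptions chosen for illustration.

```python
import math
import random


def plane_from_points(p, q, r):
    """Unit normal n and offset d of the plane n.x = d, or None if the points are collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    return n, sum(n[i] * p[i] for i in range(3))


def ransac_plane(points, threshold=0.05, iterations=200, seed=0):
    """Return ((normal, offset), inlier_count) of the plane with the most inliers."""
    rng = random.Random(seed)
    best, best_count = None, -1
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        count = sum(1 for pt in points
                    if abs(sum(n[i] * pt[i] for i in range(3)) - d) <= threshold)
        if count > best_count:
            best, best_count = model, count
    return best, best_count


def inclination_deg(normal):
    """Angle between the plane and the horizontal, in degrees."""
    return math.degrees(math.acos(min(1.0, abs(normal[2]))))


# Synthetic roof face z = 0.5 x (80 points) plus 20 dormer points on a flatter plane.
roof = [(float(x), float(y), 0.5 * x) for x in range(10) for y in range(8)]
dormer = [(float(x), float(y), 0.1 * x + 2.5) for x in range(3, 7) for y in range(2, 7)]
(normal, offset), inliers = ransac_plane(roof + dormer)
reference = math.degrees(math.atan(0.5))
deviation = abs(inclination_deg(normal) - reference)
```

In this noise-free toy case the consensus plane coincides with the roof face and the dormer points are rejected as outliers. With real data the dormer points can still pull the estimate, which is what the deviation is meant to quantify.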

| Discussion
In this subsection the influence of the point cloud density as well as the probability density based features will be discussed. Among others, the dormer positions on the roof are a parameter worthy of consideration during the simulation and, hence, worth optimizing for real objects later on. In order to quantify these variations of the parameters and their correlations with each other, a synthetic data set was investigated, which varies one degree of freedom with respect to the underlying dormer model. Figure 17 shows the influence of these parameters on the PDF of residuals for both shed and gabled dormers.

TABLE 14 Feature selection for the binary classification of dormer types (gabled and shed) based on simulated data using SVMs: I, inclination distribution; R, residual distribution

| Influence of the point cloud density
In our experiments, we reduced the point cloud density and performed our evaluations in the following steps: 11, 5.5, 4.4 and 3.3 points/m². The resulting point clouds were used as input for the clustering algorithm (cf. Figure 16).
For lower densities (below 7.7 points/m²), we observed that dominating ridge points are clustered as dormer points.
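One simple way to obtain such reduced densities is uniform random thinning, sketched below. This is an assumption: the text does not state how the density was reduced, and a simulator could equally resample the scan pattern. The function name `thin_to_density` is hypothetical.

```python
import random


def thin_to_density(points, current_density, target_density, seed=0):
    """Randomly keep a share of the points so that the density matches the target.

    Densities are given in points per square metre over the same roof area.
    """
    if target_density >= current_density:
        return list(points)
    keep = round(len(points) * target_density / current_density)
    return random.Random(seed).sample(list(points), keep)


# Toy cloud of 1100 points covering 100 m^2, i.e. 11 points/m^2.
cloud = [(i % 50, i // 50) for i in range(1100)]
thinned = {d: thin_to_density(cloud, 11.0, d) for d in (5.5, 4.4, 3.3)}
```

Random thinning preserves the homogeneity of the cloud on average, but individual small structures may lose a disproportionate share of their points at the lowest densities.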

| PDF of residuals
Since the residuals are calculated based on a roof model, their distribution is independent of the roof type. In this context, the properties of the point cloud are relevant since the accuracy of the estimated roof model is dependent on them. Analogously to Figure 14 where the impact of dormer widths was evaluated, we generated several plots by varying other parameters such as the inclination, depth, position, and ridge size for each distribution.
The results are shown for the case of the residuals PDF in Figure 17. The figure also compares shed and gabled dormers.

| Distribution of inclination
The distribution of inclination is sensitive to the point neighborhood selected for its calculation. From our empirical experience, k = 5 was suitable for a point density of 6.6 points/m² and a noise of 7 cm.
Analogously to the residuals PDF, the shape parameters of the inclination PDF are also influenced by the dormer type as well as its location and shape parameters. The dormer size plays a prominent role with regard to the roof surface. Changes in the dormer parameters are most clearly visible in the heights of the distribution peaks.
Skewness, curvature, and smoothness are visually distinguishable and also reflected in the standard deviations of the distribution properties. A detailed listing is shown in Table 16.
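Shape descriptors of this kind can be computed directly from the sampled values. The sketch below uses standard moment-based definitions of skewness and excess kurtosis, which is an assumption: the exact feature definitions behind Table 16 are not restated in this section. The function name `distribution_features` and the sample values are invented.

```python
import math


def distribution_features(values):
    """Mean, standard deviation, skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "std": std,
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
    }


# A symmetric sample and a sample with a one-sided tail, as a dormer might produce.
symmetric = distribution_features([-1.0, 0.0, 1.0])
one_sided = distribution_features([0.0, 0.0, 0.0, 0.0, 10.0])
```

A roof without superstructures is expected to yield values close to those of a normal distribution, whereas points on a dormer add a one-sided tail that shows up as non-zero skewness.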

| Distribution of mean squared error
The MSE is calculated with respect to the inclination angle and reflects the flatness of inclination in the point cloud. For this reason, observations similar to those for the inclinations were made when varying the dormer parameters. In addition to the height of the maximum of the distribution, the position of the maximum is particularly influenced by different dormers.

| Distribution of α and β
The α angles are highly correlated with the object shape and size. Compared to the β angles, the α angles are less sensitive to dormer locations on the roof. Again, the heights of the PDFs are clearly influenced by the variation of the dormer parameters.

| Distribution of difference of normals
The DoNs are calculated from the inclination of a small (5 points) and a large (80 points) neighborhood of a point for a point density of 6.6 points/m². The distributions of DoNs for roofs with superstructures differ most markedly from the distributions for roofs without superstructures. Again, the variation of dormer parameters has the most intense effect on the maxima of the distribution.
As yet, other methods do not address the impact of the presence of roof superstructures on the detection and distortion of the roof itself. This leads to unsatisfactory estimation of the roof models. All in all, our approach is applicable independently of the roof complexity whenever the underlying model correctly describes the roof without superstructures (e.g., dormers). Smeared models due to the presence of superstructures represent in this sense the starting point of our method. In this context, the main limitation of our method lies in the assumption that a model corresponding to the roof in question without superstructures exists.
This means, for instance, that we would have worse results assuming a gabled roof for an underlying asymmetric gabled roof. Effects of wrong models are, however, beyond the scope of this article. A further limitation is that there should be enough observations for the identification and reconstruction of individual roof dormers.
The main focus of this article is the demonstrated method itself, which has first been explored and explained in depth in a simulation environment for the sake of replication. Detailed and varied experiments have been performed to assess the impact of several parameters on the identification and reconstruction of roof superstructures, giving more insights and leading to important outcomes. The simulation-based results have been confirmed using real data insofar as annotation was possible. The separation of roofs and dormers was successful based on annotated roofs, from both simulated and real data, indicating the existence of dormers (or not). For the discrimination between dormer types and consequently their reconstruction, we rely on annotated dormers. In this context, we face great difficulties in correctly labeling a sufficient number of dormers of different types in order to confirm or falsify the good results achieved using the simulation data. In particular, a clear labeling of the objects is not always possible due to the resolution of the photos; for example, shadows make it difficult to differentiate between roof structures, roof windows or similar objects.
The computation time is dominated by the classification part, using SVM for example. In general, the learning phase of the classifier, comprising the optimization of the SVM parameters (c and γ), can take several minutes for several building roof models. However, this is only needed once. Application of the classifier is linear, and can be done in a fraction of a second for several roofs. For the implementation of DBSCAN, the dominating factor is the ε-neighborhood graph, which can be computed using a sweep-line algorithm in O(n log n + kn) without an index structure, where n is the number of input points and k is the average number of points contained in a square-shaped neighborhood of size 2ε × 2ε centered around the points. Using a data structure (e.g., kd-tree) that supports queries for all points in the ε-neighborhood, the running time amounts to O(T_setup(n) + n T_query(n)). Likewise, the setup of the data structure is performed only once. All in all, in practice, the application of the learned model on a specific building is performed in a fraction of a second. The experiments were performed on an Intel(R) Core(TM) i7-3770K processor. The machine is clocked at 3.5 GHz and has 16 GB RAM.
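The idea of paying a one-time setup cost and then answering each ε-neighborhood query locally can be illustrated with a uniform grid index. Note that this is a swapped-in technique chosen for brevity: it is neither the sweep-line algorithm nor the kd-tree mentioned above. The names `build_grid` and `eps_neighbors` are hypothetical.

```python
import math
from collections import defaultdict


def build_grid(points, eps):
    """Setup step: hash each 2D point index into a square cell of side eps."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(math.floor(x / eps), math.floor(y / eps))].append(i)
    return grid


def eps_neighbors(points, grid, eps, i):
    """Query step: indices of all points within eps of point i (including i)."""
    x, y = points[i]
    cx, cy = math.floor(x / eps), math.floor(y / eps)
    found = []
    # Any point within eps must lie in the 3 x 3 block of cells around point i.
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if math.dist(points[i], points[j]) <= eps:
                    found.append(j)
    return sorted(found)


pts = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (0.0, 1.9), (5.0, 5.0)]
index = build_grid(pts, 2.0)
near_first = eps_neighbors(pts, index, 2.0, 0)
```

The grid is built once in linear time, and each query then inspects only the points in nine cells, so the total cost has the same setup-plus-queries structure as the kd-tree variant.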

| CONCLUSIONS AND OUTLOOK
This article introduced an approach to the automatic classification and reconstruction of roofs and their structures. A hierarchical classification method is followed which discriminates between roofs and dormers, followed by a classification of different dormer types. Both classification steps lead to very good results (up to 99%). The key idea is the use of designed features from PDFs of specific measures from the point cloud. PDFs of inclinations and the residuals of model-based generated roofs via RanSaC are used, among others. Even without residual information, a clustering using DBSCAN leads to the identification and reconstruction of dormers. The discrimination between dormer types turns out to be successful based on simulated data. Due to labeling difficulties of real objects, the application of the learned model to real data is open and will be the subject of future work.
The classification and reconstruction results are presented based on simulated and real data. The article also introduces an implemented simulation toolbox which allows us to generate different roof and dormer models with various and controlled parameters. For future work, additional roof and dormer types can be considered. Thus, the investigation of additional features is envisaged. In this context, in addition to normal distributions, the parameters of other PDFs can be taken into account.

ACKNOWLEDGEMENTS
The authors thank Dirk Dörschlag for the implementation of the laser scanning simulator. The authors are also grateful to Nazrin Gojayeva for performing some aspects of the experiments. We gratefully acknowledge the open datasets from Geobasis NRW.