UNCERTAINTY REPRESENTATION AND QUANTIFICATION OF 3D BUILDING MODELS

The quality of environmental perception is of great interest for localization tasks in autonomous systems. Maps generated from the sensed information are often used as additional spatial references in these applications. Quantifying the map uncertainties gives insight into how reliable and complete the map is, which helps to avoid potential systematic deviations in pose estimation. Mapping 3D buildings in urban areas using light detection and ranging (LiDAR) point clouds is a challenging task, as it is subject to real-world error sources such as sensor noise and occlusions, which should be well represented in the 3D models for the downstream localization tasks. In this paper, we propose a method to model 3D building façades in complex urban scenes with uncertainty quantification, where the uncertainties of windows and façades are indicated in a probabilistic fashion. The potential locations of missing objects (here: windows) are inferred from the available data and layout patterns with a Monte Carlo (MC) sampling approach. The proposed 3D building model and uncertainty measures are evaluated using real-world LiDAR point clouds collected by a Riegl Mobile Mapping System. The experimental results show that our uncertainty representation conveys the quality information of the estimated locations and shapes of the modelled map objects.


INTRODUCTION
Environmental perception is critical in autonomous driving-related studies. The received environmental information is stored as maps; these can be HD maps acquired with high geometric and semantic accuracy, 2D truncated signed-distance field (TSDF) maps, 3D representations such as 3D city models, or even 3D point clouds used directly. These maps serve as spatial references for an autonomous system and help it to understand unknown urban environments in localization and navigation tasks. For 3D city models, the automated reconstruction from laser scanning point clouds in urban scenes is widely studied in the geoinformatics and remote sensing fields. It is still a challenging task due to the complexity of 3D building façade layouts, missing data (e.g. due to occlusions), and the sensor noise in the collected LiDAR point clouds (Li et al., 2017). These problems inevitably lead to imperfect maps, with potential spatial or semantic uncertainties in the reconstructed city models. These uncertainties also propagate to the downstream localization accuracy in autonomous driving scenarios and affect the safety of navigation and collision avoidance. Thus, a proper measure or representation of the map uncertainties should be defined to convey information about the map quality to localization applications.
The quantification of uncertainty is studied in many different research fields, such as simultaneous localization and mapping (SLAM) and GNSS positioning and navigation. It usually concerns point cloud registration or pose estimation of the ego-vehicle, whereas the accuracy of the environment references, such as 3D city models and TSDF maps, has not been sufficiently explored. Ambiguous environment information, occlusions and noise in point clouds may occur as uncertainty sources in the reference map (Maken et al., 2021). In a 3D city model, the most critical components for autonomous systems are buildings. Although plenty of methods exist for 3D building parsing or modelling (Li et al., 2017, Zhou and Gong, 2018, Xu and Stilla, 2021), a quality measure of the mapping process is absent. For example, the 3D CityGML format usually does not contain quality measures, and only the LoD may give an indication of geometric quality. In the existing methods, the quality of image-based point clouds can be assessed by the uncertainty of stereo matching, while range-based LiDAR points can only be assessed in general terms according to the sensor errors (Xu and Stilla, 2021). The validation and evaluation results for the models are often not suitable for direct use as uncertainty measures in applications. If the uncertainties of maps are neglected and maps are treated as perfect references, systematic errors are likely to be imposed on pose estimation, e.g. a deviation of the whole trajectory.
To tackle this problem, the uncertainty and completeness of a map based on LiDAR point clouds can be described in a probabilistic fashion. Probabilistic methods capture different uncertainty sources and provide the probability distribution of the potential spatial locations and/or semantic labels of the map objects. Probabilistic approaches are already applied in plenty of uncertainty research (Di et al., 2021, Feng et al., 2021, Paz et al., 2020). For example, Di et al. (2021) estimated a Gaussian Process (GP) posterior in incremental multi-robot mapping, where the regression is over a 2D TSDF map. For semantic labels, Feng et al. (2021) inferred the uncertainty in bounding box labels of object detection and defined a new representation of the probabilistic bounding box through a spatial uncertainty distribution. Paz et al. (2020) constructed a probabilistic semantic map in bird's eye view in urban driving environments. These methods focus either on 2D maps or on semantic object labels. A detailed spatial uncertainty representation of 3D maps for urban scenes is still of great interest to explore. This paper proposes a method to generate 3D city maps with probabilistic uncertainty representations, which can be feasibly validated and incrementally updated with new measurements. The goal is not a "perfect" building model, but a representation of the geo-objects in terms of elements and their uncertainty measures, including unknown occluded parts. In this work, we focus on the uncertainty of 3D building models, especially the locations and orientations of the façade planes and windows, the most critical map objects for localization tasks. In particular, the uncertainties of windows are modelled with likelihood distributions, and the occluded parts are also inferred approximately. As a result, a 3D building model with the optimal estimates of the façade positions and orientations, as well as the window existence, is generated as our environment representation.
The locations and shapes of the map objects are mapped together with the probability of their potential spatial distributions. For the façade modelling, the parametric Gaussian Mixture Model (GMM) approach and the non-parametric Gaussian Process (GP) approach are both studied and compared. Uncertainty sources such as occlusions and measurement noise are included in the uncertainty representation. As in the previous work (Zou and Sester, 2021), the shape of occlusions (i.e. unknown object regions) is also computed and stored as additional information. The detection confidence and the ambiguities of the windows are modelled with a logistic function-based method. Furthermore, windows in the occluded regions are inferred from the available information using Monte Carlo (MC) methods.
The rest of the paper is structured as follows. In Section 2, the overall workflow is briefly introduced, and the details of the façade and window modelling with uncertainty measures are presented. This is followed by the experimental results and the evaluation in Section 3. Finally, in Section 4, we present our conclusions and the outlook for future work.

Overview
The proposed uncertainty representation focuses on the façades and windows in 3D building models based on LiDAR point clouds. To build up the 3D building models, the overall workflow contains pre-processing, façade segmentation, local frame transformation, and detailed modelling of the façade elements, similar to the previous work (Zou and Sester, 2021). In pre-processing, the alignment of the point clouds, pre-classification and individual building segmentation are conducted on the raw LiDAR point clouds. The segmentation of the façades is then performed by the RANSAC algorithm. For each façade, a local 2.5D frame based on the orientation of the major façade plane is constructed for the points in this cluster. In the local frame, the façade depth model and the window uncertainty model are estimated.

Facade Model
To determine the façade orientation and the local frame, Principal Component Analysis (PCA) is used to extract the normal of the plane. We treat the third principal component (the direction of least variance) as the orientation of the façade, i.e. it is the depth direction and the basis of the local frame used in the subsequent processing. The normal vector obtained from PCA is oriented to point towards the sensor by comparing it with the average normal of all points measured on the façade.
In Figure 1, the red Gaussian has a higher prior probability π1 than the blue one (π2). Note that there are some other small depth planes, such as the one where window casings are located; they are not all shown there.
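The PCA-based orientation step described above can be illustrated with a minimal sketch. It assumes that the façade points are given as an (N, 3) NumPy array and that a reference direction (for instance the average of the per-point normals) is available to resolve the sign of the normal; the function name `facade_frame` and its arguments are hypothetical and not part of the original implementation.

```python
import numpy as np


def facade_frame(points, reference_normal):
    """Estimate the façade normal with PCA and return per-point depths.

    points: (N, 3) array of façade points.
    reference_normal: assumed average of the per-point normals, used only
        to resolve the sign ambiguity of the PCA normal.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    centred = points - centroid
    # eigh returns eigenvalues in ascending order, so column 0 is the
    # direction of least variance, i.e. the third principal component.
    _, eigvecs = np.linalg.eigh(np.cov(centred.T))
    normal = eigvecs[:, 0]
    # Flip the normal so that it agrees with the reference direction.
    if np.dot(normal, reference_normal) < 0:
        normal = -normal
    # Signed distance of every point to the fitted plane (depth value).
    depths = centred @ normal
    return normal, depths
```

The returned depth values are the signed distances along the normal; they are the one-dimensional quantity on which a depth model can then be fitted.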

Gaussian Mixture Model (GMM):
On each façade, a GMM is used to decompose and cluster the façade points into different depth layers, i.e. a depth plane is treated as a Gaussian distribution with a certain weight in the GMM. The mathematical equation of a GMM reads:
$$p(d \mid \theta) = \sum_{i=1}^{K} \pi_i \, \mathcal{N}(d \mid \mu_i, \sigma_i^2),$$
where d is the depth value of a point and K is the number of Gaussian components; θi = {πi, µi, σi} is the set of GMM parameters for the i-th depth layer: πi is the prior probability or weight, meaning the importance of the component; µi and σi are the mean and standard deviation of the Gaussian distribution. The optimal parameters of the GMM are derived by maximizing the log-likelihood, as shown in the equation:
$$\hat{\theta} = \arg\max_{\theta} \log L(\theta \mid D) = \arg\max_{\theta} \sum_{n=1}^{N} \log \sum_{i=1}^{K} \pi_i \, \mathcal{N}(d_n \mid \mu_i, \sigma_i^2),$$
where L(θ|D) is the likelihood and N is the number of data points. In practice, since the maximization of the mixture model likelihood has no closed-form solution, the Expectation Maximization (EM) algorithm is applied. The details are omitted here for brevity.
The depth layers are characterized by the trained GMM, and the points are assigned to different Gaussian components according to their depth values, as shown in Figure 1. The point set in each Gaussian component is assumed to be located on the same depth plane, whose shape constraints are extracted by the alpha-shape algorithm. In this way, the building façade can be represented by the GMM parameters and the boundary points of each planar patch. The expected position of the façade plane and its uncertainty are represented by the mean value µi and the standard deviation σi, respectively. The random sensor noise is captured by the standard deviation of each Gaussian component. The undesired effects of outliers and of the inhomogeneous density of the point cloud can be suppressed in the estimation process.
The Gaussian component with the highest prior probability πi is the principal façade plane, containing the majority of points. The local frame is then refined by applying PCA only to the point set of the principal plane. These points are selected by their depth values inside the 99.7% confidence interval, i.e. 3σ:
$$|d - \mu_m| \le 3\sigma_m,$$
where µm and σm are the expectation and standard deviation of the principal Gaussian component. When the distance between a point and the principal façade plane is larger than 3σm, the point is not used for the computation of the principal façade orientation.
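As an illustration of the depth-layer decomposition and the 3σ selection of the principal plane, the following is a minimal sketch of a one-dimensional EM fit. It is not the authors' implementation: the quantile-based initialization, the stopping tolerance and the function names are assumptions made for this example.

```python
import numpy as np


def fit_depth_gmm(depths, n_components, n_iter=200, tol=1e-8):
    """Fit a 1D Gaussian mixture to depth values with plain EM."""
    d = np.asarray(depths, dtype=float)
    # Assumed initialization: means at evenly spaced quantiles.
    mu = np.quantile(d, (np.arange(n_components) + 0.5) / n_components)
    sigma = np.full(n_components, d.std() + 1e-6)
    pi = np.full(n_components, 1.0 / n_components)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of every component for every point.
        dens = (pi * np.exp(-0.5 * ((d[:, None] - mu) / sigma) ** 2)
                / (sigma * np.sqrt(2.0 * np.pi)))
        total = np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        resp = dens / total
        # M-step: re-estimate weights, means and standard deviations.
        nk = resp.sum(axis=0)
        pi = nk / d.size
        mu = (resp * d[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (d[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        log_lik = np.log(total).sum()
        if abs(log_lik - prev_ll) < tol:
            break
        prev_ll = log_lik
    return pi, mu, sigma


def principal_plane_mask(depths, pi, mu, sigma):
    """Select points within 3 sigma of the component with the largest weight."""
    m = int(np.argmax(pi))
    return np.abs(np.asarray(depths, dtype=float) - mu[m]) <= 3.0 * sigma[m]
```

The boolean mask returned by `principal_plane_mask` marks the points that would be passed to the PCA refinement of the local frame.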

Gaussian Process (GP):
As illustrated above, the GMM is a parametric approach, depicting the dominant planar parts of the building façades. Since there remain some non-planar or irregular elements, a non-parametric approach can be applied to model buildings better. In this work, façade surface and uncertainty modelling using a GP approach is introduced. GP is a popular non-parametric method for non-linear modelling, describing a joint Gaussian distribution over a continuous variable domain. Here, the GP is defined over the space domain; the depth value at each location is a random variable with its own mean and standard deviation, and the correlation of the depth variables at two different locations is indicated by their covariance. Thus, a GP is specified by two key components, i.e. its mean function µ(x) and covariance (kernel) function k(x, x′).
According to the marginalization and conditional properties of the multivariate Gaussian distribution, the posterior distribution of the façade surface model can be obtained from the prior assumption and the training data. With the prior
$$f(x) \sim \mathcal{GP}\big(\mu_0(x), k_0(x, x')\big),$$
if the noisy measurements y = f(x) + ε are given as the training samples {X, y}, the posterior mean and covariance of a predicting case {x*, y*} can be derived as (Rasmussen, 2003):
$$\mu(x_*) = \mu_0(x_*) + k_0(X, x_*)^{\top} \big(k_0(X, X) + \sigma_\eta^2 I\big)^{-1} \big(y - \mu_0(X)\big),$$
$$k(x_*, x_*) = k_0(x_*, x_*) - k_0(X, x_*)^{\top} \big(k_0(X, X) + \sigma_\eta^2 I\big)^{-1} k_0(X, x_*),$$
where k0(X, X) is the prior covariance among the N input training points and k0(X, x*) is the prior covariance between the N input training points and the predicting points. The Gaussian kernel function is used here as the prior covariance function k0(x, x′). σ²η denotes the uncertainty introduced by the random measurement noise ε ∼ N(0, σ²η).
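The posterior computation can be sketched in a few lines of NumPy. This is a generic GP regression example in the standard form given by Rasmussen, not the code used in the paper; the constant prior mean, the hyper-parameter values (`length_scale`, `signal_std`, `noise_std`) and the function names are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, length_scale, signal_std):
    """Gaussian (squared-exponential) kernel between two point sets."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return signal_std ** 2 * np.exp(-0.5 * sq_dist / length_scale ** 2)


def gp_posterior(X, y, X_star, length_scale=1.0, signal_std=0.1,
                 noise_std=0.02, prior_mean=0.0):
    """Posterior mean and variance of the depth at the locations X_star.

    X: (N, D) training locations, y: (N,) depths, X_star: (M, D) queries.
    A constant prior mean is assumed here for simplicity.
    """
    X = np.asarray(X, dtype=float)
    X_star = np.asarray(X_star, dtype=float)
    y = np.asarray(y, dtype=float)
    K_noisy = (gaussian_kernel(X, X, length_scale, signal_std)
               + noise_std ** 2 * np.eye(len(X)))
    k_star = gaussian_kernel(X, X_star, length_scale, signal_std)  # (N, M)
    # The Cholesky factor avoids forming the explicit inverse.
    L = np.linalg.cholesky(K_noisy)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - prior_mean))
    mean = prior_mean + k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    # The prior variance of this kernel is signal_std**2 at every location.
    var = signal_std ** 2 - (v ** 2).sum(axis=0)
    return mean, var
```

Far away from any measurement the variance returns to the prior value, which is how sparsely measured regions show up as highly uncertain in such a model.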
In this way, the posterior mean function depicts the expectation of our façade surface while the corresponding variance represents the uncertainty of the estimation at each location. Figure 2 shows a 1D example for the surface expectation and the uncertainty bound (95% confidence interval). Compared to GMM, the GP approach is more flexible with fewer geometrical constraints. This property facilitates a more accurate description of irregular surfaces and small depth differences. Nevertheless, it yields a weakness in modelling explicit regularities. The results of the GP and GMM methods will be further evaluated and discussed in Section 3. Since LiDAR points are discretely distributed in the 3D space, the density of the points is another measure to evaluate the integrity of the plane estimation. GP can capture the density effect in its uncertainty representation. If we assume a wall is a watertight plane, most of the windows appear as holes on the plane. When a region is measured with very sparse points, it may seem like a hole as well. Thus, the density is also closely related to the window modelling and will be analyzed implicitly in the window modelling process in the next subsection.

Window Model
Window detection and modelling have been explored by many existing approaches (Brenner and Ripperda, 2006, Nguatem et al., 2014, Tuttas and Stilla, 2011, Mesolongitis and Stamos, 2012, Li et al., 2017), covering research topics such as window location estimation, shape reconstruction and window pattern analysis. However, the uncertainties of these estimates are not provided, e.g. how accurate the window edge is or how confident we are in the detection results. In this paper, the window detection results are first assigned an initial detection confidence, which is then elaborated by the probabilistic uncertainty model, a combination of several logistic functions. It provides the likelihood distributions for the possible window areas, quantifying the uncertainties of the windows, i.e. the detection confidence, uncertain shapes and vague boundaries. In addition, by analyzing the window patterns, occluded or non-measured windows are inferred by the Monte Carlo method, and can be updated and refined once new measurements are available. Note that the potential update is based on our assumption that the data of a 3D city model can be incrementally acquired and updated.

Window Detection:
A window object is estimated with the most probable polygon shape (assumption: rectangles), as well as a likelihood distribution for the potential window space. The first step is the window detection over the façade. A hole-based detection method is applied here, as windows usually appear as holes on façades in the LiDAR data. The horizontal and vertical lines of the hole boundaries are extracted and form several candidate rectangles, as illustrated in Figure 3a.
For each candidate, the initial detection confidence is influenced by the following factors: the number of undesired façade points inside the rectangle (Figure 3b), the projection of the ceiling points (the laser ray penetrates the window and hits the ceiling) onto the vertical façade plane, as shown in Figure 3c, and the weighted intersection over union (IoU) of the hole and the assumed rectangle; the ceiling point projection increases the weight of the projected area. The confidence is calculated from the following quantities: N(∩,weighted) is the weighted area of the intersection of the candidate rectangle and the hole, N(∪,weighted) is the weighted area of their union, and Nf is the undesired area with façade points inside the rectangle, which is affected by the density of the façade points. The candidate with the highest detection confidence is selected as the detected window shape.

Window Uncertainty Quantification:
The detection confidence of a window is a rather simple measure for the entire window object and does not consider the inhomogeneous uncertainty within the window space, e.g. we are less confident at the borders between the window and the wall than in the more central window area. To gain insight into the continuous spatial uncertainty of whether a location is part of a window or not, the uncertainty of the window object is further quantified with a probability based on the detection results. This is achieved by a combined model of four logistic functions, under the assumption that the edges of a window have higher uncertainty than the center parts and that the uncertainty does not change much inside a window. In each direction (horizontal or vertical), the combination of two logistic distributions is computed individually and then integrated with the other, as the uncertainties of the horizontal edges are independent of the vertical ones. The logistic distribution in either the horizontal or the vertical direction reads:
$$F(x; \beta, \gamma) = \frac{1}{1 + e^{-\gamma (x - \beta)}},$$
where F(x; β, γ) is the logistic distribution model, β is the sigmoid's midpoint, denoting the position, and γ is the logistic growth rate or the steepness of the curve, indicating the drastic uncertainty change around the borders in this case. L(x) is the combination of two such logistic distributions, one for each of the two opposite window edges, with the prior confidence value c derived from the previous window detection step. The difference |α1 − α2| between the midpoints α1 and α2 of the two logistic distributions denotes the width or height of the corresponding window.
In Figure 4a, an instance of the horizontal likelihood distribution for the window area is shown, where the steep curves denote the uncertainty of the window borders, as more ambiguity of being the window hole or the watertight façade surface is present here. In contrast, the likelihood does not change much in the middle part, with the value specified as the initial detection confidence. The horizontal and vertical distributions will be multiplied to construct a 2D window likelihood distribution, as shown in Figure 4b. Figure 4c and 4d demonstrate the overall detected windows and the corresponding uncertainty models, respectively. Redder colors denote a higher probability of being a window area. In the window distribution model, the width, height, and location information are represented by the parameters of the logistic distributions.
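A possible realization of the window likelihood is sketched below. The exact way in which the two logistic functions per direction are combined is an assumption of this example: a rising sigmoid at one edge is multiplied by a falling sigmoid at the opposite edge, and the confidence c is applied once to the 2D product so that the plateau value equals c. The dictionary keys follow the window features {x, y, c, w, h, γ1, γ2, γ3, γ4}; the key names `g1` to `g4` and the function names are hypothetical.

```python
import math


def logistic(x, midpoint, growth_rate):
    """Numerically stable logistic function F(x; midpoint, growth_rate)."""
    z = growth_rate * (x - midpoint)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def edge_profile(x, lower, upper, rate_lower, rate_upper):
    """Assumed 1D combination: rise at the lower edge, fall at the upper."""
    return logistic(x, lower, rate_lower) * (1.0 - logistic(x, upper, rate_upper))


def window_likelihood(x, y, window):
    """2D window likelihood as the product of the two 1D profiles."""
    half_w, half_h = window["w"] / 2.0, window["h"] / 2.0
    horizontal = edge_profile(x, window["x"] - half_w, window["x"] + half_w,
                              window["g1"], window["g2"])
    vertical = edge_profile(y, window["y"] - half_h, window["y"] + half_h,
                            window["g3"], window["g4"])
    # The confidence is applied once, so the window interior approaches c.
    return window["c"] * horizontal * vertical
```

With steep growth rates the likelihood is close to c inside the window, drops to about c/2 on an edge midpoint and vanishes outside, which reproduces the qualitative shape described for the horizontal and 2D distributions.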

Layout Pattern Analysis:
The initial hole-based detection is not perfect, as some windows with curtains do not appear as holes. Furthermore, it is difficult to detect windows in occluded areas. Therefore, the window layout can be exploited to infer the missing windows from the existing data. As a man-made structure, a façade usually has a window layout with a certain repetitive or symmetric pattern, which can be captured by autocorrelation analysis in the horizontal and vertical directions. The autocorrelation result is checked to confirm the presence of a global repetitive pattern: the frequency must be in a reasonable range, i.e. the repetitive period should be neither larger than 1/3 of the entire façade width or height nor smaller than 20 cm, and the expected interval obtained from autocorrelation should not be rejected by the statistical test concerning the actual distances between adjacent windows. Otherwise, no repetitive pattern is reported. If the repetitive frequency is successfully determined by autocorrelation, the output is the expected distance between two adjacent windows in the horizontal and/or vertical direction. The standard deviation of the window intervals is obtained from the differences between the expected interval and the actual distances between adjacent windows. Thus, the window layout pattern provides information about the potential location of a neighboring window, quantified by a probability, as illustrated in Figure 5. The conditional probabilities read:
$$p(x_w \mid x_0, y_w = y_0) = \mathcal{N}(x_w \mid x_0 \pm \mu_x, \sigma_x), \qquad p(y_w \mid y_0, x_w = x_0) = \mathcal{N}(y_w \mid y_0 \pm \mu_y, \sigma_y),$$
where (x0, y0) is the coordinate of a given window center and (xw, yw) is the center of a neighboring window, which can be located on the left, right, upper or lower side. {µx, σx, µy, σy} are the expected distances and standard deviations in the horizontal and vertical directions.
Figure 5. The probabilistic distribution of the distance to the neighboring window on the right.
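The layout analysis can be sketched as follows, under simplifying assumptions: the detected windows are rasterized into a 1D occupancy profile along one façade axis, the period is taken as the strongest positive local maximum of the mean-centred autocorrelation within the admissible range (at least 20 cm, at most 1/3 of the façade extent), and the statistical test on the intervals is omitted. All function names are hypothetical.

```python
import math


def autocorrelation(profile, lag):
    """Mean-centred autocorrelation of a 1D occupancy profile at one lag."""
    n = len(profile)
    mean = sum(profile) / n
    pairs = n - lag
    return sum((profile[i] - mean) * (profile[i + lag] - mean)
               for i in range(pairs)) / pairs


def repetition_period(profile, step, min_period=0.2):
    """Return the repetition period in metres, or None if none is found."""
    n = len(profile)
    min_lag = max(2, math.ceil(min_period / step - 1e-9))
    max_lag = n // 3  # the period may not exceed 1/3 of the façade extent
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = autocorrelation(profile, lag)
        is_peak = (score > autocorrelation(profile, lag - 1)
                   and score >= autocorrelation(profile, lag + 1))
        if is_peak and score > best_score:
            best_lag, best_score = lag, score
    return None if best_lag is None else best_lag * step


def interval_std(centres, period):
    """Deviation of the actual neighbour distances from the expected one."""
    gaps = [b - a for a, b in zip(centres, centres[1:])]
    return math.sqrt(sum((g - period) ** 2 for g in gaps) / len(gaps))


def neighbour_density(x_w, x_0, mu_x, sigma_x, side=1):
    """Gaussian density of a neighbour centre right (+1) or left (-1) of x_0."""
    mean = x_0 + side * mu_x
    return (math.exp(-0.5 * ((x_w - mean) / sigma_x) ** 2)
            / (sigma_x * math.sqrt(2.0 * math.pi)))
```

The same functions can be applied to a vertical profile to obtain µy and σy.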

MC Inference:
Considering the obtained window distributions as prior knowledge of the window presence, the missing windows can be inferred using the analyzed window pattern. The MC approach and a "particle" idea are employed here to calculate the posterior probability of being a window area. A window distribution characterized by the parameters of the logistic functions is seen as a *particle* with the features θw = {x, y, c, w, h, γ1, γ2, γ3, γ4}, where {x, y} are the position coordinates of the window center, c is the initial confidence, {w, h} are the width and height, and {γ1, γ2, γ3, γ4} are the growth rates of the four logistic functions in the horizontal (left/right) and vertical (up/down) directions. A window particle is sampled from the prior distribution, and the location probability of a new window particle conditioned on the sampled one is used to infer the posterior probability of the overall window distribution with the MC approach. When new measurements are provided, the distributions can be validated and refined. The unweighted window "particles" are updated with weights according to the new data: window areas in which new façade points are measured get smaller weights. The window detection results from the new data are also integrated with the existing weighted window particles. The output of the update or correction is a set of weighted "particles". Given the window patterns, an MC inference can be applied to the updated output again. Figure 6 is an example of a façade with large occlusions, where the windows shadowed by the occlusions or covered by curtains are not detected. With the repetitive pattern, they are inferred with the corresponding likelihood. Figure 6a shows the resulting window distributions, while Figure 6b presents the inference, where redder or warmer colors suggest a higher probability of being a window. In this way, the missing windows in occluded areas can be inferred with a certain likelihood.
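The sampling idea can be illustrated with a strongly simplified sketch, which is an assumption of this example rather than the authors' procedure: a detected window particle is drawn in proportion to its confidence, a neighbour centre is drawn from the layout Gaussian in one of the four directions, and the confidence-weighted hits are accumulated on a coarse grid. Only the centre location is inferred here; the window size, the growth rates and the weight update with new measurements are left out, and the function name is hypothetical.

```python
import random


def infer_neighbour_support(windows, pattern, extent, cell=0.5,
                            n_samples=20000, seed=0):
    """Monte Carlo estimate of where neighbouring window centres may lie.

    windows: list of dicts with keys "x", "y" (centre) and "c" (confidence).
    pattern: (mu_x, sigma_x, mu_y, sigma_y) from the layout analysis.
    extent: (width, height) of the façade in the local frame.
    Returns a dict mapping grid cells to confidence-weighted support.
    """
    rng = random.Random(seed)
    mu_x, sigma_x, mu_y, sigma_y = pattern
    width, height = extent
    confidences = [w["c"] for w in windows]
    support = {}
    for _ in range(n_samples):
        # Sample a detected window in proportion to its confidence.
        w = rng.choices(windows, weights=confidences)[0]
        # Sample one of the four neighbour directions and the offset.
        dx, dy = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        x = w["x"] + dx * rng.gauss(mu_x, sigma_x)
        y = w["y"] + dy * rng.gauss(mu_y, sigma_y)
        if not (0.0 <= x < width and 0.0 <= y < height):
            continue  # hypothesized neighbour falls outside the façade
        key = (int(x // cell), int(y // cell))
        support[key] = support.get(key, 0.0) + w["c"] / n_samples
    return support
```

In this sketch, a location that is a plausible neighbour of several detected windows collects support from all of them, so a gap between two detected windows receives more support than a location with only one neighbour.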
Given a specific likelihood threshold, the points with the likelihood at the threshold value compose the new shape boundary.

EXPERIMENTS
To evaluate the proposed uncertainty representation of 3D building models, comprehensive experiments are conducted on 3D LiDAR datasets collected with the Riegl Mobile Mapping System VMX 250 in Hannover, Germany. In a pre-processing step, the LiDAR point clouds are aligned and classified into the geo-objects of interest, and only building points are investigated.
In the experiments, the measurement noise of the GP (σ²η) is set according to the point cloud alignment noise of 2 cm. The hyper-parameters of the Gaussian kernel of the GP are learned by log-likelihood maximization. Table 1 compares the modelling errors as well as the goodness of fit of the uncertainty representations for GMM and GP. The Root Mean Square Error (RMSE) is used to evaluate the differences between the estimated surface depth and the test data. The percentage of points inside the 95% confidence interval and the log-likelihood values are used to evaluate the goodness of the uncertainty fitting. The log-likelihood is a relative measure for the uncertainty evaluation and can only be compared within the same case, as it is affected by the number of testing points. Larger log-likelihood values indicate a better fit of the uncertainty. As shown in the table, GP has smaller RMSEs than GMM. In general, the uncertainty quantification of GP is also better than that of GMM. However, GP as a non-parametric approach needs much more storage and computation, and both grow drastically with increasing training data. For example, for a façade with around 20 000 points, the prediction time of GP is around 10 ms per testing point, while it is 10^-2 ms for GMM. GP is also more sensitive to outliers than GMM. The parametric and non-parametric methods can be combined to yield better mapping accuracy in the future. Figure 7 shows qualitative results of the modelled façades, where different colors (blue to red) denote the depth information. Warmer colors represent protrusions while colder colors denote intrusions. Note that in Building B, only one depth plane is extracted with the GMM-based method, because some small elements contain only a few LiDAR points at various distances to the major plane. They are modelled improperly by a single GMM component with a large standard deviation and a mean value quite close to the major plane, which makes it hard to distinguish from the major plane.
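The three evaluation measures named above can be computed as in the following minimal sketch. It assumes that the model provides a predictive mean and standard deviation of the depth for every test point and that the 95% interval is taken as ±1.96 standard deviations of a Gaussian; the function name is hypothetical.

```python
import math


def evaluate_depth_model(pred_mean, pred_std, observed):
    """Return RMSE, share of points inside the 95% interval, log-likelihood."""
    n = len(observed)
    residuals = [o - m for o, m in zip(observed, pred_mean)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    # Share of test points whose residual lies within +/- 1.96 sigma.
    inside = sum(abs(r) <= 1.96 * s for r, s in zip(residuals, pred_std)) / n
    # Sum of Gaussian log-densities; it scales with the number of test
    # points, so it is only comparable on the same test set.
    log_lik = sum(-0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * (r / s) ** 2
                  for r, s in zip(residuals, pred_std))
    return rmse, inside, log_lik
```

For a well-calibrated uncertainty model, the share of points inside the interval should be close to 0.95; clearly lower values indicate overconfident standard deviations.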
Five and two depth layers are segmented in Buildings A and C, respectively. As the GP approach treats the façade as a watertight surface, holes, including occlusions and windows, are modelled with depth values that have a large uncertainty. These regions can be further processed by the window uncertainty models to determine whether windows exist there.
Window areas are detected and distinguished from the occluded areas with the method proposed in Section 2. Façades with large occlusions are investigated, and the detected prior window shapes and the quantified uncertainties are shown in Figure 8. A higher likelihood is denoted by warmer/redder colors. As shown, the ambiguous window boundaries are modelled by lower likelihood values. The prior window detection results are strongly affected by the occlusions and the point cloud density. As mentioned in Section 2, if a repetitive pattern is recognized, an MC inference based on the window layout is performed. The results for façades with typical repetitive patterns are shown in Figure 8. Note that repetitive patterns are not detected in all the examples in Figure 8. For example, only a vertical repetitive pattern is recognized for Figure 8d, while no repetitive pattern is recognized in the example of Figure 8e, which is therefore not represented in the third row of Figure 8. The occluded windows are inferred with higher likelihood if there are more neighboring windows, as is clearly visible for the second window of the third row in Figure 8k; in the other rows, the occluded windows are still hypothesized and shown in different shades of blue.

CONCLUSIONS
In this paper, 3D building modelling with uncertainty quantification for façades and windows is demonstrated. The uncertainty measures are designed and investigated with probabilistic methods. GMM and GP are used for modelling the building façade surfaces, and their respective advantages and drawbacks are illustrated with the experimental results and the uncertainty evaluation. Compared to the parametric GMM, GP is more flexible and achieves a better mapping accuracy, but it is more sensitive to outliers and has a high computational cost. In addition, an uncertainty quantification for the window existence and shapes based on logistic models is proposed, which represents both the global detection confidence and the local uncertainty change of a window object. The inference of potential windows in occluded areas is performed by the MC approach.
In the future, the GP and GMM methods can be combined to model the façade in order to obtain an efficient and accurate map. This can be achieved by using the GMM for the planar parts of the façade and the GP for elements on the façade that cannot be modelled by a simple plane. To reduce the high storage and computation demands of the GP approach, low-rank approximations and sparse techniques can be applied. For the window model, since the prior window detection is hole-based, it is sensitive to occlusions and the point density. A learning-based window detection approach could replace the current detection method as the source of prior window information. The MC approach relies on repetitive structures of the windows; further investigations will be conducted to also allow different kinds of regularities. Furthermore, the use of the models, including their uncertainties, for localization tasks will be discussed and evaluated in future work.