Quality assessment of a nationwide data set containing automatically reconstructed 3D building models

Fully automated reconstruction of high-detail building models on a national scale is challenging. It raises a set of problems that are seldom found when processing smaller areas, such as single cities. Often no reference or ground truth is available to evaluate the quality of the reconstructed models. Therefore, only relative quality metrics are computed, comparing the models to the source data sets. In this paper we present a set of relative quality metrics that we use for assessing the quality of 3D building models that were reconstructed in a fully automated process, in Levels of Detail 1.2, 1.3 and 2.2, for the whole of the Netherlands. The source data sets for the reconstruction are the Dutch Building and Address Register (BAG) and the National Height Model (AHN). The quality assessment is done by comparing the building models to these two data sources. The work presented in this paper lays the foundation for future research on the quality control and management of automated building reconstruction. Additionally, it serves as an important step in our ongoing effort towards a fully automated building reconstruction method for high-detail, high-quality models.


INTRODUCTION
3D models of buildings are increasingly used in urban applications and these models are mostly reconstructed fully automatically, specifically when it comes to reconstruction of building models for large areas. The number of reconstructed models easily adds up to a total that can no longer be visually assessed (i.e. > 1 million buildings). In addition, 3D reconstruction of large areas might contain situations that have not been accounted for when the reconstruction algorithm was developed (because similar situations were not in the test area). At the same time, understanding the quality of the reconstructed models is important to improve the reconstruction process (i.e. to adjust it to specific cases that were not handled before), to provide the user with fit-for-purpose information for her/his application, as well as to highlight unacceptable models (i.e. too low quality) that need manual improvement. The factors impacting the quality of automatically reconstructed 3D models can be various, e.g. the quality of the input data, geometrical and temporal consistency between different input data (e.g. point clouds and footprints), the reconstruction algorithm itself, etc. To gain insight into the quality aspects of automatically reconstructed 3D building models, we have performed a quality assessment on a dataset containing automatically reconstructed building models at different Levels of Detail (LoD) of all 10 million buildings in the Netherlands, called 3D BAG. The work presented in this paper lays the foundation for future research on the quality control and management of automated building reconstruction. Additionally, it serves as an important step in our ongoing effort towards a fully automated building reconstruction method for high-detail, high-quality models.

Overview of this paper
We start with previous work on the quality of 3D city models in Section 2. After we have summarised the reconstruction process of the data of study in Section 3, we highlight two specific cases for which we have developed specific reconstruction solutions in Section 4. These cases were identified during a quality analysis that we performed on earlier versions of the data. Section 5 presents the quality indicators of our assessment method. The results of the quality assessment using these indicators are presented in Section 6. Section 7 closes this paper with conclusions.

PREVIOUS WORK
Quality of 3D city models has been studied by other scholars. Krämer et al. (2007) present a quality model that defines spatial quality measures for 3D city models. This model includes the reality, the user's perception and the digital data set. They present a formal definition for the different quality parameters. These definitions can be used to develop algorithms for the measurement and improvement of spatial data quality of 3D city models. The quality parameters they define are positional accuracy, completeness of both objects and attributes, semantic accuracy, correctness of attributes, temporal conformance, and logical consistency (geometrical, topological, semantics, format). They developed a prototype for two of these parameters, i.e. the completeness of objects, for which they use the ground plans from the land registry office as ground truth, and positional accuracy.
Akca et al. (2010) developed a method for the quality analysis of 3D city models. The method compares the reconstructed building models with the original input data using the Least Squares 3D (LS3D) surface matching method. The matching evaluates the Euclidean distances from the LiDAR points to the corresponding 3D building mesh. They also perform a full LS3D surface matching. This shows the reference system accuracy of the building models with respect to the coordinate system of the LiDAR data. A third LS3D run is applied to show the positional accuracy of individual buildings and the completeness.
Oude Elberink and Vosselman (2011) present a method to evaluate the quality of the reconstructed building models from ALS data. Although their method is specific to the reconstruction method presented by Oude Elberink and [...] (2011) and presents a method for using the reconstruction ALS data as a reference in the reconstruction quality evaluation. Their goal is to develop a method that is feasible for the countrywide assessment of 3D building model quality. In common with the previously mentioned studies, Ostrowski et al. (2018) compute the orthogonal distance between the ALS points and the model, and use the derived statistics to classify the models into three quality categories. The three categories are based on the predefined national quality requirements.
The aim of the OGC CityGML Quality Interoperability Experiment (QIE) was to define a unified method for the validation of 3D City Models (Wagner and Ledoux, 2016). The result of this project was the specification of a set of validation rules that can be used to validate CityGML models as well as conformance requirements as defined in the CityGML standard. Coors et al. (2020) proposed an approach to specify application-specific requirements for 3D city models encoded in CityGML files. They used the set as defined in the OGC CityGML Quality Interoperability Experiment to specify application-specific sets of requirements in the form of a formal definition of a validation plan. Apart from schema validation, their work focuses on validity rules of geometries with respect to applications as well as on application-specific attributes such as function and year of construction. These last attributes are user-specific attributes added to the CityGML data model. The application-specific sets can be used to develop algorithms to perform the checks of the validation plan.
Our quality indicators are similar to the ones proposed in these previous studies. The novelty in our approach is that we evaluate these indicators as part of the reconstruction process and assign this information as metadata to the reconstructed models. In addition, we use these indicators to evaluate the quality of a nationwide 3D dataset at multiple LoDs to obtain insights into quality issues of an automatically reconstructed dataset, including all exceptional cases that can occur.

RECONSTRUCTION PROCESS
The dataset in our study was generated from building polygons maintained in the Building and Address Register (BAG, 2021) and an ALS point cloud of the National Height Model (AHN, 2021), both with complete coverage of the Netherlands.
In one reconstruction process several models at different levels of detail are generated for each building to serve different applications and user needs, see Figure 1. We adopted the refined LoD framework of Biljecki et al. (2016) to specify the models.
The LoDs that are generated in our process for all 10 million buildings of the Netherlands are:
1. LoD1.2 building models, extruded from the original building polygon to a single height. LoD1.1 (and LoD2.1) models are based on generalised and aggregated buildings and are therefore outside the scope of our research.
2. LoD2.2 building models with detailed roof shapes based on building polygons, again containing small building parts and extensions as available in the original polygons (as for LoD1.2).
3. LoD1.3 models, where a building with clear height jumps (e.g. a church with a tower; a house with a shed attached) is extruded to those different heights.
The LoD2.2 buildings are reconstructed via a two-step approach. In the first step the original building footprint is fragmented based on the identified roof planes above it; in the second step the footprint is extruded to the detected roof planes into a 3D model. The LoD1.3 buildings are reconstructed as a generalisation of the LoD2.2 buildings, where each footprint fragment is assigned a fixed reference height and neighbouring fragments with similar heights (less than 3 m difference) are merged. Finally, the LoD1.2 models are simply created as an extrusion of the complete building footprint to a reference height.
The LoD1.2 and LoD1.3 models are also available as 2D footprints with the height information assigned to them as attributes. These 2D outputs contain several reference heights (minimum, 50th and 70th percentile, and maximum). This gives the user the possibility to use the reference height for extrusion that best fits the needs of her/his application. For the ground height of each representation, the 5th percentile of all ground points within a 4 m buffer of the building footprint is used. More details can be found in Dukai et al. (2019; 2020), Stoter et al. (2020) and Peters et al. (2021).
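The reference heights described above can be illustrated with a minimal sketch. It assumes that the roof and ground elevations of one footprint are already available as plain lists of numbers; the function names, the dictionary keys and the linear-interpolation percentile are illustrative choices, not the implementation used for 3D BAG.

```python
import math

def percentile(values, q):
    """Return the q-th percentile (0-100) of values, with linear interpolation."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight

def reference_heights(roof_z, ground_z):
    """Summarise the elevations of one footprint (hypothetical attribute names)."""
    return {
        "ground": percentile(ground_z, 5),   # 5th percentile of buffered ground points
        "roof_min": min(roof_z),
        "roof_p50": percentile(roof_z, 50),
        "roof_p70": percentile(roof_z, 70),
        "roof_max": max(roof_z),
    }

# Made-up elevations in metres, for illustration only.
heights = reference_heights(
    roof_z=[8.0, 8.2, 8.4, 8.6, 8.8, 9.0, 9.2, 9.4, 9.6, 9.8, 10.0],
    ground_z=[0.0, 0.1, 0.2, 0.3, 0.4],
)
# Extrusion height of an LoD1.2 block if the 70th percentile is chosen.
extrusion_height = heights["roof_p70"] - heights["ground"]
```

A user who prefers a more conservative block model could pick `roof_p50` or `roof_min` instead, which is the choice the 2D outputs are meant to leave open.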

IMPROVING QUALITY BY ADDRESSING SPECIFIC CASES
In a quality analysis of an earlier version of the reconstructed data, we identified cases which the reconstruction process did not yet account for. This section describes two such cases, and how we handled them in the improved reconstruction process. These are underground parts (Section 4.1) and greenhouses (Section 4.2).

Underground parts
A specific problem in the reconstruction of buildings from the BAG data is that the BAG geometry represents the outline of a building as seen from above. This BAG representation does not distinguish between parts that are above the ground and parts that are below the ground. Therefore, for buildings that have cellars extending the footprint or (parts of) buildings that represent underground parking garages, the reconstructed models do not correctly represent the real building, i.e. underground parts are incorrectly extruded. To improve the reconstruction method for such cases (Figure 2), we have developed a method to identify different types of multi-level buildings.
Figure 2 (panel labels): BAG-building completely underground (metro station); part of BAG-building is underground (underground parking garage); BAG-building above a road; BAG-building above another BAG-building.
We distinguish three possible situations:
1. Building or building part is on the ground.
2. Building or building part is underground, e.g. metro station.
3. Floating building, e.g. standing on pillars on top of a road, water or another building, or overhanging a road.
In a pre-process we identify the first two types of buildings by comparing the BAG-polygon with the alpha-shape of building points in the LiDAR point cloud data set. Buildings in the third category are identified by detecting overlaps between buildings and road or water (both obtained from the countrywide large-scale topography dataset) or other BAG-buildings.
Buildings that are completely underground (second category) are excluded from the reconstruction process. Buildings that are partially underground (e.g. a parking garage extending the footprint of the building it belongs to) are handled together with the 'normal' buildings, and their underground parts are detected and cut off in the LoD1.3/2.2 reconstruction (by checking if a part only contains ground points and no roof points). These cut-off parts are still output and labelled accordingly.
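The check for underground parts can be sketched as follows. This is only an illustration of the rule "only ground points and no roof points"; the class names `"ground"` and `"building"`, the returned labels and the function name are assumptions made for the example.

```python
def label_part(point_classes):
    """Label one footprint part from the classes of the LiDAR points inside it.

    point_classes: iterable of class names, here assumed to be "ground" or
    "building" (hypothetical labels, not taken from the AHN specification).
    """
    classes = set(point_classes)
    has_roof = "building" in classes
    has_ground = "ground" in classes
    if has_ground and not has_roof:
        return "underground"      # cut off, but kept in the output with a label
    if has_roof:
        return "above_ground"     # reconstructed as usual
    return "no_points"            # nothing to decide on

garage_part = label_part(["ground", "ground", "ground"])
house_part = label_part(["building", "ground", "building"])
```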

Greenhouses
In initial experiments, we noticed that the quality of point clouds on greenhouses is rather poor (see Figure 3). The LiDAR beams penetrate the glass roofs, so the point clouds contain a mix of ground and building points, while some of the laser beams are not reflected due to the mirroring effect of the glass. Consequently, large parts of the roof are missing in the point clouds, while points on the ground are included. In addition, greenhouses (like warehouses) are very large. This makes the plane fitting process extremely complex and prone to errors, as well as time-consuming. Therefore, we have removed greenhouses and warehouses from the reconstruction process and limit their 3D reconstruction to simple extrusions from the input polygons. The type of building is available in the 1:10k topographical dataset of Kadaster, which we use to identify and filter such buildings.

QUALITY INDICATORS OF OUR ASSESSMENT
To assess the quality of the resulting models in relation to the input data and the reconstruction process, we calculate different quality measures during the process and assign these to individual models. These quality measures are described in this section. How we use these measures to perform a quality assessment of the reconstructed dataset is also described.
The building models are reconstructed by combining two data sources, i.e. building polygons and aerial laser scanning data. Therefore, any positional error in them propagates into the building models, and the positional accuracy of the reconstructed models is limited by the accuracy of the input data. We do not assess the positional accuracy of the input data as it is well documented for both input data sets.

Quality of the input point cloud
For the reconstruction, we use airborne LiDAR data. Due to the sensing technique and the observation from the air, the point cloud may have low quality at some locations due to different factors, for example the scan angles, water on roofs, glass roofs (see Section 4.2), incorrect classification of the LiDAR points, a laser beam that cannot reach the building because of occlusion, etc. The impact of areas without laser data is also described by Oude Elberink (2010). These no-data areas are calculated as the difference between the footprint area and the area covered by laser points (based on the alpha shape of the points). The no-data areas are normalised by the footprint area. As a consequence of the no-data areas, the number of points available for the reconstruction as well as for the statistical calculation per building polygon or roof part can vary from one to hundreds or thousands of points. This number can highly influence the quality of the reconstructed model and the amount of detail that can be modelled (Oude Elberink and Vosselman, 2011). Therefore, we also calculate the net point density, which excludes the no-data areas.
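As an illustration of these two measures, the sketch below computes the normalised no-data area and the net point density from three numbers per footprint. It assumes that the area covered by the alpha shape has already been computed elsewhere, and it interprets the net density as the number of points divided by the covered area; both the interpretation and the function name are assumptions.

```python
def point_cloud_coverage(footprint_area, covered_area, n_points):
    """Return (no-data fraction, net point density) for one footprint.

    footprint_area: area of the building polygon in m2
    covered_area:   area of the alpha shape of the points inside it, in m2
    n_points:       number of points inside the polygon
    """
    if footprint_area <= 0:
        raise ValueError("footprint_area must be positive")
    no_data_area = max(footprint_area - covered_area, 0.0)
    no_data_fraction = no_data_area / footprint_area
    # Net density ignores the part of the footprint without any laser data.
    net_density = n_points / covered_area if covered_area > 0 else 0.0
    return no_data_fraction, net_density

# Made-up example: 100 m2 footprint, 87 m2 covered by points.
fraction, density = point_cloud_coverage(100.0, 87.0, 1305)
```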
To evaluate the quality of the input point cloud, we calculate the number of points that were available for each point segment. This is an absolute measure, which helps to identify cases where fewer than six points were available in the segment.
Oude Elberink and Vosselman (2011) count the number of point segments that were not used for creating the roof model, where a high number of unused segments can indicate a faulty reconstruction. Similarly, we calculate the number of points that were not assigned to any segment at all, and report this number relative to the total number of points for the model.
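A minimal sketch of these two segment-based measures is given below. It assumes that every point carries the identifier of the segment it was assigned to, or `None` when it was not assigned to any segment; this input layout and the names used are illustrative assumptions.

```python
from collections import Counter

MIN_POINTS_PER_SEGMENT = 6  # segments with fewer points are flagged

def segment_statistics(segment_ids):
    """Summarise the segmentation of the points of one model.

    segment_ids: one entry per point, the segment identifier or None
    for points that were not assigned to any segment.
    """
    total = len(segment_ids)
    sizes = Counter(s for s in segment_ids if s is not None)
    unsegmented = total - sum(sizes.values())
    return {
        "points_per_segment": dict(sizes),
        "small_segments": sorted(
            s for s, n in sizes.items() if n < MIN_POINTS_PER_SEGMENT
        ),
        "unsegmented_fraction": unsegmented / total if total else 0.0,
    }

# Made-up example: two segments and six unassigned points.
stats = segment_statistics([0] * 10 + [1] * 4 + [None] * 6)
```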

Temporal mismatch between point cloud and building footprint
The temporal conformance (Krämer et al., 2007) of the models is measured by comparing the acquisition date of the input point cloud and the recorded construction date of the building polygon. If the building polygon is newer than the point cloud, no 3D model can be reconstructed.
The currently available national point cloud (AHN3) only records the acquisition year for a given region and does not contain the GPS timestamp in the point data. Additionally, the construction time of the buildings is registered with a temporal resolution of one year. We therefore obtain three values for the timeliness and assign these to the models:
- yes: the point cloud was collected in the years after the construction date of the building and the building model can be reconstructed from the point cloud.
- no: the point cloud was collected in the years before the construction date of the building; therefore no points are available for the specific building.
- uncertain: the point cloud was collected in the same year as the year of the construction of the building.
The buildings that are newer are excluded from the reconstruction process, although the polygons are provided in the reconstructed data and labelled accordingly.
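Because both dates are only known to the year, the three timeliness values follow from a plain comparison of two integers. The sketch below shows this; the function name is hypothetical.

```python
def timeliness(acquisition_year, construction_year):
    """Classify whether the point cloud can describe the building.

    Both arguments are calendar years, since neither source records
    a finer temporal resolution.
    """
    if acquisition_year > construction_year:
        return "yes"        # point cloud collected after construction
    if acquisition_year < construction_year:
        return "no"         # building is newer than the point cloud
    return "uncertain"      # same year, order unknown

labels = [timeliness(2018, 1975), timeliness(2016, 2019), timeliness(2017, 2017)]
```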

The model fit
Contrary to the methods proposed by Krämer et al. (2007) and Akca et al. (2010), we have no means to assess the absolute positional accuracy of the models, as there is no ground truth available. Instead, we establish a relative measure of correctness, using the point cloud from which the models were reconstructed as reference. Such a method evaluates the fit of the model to the input data by calculating the orthogonal distance from each point to the roof planes, and is also applied by Dorninger and Pfeifer (2008), Oude Elberink and Vosselman (2011) and Ostrowski et al. (2018). Additionally, we calculate the distance from each vertex of the model to the nearest point in the point cloud. Similarly to Oude Elberink and Vosselman (2011), we use this measure to check whether the model vertices are within a certain distance of the point cloud.
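Both distance measures can be sketched in a few lines. The example assumes a single roof plane given by the coefficients (a, b, c, d) of ax + by + cz + d = 0 and a brute-force nearest-point search; a real data set would need the point-to-plane assignment from the reconstruction and a spatial index. All names are illustrative.

```python
import math

def point_to_plane_distance(point, plane):
    """Orthogonal distance from a 3D point to the plane ax + by + cz + d = 0."""
    a, b, c, d = plane
    x, y, z = point
    return abs(a * x + b * y + c * z + d) / math.sqrt(a * a + b * b + c * c)

def rms_point_to_model(points, plane):
    """Root mean squared point-to-plane distance for the points of one plane."""
    squared = [point_to_plane_distance(p, plane) ** 2 for p in points]
    return math.sqrt(sum(squared) / len(squared))

def max_model_to_point(vertices, points):
    """Largest distance from any model vertex to its nearest point (brute force)."""
    return max(min(math.dist(v, p) for p in points) for v in vertices)

# Made-up flat roof at z = 5 m with slightly noisy points.
roof_plane = (0.0, 0.0, 1.0, -5.0)
roof_points = [(0.0, 0.0, 5.1), (1.0, 0.0, 4.9), (0.0, 1.0, 5.0), (1.0, 1.0, 5.0)]
# The last vertex sticks out 6 m above the points.
model_vertices = [(0.0, 0.0, 5.0), (1.0, 1.0, 5.0), (1.0, 1.0, 11.0)]

rms = rms_point_to_model(roof_points, roof_plane)
max_vertex_distance = max_model_to_point(model_vertices, roof_points)
```

In this made-up case the RMS stays near 0.07 m while the maximum vertex distance is 6 m, which shows why the two measures are reported side by side.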

Geometric validity
The logical consistency (Krämer et al., 2007) of the models is measured by performing a series of 3D validity checks. The 3D model geometry must conform to the requirements described by Ledoux (2013; 2018), which follow the international standard ISO 19107. Geometry validation is integrated into the reconstruction process, where each reconstructed model is tested, and invalid models are assigned an error code indicating the type of error. The 32 error codes are specific to the validation tool, val3dity, developed by Ledoux (2018) and described in detail in the tool's documentation (val3dity, 2021).

Roof complexity
We obtain this indicator because we assume that the complexity of the roofs has a significant influence on the quality of the reconstructed model. The complexity is identified via the number of roof planes of each type detected for each building. We identify these roof types (slanted or horizontal) via the angle of the roof planes and the number of levels in the roof structure.
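One possible way to derive such a roof type from detected planes is sketched below. It assumes that each roof plane is described by its normal vector, that a plane counts as horizontal when its slope is below a threshold, and that each horizontal plane represents one level. The 10 degree threshold, the one-plane-per-level simplification and the names are assumptions for the example, not values from our process.

```python
import math

def slope_degrees(normal):
    """Angle between a roof plane and the horizontal, from its normal vector."""
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    # Clamp to guard against rounding slightly above 1.0.
    return math.degrees(math.acos(min(1.0, abs(nz) / length)))

def roof_type(normals, max_flat_slope=10.0):
    """Classify a roof from the normals of its detected planes.

    max_flat_slope is a hypothetical threshold in degrees.
    """
    if not normals:
        return "no roof surface"
    if any(slope_degrees(n) > max_flat_slope for n in normals):
        return "slanted"
    return "multiple horizontal" if len(normals) > 1 else "single horizontal"

flat = roof_type([(0.0, 0.0, 1.0)])
stepped = roof_type([(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)])
gabled = roof_type([(0.0, 1.0, 1.0), (0.0, -1.0, 1.0)])
```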

ASSESSING THE BUILDING MODELS
For practical reasons the assessment below was conducted on a 1% sample of the complete set of nearly 10 million models. This sample contains about 100,000 models in each LoD, sampled randomly from across the Netherlands. The complete set was generated in March 2021, and carries the version number v21031_7425c21b. Therefore, it is important to note that the assessment results are indicative of the mentioned data version, and of the status of the reconstruction method before March 2021. As the method is under continuous development, it is expected that future data versions will have an improved quality.
For the point cloud assessment, we analyse the spatial variation of quality in the point cloud by analysing these indicators over specific regions (Figure 5), since the height data was collected and processed in different measurement campaigns by different companies. The quality metrics calculated for the whole sample are the following:
- Median no-data area: 13% of the polygon area
- Median point density: 15.3 pts/m² (within a polygon)
- Median unsegmented points: 3% of the total points (within a polygon)
Figure 6 compares the three point cloud metrics relative to each other and between regions. However, as can be observed from these numbers, we could not identify a pattern when comparing the point cloud quality measures across the acquisition regions.
Figure 6. Median point cloud statistics per acquisition region (Figure 5). The values are relative and transformed so that a higher value means better properties.
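The sampling and aggregation step can be sketched as follows, assuming that the per-model indicators are available as dictionaries. The record layout, the function names and the numbers in the example are made up for illustration and are not part of the data set.

```python
import random
import statistics

def draw_sample(model_ids, fraction=0.01, seed=0):
    """Draw a reproducible random sample of model identifiers."""
    rng = random.Random(seed)
    size = max(1, round(len(model_ids) * fraction))
    return rng.sample(model_ids, size)

def median_metrics(records, keys):
    """Median of each metric over a list of per-model dictionaries."""
    return {key: statistics.median(r[key] for r in records) for key in keys}

model_ids = list(range(1000))
sample = draw_sample(model_ids)   # 1% of 1000 identifiers

# Three made-up models with hypothetical attribute names.
records = [
    {"no_data_fraction": 0.05, "point_density": 18.0, "unsegmented_fraction": 0.02},
    {"no_data_fraction": 0.13, "point_density": 15.3, "unsegmented_fraction": 0.03},
    {"no_data_fraction": 0.40, "point_density": 9.1, "unsegmented_fraction": 0.10},
]
medians = median_metrics(
    records, ["no_data_fraction", "point_density", "unsegmented_fraction"]
)
```

Grouping the records by acquisition region before calling `median_metrics` would give the per-region values that Figure 6 compares.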
The temporal mismatch is assessed by analysing the occurrences of the buildings that are newer than the input point cloud, the buildings that are up to date and the buildings for which the timeliness is uncertain.In the 3D data set reconstructed in March 2021, these values are respectively 4% (newer), 95% (up to date), and 1% (uncertain).
Figure 7 shows the buildings that were built after the acquisition of the point cloud for an area around Rotterdam.
There are both individual constructions and developments of complete neighbourhoods, as indicated by the tight clusters of building polygons. This pattern is more pronounced around the coast of the Netherlands compared to the inland territories, where the urbanisation is sparser.
For the model to point cloud distances we measure their maximum values, because they can highlight a certain error that we observed in our current reconstruction method. We call this type of error "screens", as it is commonly represented by a wall-like, thin surface protruding from the expected geometry of the model. This type of reconstruction error is likely to be caused by long, narrow no-data areas in the point cloud (Section 5.1), where the resulting roof plane is matched to an incorrect elevation.
Among the LoD1.2 models less than 1% have invalid geometry, among the LoD1.3 models around 2%, and among the LoD2.2 models around 10%. By far the most common geometric error among all three reconstructed LoDs is the unclosed shell (code 302). This error makes up around 91% of the invalid cases. The next most common error is self-intersection in one of the rings in 2D, which makes up around 3-6% of the invalid cases. The remaining errors are not listed here, because they represent an insignificant fraction of the total cases.
Based on this quality assessment of the automatically reconstructed models for all buildings in the Netherlands, improvements for increasing the geometric validity of the models are currently in progress. Therefore, future data versions will have a lower number of invalid cases, especially in LoD2.2. In the upcoming release the geometric validity of LoD2.2 models has increased up to 98%.
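The validity figures above are simple shares, which can be derived from the error code stored with each model. The sketch below assumes one entry per model, with `None` for a valid geometry; it does not call the validation tool itself. Code 302 is the unclosed shell named in the text, while 999 is only a placeholder for any other error.

```python
from collections import Counter

def validity_summary(error_codes):
    """Share of invalid models and share of each error code among the invalid ones.

    error_codes: one entry per model, None when the geometry is valid.
    """
    total = len(error_codes)
    invalid = [code for code in error_codes if code is not None]
    counts = Counter(invalid)
    return {
        "invalid_share": len(invalid) / total if total else 0.0,
        "share_by_code": {code: n / len(invalid) for code, n in counts.items()},
    }

# Made-up example: 100 models of one LoD, 10 of them invalid.
summary = validity_summary([None] * 90 + [302] * 9 + [999] * 1)
```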
The roof types per building that we identify are listed below, together with their frequency among the models. Figure 10 shows the three reconstructed roof types that we identify.
- Roof with at least one slanted surface (64.8%), i.e. complex building.
- No points were found for the building (1.5%), most likely an underground building.
- Could not detect a roof surface, even though points were found (0.3%).

CONCLUSION
In this paper we presented the quality assessment methodology that we have set up and followed to perform a quality assessment of an automatically reconstructed data set containing building models for all 10 million buildings in the Netherlands, called 3D BAG. The assessment results provide insight into the quality of the reconstructed models.
The assessed data set was generated in March 2021, and carries the version number v21031_7425c21b. Therefore, the assessment results are indicative of the mentioned data version, and of the status of the reconstruction method before March 2021. This is the first time that our automated reconstruction method has been applied to all buildings in the Netherlands, to generate a multi-LoD data set, including LoD2.2. The quality assessment is vital in the continuous improvement of the reconstruction method. Therefore, future data versions will have an improved quality. In the upcoming release the geometric validity of LoD2.2 models has increased up to 98%.
Besides improving the reconstruction process in further development, the quality assessment can be used to provide the user with fit-for-purpose information for her/his application as well as to highlight low quality models that need manual improvement. In addition, it gives insight into how the quality of the used input data (i.e. point clouds and building polygons) has an impact on the quality of the reconstructed models. These insights can be used when our reconstruction method is applied to other data sources (e.g. point clouds obtained from images) or to other countries (where other source data is available).
In general we measured a relatively good fit between the models and the point cloud in LoD2.2, as exemplified by the 0.04 m RMS. However, we observed that measuring the orthogonal distances between the point cloud and the model is in itself insufficient to identify models with an incorrect reconstruction (see Figure 9). In this regard we came to a similar conclusion as Oude Elberink and Vosselman (2011).
Reliable and automated identification of incorrect models remains a future challenge.
Our initial assumption was that the quality measures point coverage, point density, unsegmented point coverage have a significant impact on the reconstruction quality, as it can be measured by the RMS of the point-to-model distances.
However, in our research we did not recognize a correlation and the relation between these parameters is a topic of future work.

Figure 1. Excerpts of the reconstructed dataset at three LoDs. From left to right: LoD1.2, LoD1.3, LoD2.2.
Dukai et al. (2019; 2020), Stoter et al. (2020) and Peters et al. (2021). We have reconstructed these models for all 10 million buildings in the Netherlands. The building models can be viewed and downloaded as open data via an open source 3D viewer (3D BAG, 2021). The data set is the object of study for our quality assessment as presented in this paper.

Figure 3. LoD2.2 reconstruction of a greenhouse is difficult. a) AHN3 point cloud (green: surface points; brown: building points). b) 2.5D height surface with maximum height for each pixel. c, e) LoD2.2 reconstruction result. d) Aerial image and BAG-polygon.

Figure 4. Missing points (no-data) in the point cloud. The church tower occluded part of the roof from the laser beams. The impact on the reconstruction can be seen in the yellow model.

Figure 5. The acquisition years and regions of the input point cloud.

Figure 7. Buildings built after the acquisition date of the point cloud, near Rotterdam.
The orthogonal distances from the point cloud to the model (point-to-model) are aggregated as the root mean squared distance (RMS) for each model for the buildings in the test area. The RMS values are analysed per LoD. Figure 8 shows the distribution of the RMS for each LoD, where additional statistics are listed in Table 1. In LoD2.2, the median RMS is as low as 0.04 m (mean 0.1 m), which indicates a relatively close fit between the models and the point cloud.

Figure 8. Root Mean Squared distances from points to model, per Level of Detail. Values in meters.
Figure 9 illustrates such a case, where the model has a low RMS of point-to-model distance, 0.04 m, while the maximum model-to-point distance is 6.1 m.

Figure 9. An example of a model with a "screen". This model has an RMS of 0.04 m, and a maximum model-to-point distance of 6.1 m.

Figure 10. Examples for the three roof types in the data set. From left to right: slanted, multiple horizontal, single horizontal.
system of the LiDAR data. A third LS3D run is applied to show the positional accuracy of individual buildings and the completeness. Oude Elberink and Vosselman (2011) present a method to evaluate the quality of the reconstructed building models from ALS data. Although their method is specific to the reconstruction method presented by Oude Elberink and

Table 1. The median, mean and standard deviation of the RMS per LoD. Values in meters.