AUTOMATIC 3D BUILDING MODEL GENERATION USING DEEP LEARNING METHODS BASED ON CITYJSON AND 2D FLOOR PLANS

In the past decade, considerable effort has been put into applying digital innovations to building life cycles. 3D models have proven to be efficient for decision making, scenario simulation and 3D data analysis during this life cycle. Creating such a digital representation of a building can be a labour-intensive task, depending on the desired scale and level of detail (LOD). This research aims at creating a new automatic deep learning based method for building model reconstruction. It combines exterior and interior data sources: 1) 3D BAG, 2) archived floor plan images. To reconstruct 3D building models from the two data sources, an innovative combination of methods is proposed. To obtain the information needed from the floor plan images (walls, openings and labels), deep learning techniques are used. In addition, post-processing techniques are introduced to transform the data into the required format. To fuse the extracted 2D data and the 3D exterior, a data fusion process is introduced. In the literature review, no prior research on automatic integration of CityGML/JSON and floor plan images was found. Therefore, this method is a first approach to this data integration.


INTRODUCTION
Buildings have a significant role in our daily lives. Therefore, a lot of effort is put into improving them. One way of doing this is by applying digital innovations to a building's whole life cycle, which consists of the following stages: planning, construction, operation, renovation and demolition (Ngwepe and Aigbavboa, 2015). 3D models have proven to be efficient for decision making, scenario simulation and 3D data analysis during this life cycle (Rajat Agarwal and Sridhar, 2016). Creating such a digital representation of a building can be a labour-intensive task, depending on the desired scale and level of detail (LOD). This research aims at creating a new automatic deep learning based method for building model reconstruction. It combines exterior and interior data sources: 1) 3D BAG, the first fully automatically generated 3D building data set with level of detail 2.2 (3D BAG, 3D BK TU Delft, https://3dbag.nl), 2) archived floor plan images (e.g. scanned or exported from CAD software). In the literature review, no prior research on automatic integration of CityGML/JSON and floor plan images was found. Therefore, a new method has been developed that combines these interior and exterior data sources into full 3D models. Having precisely linked 3D data provides opportunities for detailed analysis such as improved area calculation, facility management, urban planning, and energy simulation.

BACKGROUND
3D representations of buildings are getting more attention and are used for a wide range of applications (Biljecki, 2016). 3D building objects exist in both the BIM and GIS domains, but are approached from different perspectives (Herle, 2020). BIM models are by default highly detailed and often represent a physical building with a small error margin. GIS is often used on a larger scale, with a lower granularity, and provides a variety of opportunities for geolocation, analysis, simulations and documentation (Laat, 2010). 3D building objects in city models can be obtained automatically, for example by utilizing building footprints and boundary properties from data sources such as cadastre databases or aerial images (Amiranti, 2020; Balázs Dukai, 2020; Sander Oude Elberink and Commandeur, 2013). These automatically generated objects do not contain the building's internal structure. Research by Boeters (Boeters, 2015) introduced LOD2+, which is an extension to CityGML 2, to support floor levels. In this paper, a method is introduced to estimate heights and add floor surfaces to the existing CityGML LOD 2 objects. Ongoing research by iNOUS aims at reconstructing IndoorGML based on building blueprints and LIDAR data. After that, this data will be combined with existing IFC (BIM) files. This research is not published yet.
CityJSON - CityJSON (Ledoux et al., 2019) is a JSON encoding for 3D city models. The introductory paper claims that, in comparison with CityGML (the current Open Geospatial Consortium standard), it is easier to use and more compact, with a compression factor of around six on real-world data (Ledoux et al., 2019).
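To make the encoding concrete, the following is a minimal sketch of a CityJSON document holding a single Building shaped as a unit cube. The identifier `example-building`, the coordinates and the helper `used_vertex_indices` are illustrative assumptions and are not taken from the 3D BAG data. The sketch shows the property that contributes to the compactness: vertices are stored once in a shared list and geometries refer to them by index.

```python
import json

# Hypothetical minimal CityJSON document with one Building (a unit cube).
# Vertices are stored once; geometry boundaries refer to them by index.
city_model = {
    "type": "CityJSON",
    "version": "1.1",
    "transform": {"scale": [1.0, 1.0, 1.0], "translate": [0.0, 0.0, 0.0]},
    "CityObjects": {
        "example-building": {  # hypothetical identifier
            "type": "Building",
            "geometry": [{
                "type": "Solid",
                "lod": "2.2",
                # One shell; each surface is a list of rings of vertex indices.
                "boundaries": [[
                    [[0, 3, 2, 1]],  # floor
                    [[4, 5, 6, 7]],  # roof
                    [[0, 1, 5, 4]],
                    [[1, 2, 6, 5]],
                    [[2, 3, 7, 6]],
                    [[3, 0, 4, 7]],
                ]],
            }],
        }
    },
    "vertices": [
        [0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
        [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1],
    ],
}


def used_vertex_indices(model):
    """Collect all vertex indices referenced by Solid geometries."""
    used = set()
    for city_object in model["CityObjects"].values():
        for geometry in city_object["geometry"]:
            for shell in geometry["boundaries"]:
                for surface in shell:
                    for ring in surface:
                        used.update(ring)
    return used


# Serialise to a JSON string, as it would be written to a .json file.
encoded = json.dumps(city_model)
```

A quick consistency check is that every index returned by `used_vertex_indices` is smaller than the length of the `vertices` list.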
Floor plan parsing - Previous work by Or et al. (Or, 2005) introduced deterministic image processing and symbol recognition techniques to interpret floor plan scans. However, as with many computer vision problems, the focus has shifted from feature engineering and deterministic tasks to methods that learn from training data (Ahti Kalervo et al., 2019). To obtain multiple segmentation maps and labels, e.g. room types and points of interest (walls, icons, openings etc.), multi-task methods or networks should be used. The performance of multi-task networks depends strongly on the relative weighting between each task's loss (Kendall et al., 2018). The deep learning breakthrough for floor plan parsing was presented in research by Chen Liu (Liu, 2017), which used deep learning to vectorize rasterized images. It does so by using a discriminative network to obtain junctions, integer programming to obtain primitives and finally post-processing to obtain a vector format. Other relevant work is shown in Table 1.
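The task-weighting issue mentioned above can be illustrated with a small sketch. It follows one common formulation of uncertainty-based weighting associated with Kendall et al. (2018), in which each task i has a log-variance s_i = log(σ_i²) and the combined loss is the sum of 0.5 · exp(-s_i) · L_i + 0.5 · s_i. The function name and the example loss values are assumptions made for illustration. In real training the log-variances would be learnable parameters optimised together with the network weights; here they are plain numbers.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses using per-task log-variances s_i = log(sigma_i^2).

    total = sum_i ( 0.5 * exp(-s_i) * L_i + 0.5 * s_i )
    A larger s_i down-weights task i, while the 0.5 * s_i term penalises
    making every s_i arbitrarily large.
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("one log-variance is needed per task loss")
    return sum(
        0.5 * math.exp(-s) * loss + 0.5 * s
        for loss, s in zip(task_losses, log_variances)
    )


# Hypothetical loss values for two tasks (e.g. segmentation and labels).
equal_weighting = uncertainty_weighted_loss([2.0, 4.0], [0.0, 0.0])
second_task_downweighted = uncertainty_weighted_loss([2.0, 4.0], [0.0, math.log(4.0)])
```

With both log-variances at zero, the result is simply half the sum of the task losses; raising the log-variance of the noisier task reduces its influence on the total.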
Polygon overlap - To find the appropriate scale, rotation and orientation for projecting the exterior and floor plan outlines onto each other, the maximum overlap of the two outlines needs to be found. De Berg (Berg, 2005) defined a method for finding the optimal polygon overlap under translations. However, this method only works for convex polygons. Building outlines in blueprints are not necessarily convex, so this method is not applicable. Milenkovic (Milenkovic, 1998) introduces a method for optimal overlap using rotations and movements that also works for non-convex polygons. This method does not support other transformations such as scaling. Har-Peled (Har-Peled, 2016) introduces a method that approximates the maximum overlap of polygons under translations. For polygons close to convex, this problem can be solved in nearly linear time. Research by Ahn et al. (Ahn et al., 2007) calculates the maximum overlap of two polygons using rigid motions (translation, rotation, scale, reflection and glide reflection). This method gives a rigid motion φ_app whose overlap is at least (1 - α) times the maximum over all rigid motions. De Berg et al. (De Berg et al., 1998) introduce a method for the maximum overlap of two convex polygons under translations in O((n + m) log(n + m)) time, where n and m are the numbers of vertices of the respective polygons. The algorithm proceeds in steps, using binary searches for new point locations based on the average centroid (geometric centre) of both polygons.
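As a point of comparison for the algorithms above, the overlap problem can be approximated by brute force. The sketch below is not one of the cited methods: it samples the Jaccard overlap (IoU) of two simple polygons on a grid and tries a fixed set of rotation angles, aligning the polygons by their vertex means. All function names, the L-shaped footprint and the 15 degree step are assumptions chosen for illustration.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test; works for simple polygons, including non-convex ones."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sampled_iou(poly_a, poly_b, steps=60):
    """Approximate IoU by sampling cell centres over the joint bounding box."""
    points = list(poly_a) + list(poly_b)
    min_x, max_x = min(p[0] for p in points), max(p[0] for p in points)
    min_y, max_y = min(p[1] for p in points), max(p[1] for p in points)
    dx, dy = (max_x - min_x) / steps, (max_y - min_y) / steps
    intersection = union = 0
    for i in range(steps):
        for j in range(steps):
            x, y = min_x + (i + 0.5) * dx, min_y + (j + 0.5) * dy
            in_a = point_in_polygon(x, y, poly_a)
            in_b = point_in_polygon(x, y, poly_b)
            intersection += in_a and in_b
            union += in_a or in_b
    return intersection / union if union else 0.0


def vertex_mean(polygon):
    n = len(polygon)
    return (sum(p[0] for p in polygon) / n, sum(p[1] for p in polygon) / n)


def rotate_and_move(polygon, angle_deg, target):
    """Rotate counter-clockwise about the vertex mean, then move that mean to target."""
    cx, cy = vertex_mean(polygon)
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    tx, ty = target
    return [(tx + c * (x - cx) - s * (y - cy), ty + s * (x - cx) + c * (y - cy))
            for x, y in polygon]


def best_rotation(reference, moving, angle_step_deg=15):
    """Try each candidate angle and keep the one with the highest sampled IoU."""
    target = vertex_mean(reference)
    best_iou, best_angle = -1.0, None
    for angle in range(0, 360, angle_step_deg):
        score = sampled_iou(reference, rotate_and_move(moving, angle, target))
        if score > best_iou:
            best_iou, best_angle = score, angle
    return best_iou, best_angle


# Hypothetical non-convex footprint and a rotated, shifted copy of it.
footprint = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
plan_outline = rotate_and_move(footprint, 90, (10.0, 5.0))
best_iou, best_angle = best_rotation(footprint, plan_outline)
```

The exhaustive angle search is simple but scales poorly, and it ignores scaling; it is meant only to show what "maximum overlap" means operationally for non-convex outlines.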

METHOD
To reconstruct 3D building models from the two data sources mentioned above, an innovative combination of methods is proposed (Figure 1). The exterior 3D data is in CityJSON format and the interior 2D data is provided by the municipality of Rijssen-Holten, which is interested in the current research. In addition, road data is downloaded from the national road database. The next subsections elaborate on components I (interior) and II (data fusion).

Floor Plan Parsing
To obtain the information needed from the floor plan images, they are parsed into vectors with attached semantic attributes (Figure 1). The walls, openings and room labels are extracted separately, adopting the deep learning methods proposed by Liu (Liu, 2017). To train the above-mentioned deep learning models, the CubiCasa (Ahti Kalervo et al., 2019) dataset is used. This dataset consists of 5000 floor plan images (input) and annotated vector files (ground truth).
Walls: The first step is to obtain individual walls using semantic segmentation, which is the process of labeling each pixel of an image with a corresponding object class (wall or other). For this, the U-Net network and Fast-SCNN are tested. After segmentation, individual walls are obtained by morphological transformations (closing, dilation and erosion) and post-processing. A single polygon is created by combining all contours (Figure 2). From this polygon, wall pieces are obtained by an algorithm that removes wall elements from the all-walls polygon, starting from the top left. This algorithm also allows for diagonal walls, and is provided in Algorithm 1.
Openings: Openings (doors and windows) are identified using object detection. Four deep learning architectures are tested: 1) Faster-RCNN, 2) CenterNet, 3) SSD with MobileNet, and 4) SSD RetinaNet encoder. The object detection algorithms detect bounding boxes around the objects of interest. An example output can be seen in Figure 3.
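The morphological clean-up applied to the wall segmentation can be illustrated with a small, dependency-free sketch. It assumes a binary mask (1 = wall pixel) and a 3x3 structuring element; the function names and the toy mask are illustrative and do not reproduce the actual post-processing. Closing is dilation followed by erosion, which fills small gaps in a segmented wall without thickening it.

```python
def dilate(mask):
    """3x3 binary dilation: a pixel becomes 1 if it or any neighbour is 1.

    Pixels outside the image are ignored.
    """
    rows, cols = len(mask), len(mask[0])
    return [[int(any(mask[rr][cc]
                     for rr in range(max(r - 1, 0), min(r + 2, rows))
                     for cc in range(max(c - 1, 0), min(c + 2, cols))))
             for c in range(cols)]
            for r in range(rows)]


def erode(mask):
    """3x3 binary erosion: a pixel stays 1 only if its whole neighbourhood is 1.

    Pixels outside the image are ignored.
    """
    rows, cols = len(mask), len(mask[0])
    return [[int(all(mask[rr][cc]
                     for rr in range(max(r - 1, 0), min(r + 2, rows))
                     for cc in range(max(c - 1, 0), min(c + 2, cols))))
             for c in range(cols)]
            for r in range(rows)]


def close_gaps(mask):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask))


# Hypothetical segmentation output: a horizontal wall with a one-pixel gap.
wall_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
closed = close_gaps(wall_mask)
```

In this example the gap in the middle of the wall is filled while the wall keeps its original length and thickness.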
Room labels: In order to obtain the room types, the labels in the floor plans are interpreted. For this, optical character recognition (OCR) is applied.
There are a variety of off-the-shelf OCR techniques, of which a few are compared in the research done by Tomaschek (Tomaschek, 2018). It was concluded that Tesseract version 4 has the best performance. This version includes a new neural network (LSTM) based OCR engine, which focuses on line recognition.
The output of a computer vision algorithm that recognizes objects surrounded by white space is used as input for the Tesseract neural network.
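One plausible reading of "objects surrounded by white space" is a connected-component search on the binarised floor plan. The sketch below is an assumption about how such a step could work, not the algorithm used here: it groups 8-connected ink pixels and returns their bounding boxes, which would then be cropped and handed to the OCR engine (the OCR call itself is not shown). The function name, the `min_pixels` filter and the toy page are hypothetical.

```python
from collections import deque


def ink_bounding_boxes(binary, min_pixels=2):
    """Return (top, left, bottom, right) boxes of 8-connected ink blobs.

    `binary` is a grid of 0 (white) and 1 (ink). Blobs with fewer than
    `min_pixels` pixels are treated as noise and skipped.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over one blob, tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, count = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny in range(max(y - 1, 0), min(y + 2, rows)):
                    for nx in range(max(x - 1, 0), min(x + 2, cols)):
                        if binary[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if count >= min_pixels:
                boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical binarised page: two label-like blobs and one noise pixel.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]
label_boxes = ink_bounding_boxes(page)
```

On a real scan, walls and symbols would also form blobs, so an additional size or aspect-ratio filter would likely be needed before passing candidates to OCR.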

Merging the 3D BAG and floor plan data
The parsed floor plan is merged with the 3D BAG data. Since the floor plans do not indicate an orientation (e.g. to the North), the roads are used to rotate the 3D BAG building with respect to the floor plan. For each facade with an adjacent road, the polygon overlap (Jaccard index, IoU) is maximized using trust region methods (Conn et al., 2000) (Figure 4). Real-life object sizes are estimated based on the values in Table 2. Inconsistent areas are added to the final CityJSON object subject to an inconsistency-area threshold of 5 m². The final building, consisting of floor planes, interior and exterior walls, doors, windows, and roof, is stored as a CityJSON object of type Building.

To evaluate the output of the described methods, 10 random samples [...] door direction, the ratio between correct directions and true positive doors is given. An object is considered a true positive if more than 75% of the ground truth wall area is covered. In Table 3 [...]

Merging the 3D BAG and floor plan data
To evaluate the proposed method, its output is compared with a ground-truth BIM model. The BIM model and its corresponding CityJSON output are shown for each side and storey. The quantitative accuracy is expressed as Intersection over Union (IoU, Equation 1). To calculate the IoU, polygons are drawn on all objects (such as walls, openings and roofs) on the facade from 4 sides (front, back, left, and right) for both the ground truth and the proposed model output. For each polygon, the IoU is calculated. In this paper, results for two buildings (identified by BAG ID 0163100000538672 and 0164100000294714) are shown in Figure 5, Table 4 and Figure 6, Table 5. The average IoU is 0.7673 and 0.6399, respectively.
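For reference, IoU in its standard form (the Jaccard index) is the ratio of the shared area to the combined area of two polygons. The symbols A (ground-truth polygon) and B (corresponding polygon in the model output) are notation assumed here:

```latex
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}
                   = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
```

As a hypothetical worked example, a ground-truth window of 2 m² and a reconstructed window of 2 m² that share 1.5 m² give IoU = 1.5 / (2 + 2 - 1.5) = 0.6.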

DISCUSSION
The overall performance of floor plan parsing seems better when floor plans are less complicated and less detailed. This is probably due to homogeneous training data and can be improved by including more complicated floor plan images (e.g. office spaces and larger buildings) in the training data set. During training, an NVIDIA GeForce GTX 1080 GPU was used. This GPU was launched in 2016, and more powerful alternatives are now available that allow for more complex deep learning networks, which could improve performance.
For data integration, the average Intersection over Union is 0.7673. No similar benchmark is available in other scientific literature, but for such a challenging task this result is considered sufficient.

CONCLUSION
The aim of this research was to create a methodology for the reconstruction of 3D building models using the 3D BAG CityJSON and floor plan images. To this end, a method based on deep learning, CityJSON and 2D floor plans is proposed (Figure 1). The performance of this method is measured in Intersection over Union (IoU) and seems sufficient for such a challenging task. One way of improving the method is to use more representative training data.
An interesting future direction for obtaining walls and openings is to use generative design or parametric design to estimate a building's internal structure. For example, Abrishami et al. (2014) proposed a system for generative design of BIM models. By observing floor plans, the building's 3D exterior, and metadata such as building year and type (terraced, apartment, semi-detached, detached), the most likely internal structure may be estimated by a machine learning model.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVI-4/W4-2021 16th 3D