MODEL-BASED BUILDING DETECTION FROM LOW-COST OPTICAL SENSORS ONBOARD UNMANNED AERIAL VEHICLES

Automated and cost-effective building detection at ultra high spatial resolution is of major importance for various engineering and smart city applications. To this end, a model-based building detection technique has been developed that is able to extract and reconstruct buildings from UAV aerial imagery acquired with low-cost imaging sensors. In particular, the developed approach computes, through advanced structure from motion, bundle adjustment and dense image matching, a DSM and a true orthomosaic from numerous GoPro images which are characterised by important geometric distortions and the fish-eye effect. An unsupervised multi-region graph-cut segmentation and a rule-based classification deliver the initial multi-class classification map. The DTM is then calculated based on an inpainting and mathematical morphology process. A data fusion process between the buildings detected from the DSM/DTM and the classification map feeds a grammar-based building reconstruction through which scene buildings are extracted and reconstructed. Preliminary experimental results appear quite promising, with the quantitative evaluation indicating detection rates at object level of 88% regarding correctness and above 75% regarding completeness.


INTRODUCTION
Three dimensional building and landscape models are of great interest for various engineering applications such as urban and rural planning, updating geographic information systems, smart cities, 3D visualization, virtual tourism, location based services, navigation, wireless telecommunications, disaster management, and noise, heat and exhaust spreading simulations. All these applications are actively discussed in the geography, geoscience and computer vision scientific communities, both in academia and industry. Organizations like Google and Microsoft are seeking to include extensively up-to-date 2D and 3D urban models in their products (Microsoft Virtual Earth and Google Earth). Such tools, which have gained important public acceptance, have already deployed 3D building reconstruction techniques as a major visualization component in their process.
However, the prohibitively high cost of manually generating such 2D and 3D dynamic models/maps explains the urgent need for automatic approaches, especially when one considers modeling and monitoring time-varying events within complex urban areas at ultra high spatial resolution [Haala and Kada, 2010], [Karantzalos, 2015]. Recent quantitative results from the ISPRS (WGIII/4) benchmark on urban object detection and 3D building reconstruction [Rottensteiner et al., 2014] indicated that there is room for improvement in the detection of small building structures and the precise delineation of building boundaries. Depending on the type and resolution of the remote sensing data, many different approaches have been proposed in the literature, both pixel- and object-based ones [Karantzalos and Paragios, 2009, Karantzalos and Paragios, 2010, Ferro et al., 2010, Jabari et al., 2014], while several building detection approaches from optical data are based on morphological filters [Lefevre et al., 2007], texture [Volpi et al., 2013], point descriptors [Wang et al., 2013], gradient orientation [Benedek et al., 2012], deep learning features [Vakalopoulou et al., 2015], etc. Moreover, elevation data, like airborne or terrestrial LIDAR, can significantly improve the detection procedure [Sun and Salvaggio, 2013]. However, they cannot yet be considered a cost-effective solution, especially for large scale mapping and change detection applications which require frequent data acquisition campaigns [Karantzalos, 2015]. As unmanned aerial vehicles (UAVs) are becoming more reliable and more autonomous while their cost is decreasing, the integration of low-cost UAV and imaging systems is gradually becoming efficient [Kim et al., 2013], [Roca et al., 2013]. Moreover, such autonomous acquisition systems, either fixed-wing aircraft or drones, can quickly deliver huge datasets at ultra-high spatial resolution.
To this end, in this paper we have exploited aerial datasets from low-cost UAV and imaging systems towards building detection in 2D and 3D. In particular, the developed approach generates, through hierarchical structure from motion, bundle adjustment and dense image matching, a DSM and a true orthomosaic from the numerous GoPro images, which are characterised by significant geometric distortions due to the fish-eye effect. An unsupervised multi-region segmentation and a rule-based classification then separate and label multiple object classes in the image domain. The DTM of the scene is calculated based on an inpainting and mathematical morphology process. A data fusion process between the buildings detected from the DSM/DTM and the classification map feeds a model-based process in which scene buildings are extracted and reconstructed. Preliminary experimental results from an aerial dataset covering a 0.8 km² region in the Eastern Prefecture of Attica, Greece appear quite promising (Figure 1). The algorithm managed to process more than 300 images and detect more than 215 buildings, with a detection completeness above 75% and a detection correctness of 88%.

METHODOLOGY
The main motivation was to design a building extraction and 3D modelling approach able to exploit aerial datasets from low-cost, lightweight, RGB cameras onboard unmanned aerial vehicles. A flowchart of the developed approach is shown in Figure 2.
The entire process has been highly automated: once the UAV completes the flight(s), the acquired image data are processed towards the generation of an orthomosaic and a 3D surface model. In particular, all GoPro views are initially rectified from the severe fish-eye distortion effect using a rough approximation of their radial distortion model, provided by the manufacturer. All images are then relatively oriented through a hierarchical structure from motion scheme [Pollefeys et al., 2004], [Hartley and Zisserman, 2004], [Snavely et al., 2008] at different image scales in order to handle effectively a large number of high resolution images. We start at a low resolution, where 2D SIFT [Lowe, 2004] and SURF [Bay et al., 2008] features with descriptors are extracted on all images. Image pairs are identified among the unordered set of images, sparse matching is then performed, and outliers are detected and eliminated through RANSAC on the fundamental matrix computation. Combining the inlier matches across different stereopairs, multi-image point correspondences are obtained. A bucketing algorithm is also applied to reduce the number of tie points without affecting their distribution. Then, by means of closed-form algorithms and successive local bundle adjustment solutions, initial image orientations are estimated. These orientations are further refined through a typical self-calibrating bundle adjustment solution. This procedure is repeated at higher image resolutions, but this time the matching is restricted by the estimated epipolar geometry and a rough 3D reconstruction from the tie points of the previous resolution. The final interior and exterior orientation of all images is calculated again through a bundle adjustment solution. At this point, any remaining lens distortion of the initially rectified GoPro images is correctly modelled and optimally estimated.
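The bucketing step mentioned above can be illustrated with a minimal sketch. The text does not specify the bucketing scheme, so the regular grid, the `per_cell` limit and the per-point `score` (for example, the number of images observing a tie point) are assumptions made here for illustration only.

```python
from collections import defaultdict


def bucket_tie_points(points, image_size, grid=(8, 8), per_cell=2):
    """Thin tie points so that each grid cell keeps at most `per_cell` of them.

    points: iterable of (x, y, score) tuples in pixel coordinates; `score` is a
        hypothetical quality measure, not something defined in the paper.
    image_size: (width, height) in pixels.
    grid: (columns, rows) of the assumed regular bucketing grid.
    """
    width, height = image_size
    cols, rows = grid
    cells = defaultdict(list)
    for x, y, score in points:
        # Clamp so that points on the right/bottom border fall into the last cell.
        col = min(int(x * cols / width), cols - 1)
        row = min(int(y * rows / height), rows - 1)
        cells[(row, col)].append((x, y, score))

    kept = []
    for key in sorted(cells):
        # Keep only the strongest points of each cell; every occupied cell
        # still contributes, so the spatial coverage is preserved.
        strongest = sorted(cells[key], key=lambda p: p[2], reverse=True)
        kept.extend(strongest[:per_cell])
    return kept
```

Because every occupied cell retains its best points, dense clusters of matches are thinned while sparsely covered image regions keep what they have.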
Once all GoPro images are oriented, a DSM is reconstructed. For this step, several state-of-the-art dense stereo and multi-image reconstruction algorithms have been tested [Hirschmuller, 2008], [Furukawa and Ponce, 2010], [Jancosek and Pajdla, 2011], [Stentoumis et al., 2014], which either combine different disparity maps into a single model or directly compute an optimal 3D surface from all images. The DSM presented here has been generated using the CMPMVS algorithm. Using this DSM, a true orthomosaic was computed by employing a multi-image algorithm based on automatic visibility checking and weighted texture blending [Karras et al., 2007].
An unsupervised segmentation approach then divides the image into approximately 25 spectral classes. We have employed a multiregion graph cut image segmentation in a kernel-induced space [Salah et al., 2011] on the orthomosaic, which is first simplified through anisotropic morphological filtering [Karantzalos et al., 2007]. Based on a supervised rule-based classification, image objects are labelled and merged into terrain classes like buildings, roads, vegetation, etc.
Among the detected terrain classes, those belonging to vegetation and buildings are fused with the DSM towards estimating a DTM. In particular, building and vegetation regions were regarded as missing information, and an image inpainting technique was responsible for predicting, based on the surrounding surfaces, the actual ground height. A morphological reconstruction process then slightly smooths the estimated DTM. The difference between the DSM and the estimated DTM provides a rough but quite accurate approximation of the building heights. This approach has been validated under different terrain, vegetation and building densities.
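The DTM estimation described above can be sketched as follows. The paper does not name the inpainting technique, so a simple diffusion scheme (repeated averaging of the four neighbours) is swapped in here as an assumption, and the subsequent morphological smoothing is omitted. The function names and the list-of-rows raster layout are hypothetical.

```python
def estimate_dtm(dsm, non_ground, iterations=500):
    """Fill masked DSM cells by repeatedly averaging their 4-neighbours.

    dsm: list of rows of surface heights (metres).
    non_ground: same-shape list of booleans, True where the classification
        found a building or vegetation (treated as missing ground).
    """
    rows, cols = len(dsm), len(dsm[0])
    ground = [dsm[r][c] for r in range(rows) for c in range(cols)
              if not non_ground[r][c]]
    if not ground:
        raise ValueError("no ground cells to interpolate from")

    # Initialise the unknown cells with the mean ground height.
    start = sum(ground) / len(ground)
    dtm = [[start if non_ground[r][c] else dsm[r][c] for c in range(cols)]
           for r in range(rows)]

    for _ in range(iterations):
        updated = [row[:] for row in dtm]
        for r in range(rows):
            for c in range(cols):
                if non_ground[r][c]:
                    nbrs = [dtm[rr][cc]
                            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                            if 0 <= rr < rows and 0 <= cc < cols]
                    updated[r][c] = sum(nbrs) / len(nbrs)
        dtm = updated
    return dtm


def normalised_dsm(dsm, dtm):
    """Height above ground: DSM minus DTM, clipped at zero."""
    return [[max(s - t, 0.0) for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(dsm, dtm)]
```

Known ground cells are never modified, so the filled surface is anchored to the heights surrounding each masked region; the DSM/DTM difference then approximates object heights above ground.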
During the next processing step, the building footprints detected by the classification procedure were fused with the building footprints detected from the DSM/DTM. Then a model-based procedure is responsible for fitting the appropriate geometry to the detected binary 2D footprints and the 3D DSM. The procedure is heavily based on a prior-based approach [Karantzalos and Paragios, 2009, Karantzalos and Paragios, 2010] and integrates a simplified version of grammar-based modelling [Koutsourakis et al., 2009], [Teboul et al., 2013]. In particular, a vocabulary with a broad set of footprint models can be considered, while the observed 2D building is a projective transformation of the footprint. Given the variation of the expressiveness of the grammar and the degrees of freedom of the transformation, we then focus on the 3D aspect of the model. Similar to [Karantzalos and Paragios, 2010], only the main height h_m of the building and its roof height h_r(x, y) at every point need to be recovered. However, in this approach, instead of calculating the four angles of the four inclined planes, the gradient along the plane was computed.
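One possible way to obtain the gradient along an inclined roof plane is a least-squares plane fit to the DSM heights inside a roof region. The paper does not state how the gradient was computed, so the sketch below is an assumption; `fit_roof_plane` and its `(x, y, z)` sample format are hypothetical names chosen for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_roof_plane(samples):
    """Least-squares fit of z = gx * x + gy * y + c to (x, y, z) roof samples.

    Returns (gx, gy, c), where (gx, gy) is the height gradient along the plane.
    """
    n = len(samples)
    sx = sum(x for x, _, _ in samples)
    sy = sum(y for _, y, _ in samples)
    sz = sum(z for _, _, z in samples)
    sxx = sum(x * x for x, _, _ in samples)
    syy = sum(y * y for _, y, _ in samples)
    sxy = sum(x * y for x, y, _ in samples)
    sxz = sum(x * z for x, _, z in samples)
    syz = sum(y * z for _, y, z in samples)

    # Normal equations of the least-squares problem.
    normal = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(normal)
    if abs(d) < 1e-12:
        raise ValueError("samples are degenerate; the plane is not determined")

    solution = []
    for k in range(3):
        # Cramer's rule: replace column k of the normal matrix by the right-hand side.
        replaced = [row[:k] + [rhs[r]] + row[k + 1:] for r, row in enumerate(normal)]
        solution.append(det3(replaced) / d)
    return tuple(solution)
```

A single gradient vector per plane replaces a separate angle parameter per roof face, which is in the spirit of the simplification described above.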
The study area covers a 0.8 km² region in the Eastern Prefecture of Attica, Greece (Figure 1). Two autonomous flights were performed and more than 300 RGB images with important geometric distortions were collected. The study area, along with the UAV flight path and plan, is shown in Figure 1a, while different views from the low-cost GoPro camera, with important geometric distortions due to the fish-eye effect, are shown in Figure 1b.
The developed approach managed, through hierarchical structure from motion, bundle adjustment and dense image matching, to successfully compute a DSM and a true orthophotomosaic from the numerous GoPro images. The resulting orthophotomosaic (approximately 13,500 x 7,000 pixels) with a spatial resolution of 8 cm and the resulting DSM with a spatial resolution of 10 cm are shown in Figure 3.
In the quantitative evaluation, the True Positives (TP) represent the buildings that have been identified correctly, the False Negatives (FN) represent the buildings that have not been detected, and the False Positives (FP) correspond to false alarms, i.e., objects that were detected but are not actually buildings.
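The object-level measures follow directly from these counts under their standard definitions, Completeness = TP / (TP + FN) and Correctness = TP / (TP + FP). The short sketch below applies them; the function name is hypothetical and the counts in the usage example are illustrative placeholders, not the figures of this study.

```python
def detection_quality(tp, fp, fn):
    """Return (completeness, correctness) from object-level counts.

    completeness = TP / (TP + FN): share of reference buildings that were detected.
    correctness  = TP / (TP + FP): share of detections that are real buildings.
    """
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("need at least one reference building and one detection")
    completeness = tp / (tp + fn)
    correctness = tp / (tp + fp)
    return completeness, correctness


# Illustrative counts only (not taken from the experiments).
completeness, correctness = detection_quality(tp=8, fp=2, fn=2)
print(f"completeness={completeness:.2f}, correctness={correctness:.2f}")
```

Reporting both values matters because they fail in different ways: missed small buildings lower completeness, whereas false alarms lower correctness.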
The quantitative evaluation indicated high detection correctness rates above 88%. The detection completeness was relatively lower but above 75%. The developed algorithm managed to detect and reconstruct the geometry of more than 215 buildings. The vast majority of the false negatives were buildings of relatively small size, which is in accordance with the literature and similar efforts [Rottensteiner et al., 2014]. Moreover, it should be noted that the rooftop reconstruction component was evaluated in a qualitative manner, since 3D reference data at ultra high resolution were not available.

CONCLUSIONS AND FUTURE PERSPECTIVES
In this paper, building detection in 2D and 3D was addressed through the exploitation of aerial datasets from low-cost UAV and imaging systems. In particular, the developed algorithm integrates advanced structure from motion, bundle adjustment and dense image matching procedures towards the computation of a DSM and a true orthophotomosaic from numerous RGB images with important geometric distortions. Terrain classes are detected through an unsupervised multi-region segmentation and rule-based classification process. The DTM of the scene is calculated based on an inpainting and mathematical morphology procedure. A data fusion process between the buildings detected from the DSM/DTM and the classification map feeds a model-based process in which scene buildings are extracted and reconstructed. Preliminary experimental results from an aerial dataset covering a 0.8 km² region in the Eastern Prefecture of Attica, Greece appear quite promising. The algorithm managed to process more than 300 images and detect more than 215 buildings, with a detection completeness above 75% and a detection correctness of 88%. The reliable estimation of building 3D geometry suggests that the proposed method constitutes a quite promising tool for various object extraction and reconstruction applications with data from low-cost imaging systems onboard UAVs. A sensitivity analysis and code optimisation are currently under development.

ACKNOWLEDGEMENT
This research has been co-financed by the European Union (European Social Fund-ESF) and Greek national funds through the Operational Program 'Education and Lifelong Learning' of the National Strategic Reference Framework (NSRF)-Research Funding Program THALES: Reinforcement of the interdisciplinary and/or inter-institutional research and innovation.
Figure 1: The study area covers a 0.8 km² region in the Eastern Prefecture of Attica, Greece. Two autonomous flights were performed and more than 300 RGB images with important geometric distortions were collected.

Figure 2: The flowchart of the developed model-based building extraction and reconstruction approach.
Figure 3: The developed approach managed, through advanced structure from motion, bundle adjustment and dense image matching, to successfully compute a DSM and a true orthophotomosaic from numerous GoPro images. (a) The resulting orthophotomosaic (approximately 13,500 x 7,000 pixels) with a spatial resolution of 8 cm. (b) The resulting DSM with a spatial resolution of 10 cm.

Figure 4: Based on an unsupervised segmentation, rule-based classification and data fusion process, the developed algorithm managed to extract building footprints. A model-based approach further recovers their correct 2D and 3D geometry.
During the next processing steps, the unsupervised segmentation, the rule-based classification and the data fusion process managed to detect scene buildings and separate them from other terrain objects (like other man-made objects and trees) with similar spectral or height properties. Experimental results with the labelled spectral classes after the multi-region graph-cut segmentation are shown in Figure 4a. The detected 2D building footprints after the fusion process are shown in Figure 4b and superimposed onto the true orthophotomosaic in Figure 4c. The final step of the developed algorithm managed to recover the geometry of both the 2D footprints and the 3D rooftops, as shown in Figure 5. In particular, after the application of the model-based procedure, the corrected 2D building footprints are shown in Figure 5a in a binary format and superimposed onto the orthophotomosaic in Figure 5b. The recovered geometry of the detected buildings is shown in Figure 5c. The developed building detection framework has also been validated through the quantitative comparison of the detection results with the reference (ground truth) data. The standard measures of Completeness and Correctness have been employed and calculated at object level. To this end, based on the ground truth data, the True Positives (TP), False Positives (FP) and False Negatives (FN) were calculated for every case.

Figure 5: The detected buildings in 2D and 3D after the application of the developed approach.