OBLIQUE PHOTOGRAMMETRY SUPPORTING 3D URBAN RECONSTRUCTION OF COMPLEX SCENARIOS

Accurate 3D city models represent an important source of geospatial information to support various “smart city” applications, such as space management, energy assessment, 3D cartography, noise and pollution mapping as well as disaster management. Even though remarkable progress has been made in recent years, there are still many open issues, especially when it comes to the 3D modelling of complex urban scenarios like historical and densely-built city centres featuring narrow streets and non-conventional building shapes. Most approaches introduce strong building priors/constraints on symmetry and roof typology that penalize urban environments having high variations of roof shapes. Furthermore, although oblique photogrammetry is rapidly maturing, the use of slanted views for façade reconstruction is not completely included in the reconstruction pipeline of state-of-the-art software. This paper aims to investigate state-of-the-art methods for 3D building modelling in complex urban scenarios with the support of oblique airborne images. A reconstruction approach based on roof primitives fitting is tested. Oblique imagery is then exploited to support the manual editing of the generated building models. At the same time, mobile mapping data are collected at cm resolution and then integrated with the aerial ones. All approaches are tested on the historical city centre of Bergamo (Italy).


INTRODUCTION
Accurate 3D city models are a core element of urban mapping and represent an important source of information to support various "smart city" applications, such as space management, energy assessment, 3D cadastre, noise and pollution propagation or disaster management (Biljecki et al., 2015). In the last decades 3D city models have been predominantly used for visualization purposes, after their interactive visualization at large and medium scales was first opened to the general public by applications like Google Earth and Bing Maps (Leberl et al., 2009). Today 3D data are valuable for several purposes beyond simple visualization and their economic value is evident, as demonstrated e.g. by the EuroSDR project "Identifying the Economic Value of 3D Geoinformation" (EuroSDR, 2017).
Focusing on the modelling of the urban man-made environment, many efforts have been made by various scientific communities (i.e. photogrammetry and remote sensing, geo-informatics, computer vision and computer graphics) in order to obtain accurate and timely updated 3D building models. Given the necessity to cover large areas, passive or active airborne data still represent the most common source of information for 3D geometry capture in urban scenarios. In fact, 3D building and city models are commonly derived either from LiDAR-generated point clouds (Haala and Brenner, 1997; Kada and McKinley, 2009; Tomljenovic et al., 2015) or from airborne imagery and image matching techniques (Suveg and Vosselman, 2004; Kluckner and Bischof, 2010; Bulatov et al., 2014). A combination thereof is also possible (Habib et al., 2011; Sohn et al., 2013; Zhang et al., 2014). While until a few years ago the base information was mainly provided by airborne LiDAR data, recent developments of image-based methods encourage the use of image data. Theoretically, optical images with their high spatial resolution allow for the extraction of 3D points at a remarkable geometric resolution, accuracy and reliability. Furthermore, the quickly growing sector of oblique airborne cameras (Remondino and Gerke, 2015; Remondino et al., 2016) is demonstrating its potential for detail reconstruction on building facades and footprint extraction (Haala and Rothermel, 2015).
Therefore, data and technologies are available and detailed building models close to reality seem to be already within grasp. However, despite the intensive efforts towards an improved 3D modelling process, there are still many unsolved problems (Lancelle and Fellner, 2010; Rottensteiner et al., 2014). Firstly, most approaches introduce strong building constraints on symmetry and roof typology. Therefore, although they reduce the problem complexity, they lack generality and become inefficient when applied to the reconstruction of complex urban scenarios, like historical city centres. Secondly, the automated reconstruction of highly accurate building models is still a challenging task: significant manual editing is usually required, especially when it comes to the modelling of unusual and varying building roofs. Thirdly, although oblique photogrammetry is rapidly maturing, the use of slanted views for façade reconstruction is still limited by image resolution and occlusion issues (Haala and Rothermel, 2015), and is thus not completely included in the reconstruction pipeline of state-of-the-art software.

Paper objectives and contributions
The paper investigates the use of airborne oblique imagery for 3D building reconstruction in complex urban scenarios. LOD1 and LOD2-compliant models are generated by adopting a reconstruction approach provided by a state-of-the-art commercial software and based on roof primitives fitting (Section 3). The photogrammetrically derived 2.5D DSM cloud is used as input to the roof modelling, then oblique imagery is exploited to support the manual editing of the generated building models. Finally, the use of a Mobile Mapping System (MMS) is investigated in order to collect terrestrial data and complement those captured by the airborne platform (Section 4). All approaches are tested on the city of Bergamo (Italy), which represents a typical example of a historical city centre with varying and complex building shapes. After a brief description of the test area and input data (Section 2), the paper focuses on the methodologies investigated (Sections 3 and 4) and the results achieved (Section 5). Given the current lack of external ground truth, accuracy assessment is beyond the scope of the present work and will be performed in future work.

Related work on 3D building modelling from airborne data
Several methods have been developed for building modelling purposes (Haala and Kada, 2010; Lafarge and Mallet, 2012; Musialski et al., 2013; Figure 1). Besides manual approaches, automated or semi-automated approaches can be classified into three main groups, i.e. (i) reconstruction with parametric shapes, (ii) reconstruction based on segmentation and polyhedral modelling, and (iii) reconstruction based on DSM or mesh simplification.
The first group of methods builds upon the concept that roofs can be approximated with parameterized common shapes. Therefore the modelling process is tackled by searching for the pre-defined roof types that best fit the input data. Commonly adopted roof shapes are those described e.g. in (Milde and Brenner, 2009; Kada 2009). Although not mandatory, these methods generally first perform an interactive segmentation of the point cloud, on which the shape primitives are then fitted. Building footprints, when available, can be adopted to support the process, after a non-trivial step of splitting and/or merging (Kada and McKinley, 2009; Vallet et al., 2009). Finally, after 3D primitives have been reconstructed for all elements derived from segments and/or footprints, they are merged together in order to deliver a unique building shape.
The second group of approaches is based on a segmentation of the point cloud that results in a partition of the points where all points in one segment belong to the same geometric shape. Many segmentation methods have been developed to extract various geometric shapes (Vosselman et al., 2004; Sampath and Shan, 2010); however, reconstruction algorithms adopt almost exclusively segmented planar regions. After segmentation, most approaches perform a neighbourhood analysis of the segments, in order to extract roof features like intersection lines, step edges or sub-shapes of the roof. This generally builds upon the description of topological relations between the segments (Verma et al., 2006). Finally, once the segments and the roof features have been extracted, the building models are generated through polyhedral construction. Generally, intersection lines and step edges are extended, so that they connect at corners and form a closed polygon. Then vertical faces are embedded at step edges and at the building outer boundary (Dorninger and Pfeifer, 2008).
The third group of methods leans on the concept that buildings are "contained" in a detailed 3D mesh or 2.5D DSM and seeks to simplify the meshed data until it meets pre-defined geometric and semantic criteria. Different mesh simplification approaches have been introduced, e.g. based on dual contouring (Zhou and Neumann, 2010). Finally, if single building models are needed, the corresponding regions of the mesh can be extracted and adjusted to obtain 3D models with closed boundaries.
With regard to the level of detail (LOD) obtained in the final representation, these approaches are usually adopted to generate building models that feature flat facades and distinctive roof structures. These representations are consistent with LOD2 specifications, according to the OGC standard CityGML (Gröger and Plümer, 2012), and are sufficient for many applications (e.g. visualizations and simulations). In contrast, a number of "localization-aware" applications (e.g. navigation and urban planning) require detailed information for the building facades. In this case, LOD3-compliant building models should be provided, where elements like doors and windows are also modelled. Such features can hardly be captured from nadir-only airborne images due to visibility issues. The use of oblique multi-head camera systems can potentially tackle the problem and provide an efficient means of information extraction in urban scenarios. Many research works have already proved the potential of oblique photogrammetry in dealing with information extraction in such contexts (Haala and Rothermel, 2015; Frommholz et al., 2015; Moe et al., 2016). However, although façade reconstruction is feasible through recently improved multi-view image matching and meshing techniques, the reconstruction of highly detailed and accurate 3D objects in complex and historical urban environments still presumes additional data captured from terrestrial viewpoints. In this regard, the use of mobile mapping systems (MMS) and derived point clouds, properly integrated with airborne data, can represent an efficient way to increase the amount of detail reconstructed in complex urban scenes.
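As an illustration of the roof-feature extraction step used by the segmentation-based group of methods, the following minimal sketch fits a least-squares plane to each of two already segmented roof faces and computes their intersection line (a candidate ridge). It is not taken from any of the cited works: the function names `fit_plane` and `intersection_line` and the synthetic gable roof are assumptions introduced here for illustration only.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through an (N, 3) array.

    Returns (n, d) with unit normal n and offset d such that n . p = d.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # The normal is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    return normal, float(normal @ centroid)

def intersection_line(plane_a, plane_b):
    """Return (point, unit direction) of the line shared by two planes."""
    (n1, d1), (n2, d2) = plane_a, plane_b
    direction = np.cross(n1, n2)
    norm = np.linalg.norm(direction)
    if norm < 1e-9:
        raise ValueError("planes are (nearly) parallel")
    # One point on the line: satisfies both plane equations and direction . p = 0.
    A = np.vstack([n1, n2, direction])
    point = np.linalg.solve(A, np.array([d1, d2, 0.0]))
    return point, direction / norm

# Synthetic gable roof (hypothetical data): two faces meeting at x = 1, z = 1.
rng = np.random.default_rng(0)
xy = rng.uniform([0.0, 0.0], [1.0, 5.0], size=(200, 2))
left = np.column_stack([xy[:, 0], xy[:, 1], xy[:, 0]])          # z = x
right = np.column_stack([2.0 - xy[:, 0], xy[:, 1], xy[:, 0]])   # z = 2 - x

ridge_point, ridge_dir = intersection_line(fit_plane(left), fit_plane(right))
print(ridge_point, ridge_dir)
```

In a real pipeline the two point sets would come from a planar segmentation of the point cloud, and the resulting lines would still have to be trimmed and connected at corners to form closed polygons.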

Test area
The project area is located in Bergamo, a city of about 120,000 inhabitants in northern Italy. The town features two distinctive centres, a historical one (called "Città Alta", ca. 1.2 x 0.8 km, Figure 2) and a modern one (called "Città Bassa"). The former represents the test site where all methodologies are investigated. It is the oldest part of the city and lies on the top of a hill. Surrounded by 16th-century Venetian walls, it presents old medieval buildings with complex and varying shapes, located on narrow streets in a densely built urban area. In terms of 3D geometry mapping and modelling, this scenario poses significant challenges, due to (i) visibility constraints that limit data acquisition from the airborne platform and (ii) historical buildings with complex roof shapes that complicate the use of standardized primitive shapes.

Airborne data and derived point clouds
The airborne dataset comprises 1073 nadir and 4292 oblique images. They were acquired by AVT/Terra Messflug with a Vexcel UltraCam Osprey Prime multi-camera system (a large frame nadir camera and four oblique looking cameras along the four cardinal directions at an inclination of 45°). The block consists of 22 strips, covers an area of ca. 5.3 km x 5.6 km and was flown on 6th July 2016 with favourable weather conditions. The flight plan was designed using an average nadir GSD of 12 cm, and along/across overlaps of 80% and 60%, respectively. Ground truth data were provided in the form of 13 ground control points (GCPs) and 18 independent check points (CPs) surveyed with RTK GNSS with a mean 3D accuracy of 5 cm. Further details are listed in Table 1.
The Aerial Triangulation (AT) is performed following the strategy presented in (Moe et al., 2016). The AT results of the nadir images are extended to include, by means of an in-house developed tool, the four additional oblique views with their calibrated relative orientation parameters (offsets, angles). Then, the initial values of exterior orientation, interior orientation (focal length, principal point) and additional parameters of all sensors are adjusted within a self-calibrating bundle adjustment, performed with a commercial state-of-the-art software, Pix4D (www.pix4d.com). The adjustment accuracy, measured as root mean square errors (RMSEs) on 18 CPs, is about half a GSD, i.e. 4.5 cm (X), 5.8 cm (Y) and 4.9 cm (Z). With regard to the number of matches between camera views, considerations similar to the ones presented in (Gerke et al., 2016; Moe et al., 2016) apply. More than half of the extracted tie points (56%) are visible in only two images, and nadir-to-oblique pairs are matched significantly less often than views pointing in the same direction. This issue may affect the reliability of the AT in such scenarios and requires addressing the unsolved problem of occlusions and ambiguities due to convergent views.
Starting from the AT results, the multi-view matching pipeline provided by SURE (www.nframes.com) is adopted to extract (Figure 3): (i) a filtered 3D dense point cloud over Bergamo "Città Alta", featuring a mean spatial resolution of 1 GSD (60 million points); (ii) a 2.5D DSM cloud over the entire city, sampled to an XY grid size of 1 GSD (more than 300 million points). Figure 4 shows a close view of the dense point cloud of Bergamo "Città Alta", with building façades nicely reconstructed thanks to the slanted views. Finally, a true orthophoto is generated at 1 GSD resolution.
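The check-point accuracy figures quoted above can be reproduced with a few lines of code once adjusted and surveyed coordinates are available. The sketch below assumes the usual per-axis definition, RMSE = sqrt(mean(residual^2)); the coordinates are invented for illustration and are not the Bergamo check points, and the function name `rmse_per_axis` is hypothetical.

```python
import numpy as np

def rmse_per_axis(estimated, surveyed):
    """Per-axis RMSE between adjusted and surveyed check-point coordinates.

    Both inputs are (N, 3) arrays of X, Y, Z in metres.
    """
    residuals = np.asarray(estimated, dtype=float) - np.asarray(surveyed, dtype=float)
    return np.sqrt(np.mean(residuals ** 2, axis=0))

# Invented check points (metres); NOT the actual survey data.
surveyed = np.array([
    [0.0, 0.0, 100.0],
    [10.0, 5.0, 102.0],
    [20.0, -3.0, 98.5],
])
offsets = np.array([
    [0.03, -0.04, 0.05],
    [-0.03, 0.04, -0.05],
    [0.03, 0.04, 0.05],
])
estimated = surveyed + offsets

gsd = 0.12  # average nadir GSD in metres, as stated in the text
rmse = rmse_per_axis(estimated, surveyed)
print("RMSE X/Y/Z [m]:", rmse)
print("RMSE X/Y/Z [GSD]:", rmse / gsd)
```

Expressed in the same way, the reported RMSEs of 4.5 cm, 5.8 cm and 4.9 cm correspond to roughly 0.38, 0.48 and 0.41 of the 12 cm GSD.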

Multi-camera platform | Sensor size [mm] | Focal length [mm] | Average GSD [m] | # images | Along/across overlap | Area [km]
Microsoft UltraCam Osprey Prime
Table 1. Camera and dataset specifications. N and O stand for nadir and oblique. The given overlap is calculated on the nadir images.

MMS data and derived point clouds
Terrestrial Mobile Mapping System (MMS) data were acquired by SINECO (http://www.gruppo-sina.it/main.asp?soc=sineco) with their Laser Mobile Mapper. The platform integrates two synchronously operated RIEGL VMX 450 laser scanners, a portable control unit and an Applanix POS LV420 navigation system. The system has a point accuracy of ± 7 mm at 100 metre distance and it was set to acquire 400 lines/sec. The platform was also equipped with a digital spherical camera (Point Grey Ladybug5) to capture 30 Mpx images, covering up to 90% of a full sphere. All sensors, whose configuration is shown in Figure 5 (left), were mounted on a car. The data acquisition lasted ca. 3 hours to cover some 9 km of roads in the historic part of the city (Figure 5, right). Data post-processing was performed by SINECO, adopting the same control information (i.e. GNSS-surveyed points) that was included in the AT of the airborne data. This ensures multi-platform data consistency in terms of reference system. The MMS-derived point cloud contains some 5 billion points (Figure 6), corresponding to a mean spatial resolution of 1-2 cm on the road plane and on the facades adjacent to the roads.

BUILDING MODELLING WITH PARAMETRIC SHAPES
The tridicon/Hexagon suite of tools (www.tridicon.de) is adopted to test a reconstruction method based on parametric shapes fitting. The suite includes several modules that allow for point cloud generation from oriented images, 3D building models reconstruction, 2D mapping and 3D modelling based on aerial images, editing, roofs and façades texturing as well as interactive navigation. For the Bergamo dataset, the modelling task is accomplished by using the "CityModeller" tool, which requires building footprints as mandatory input data. Since the tested version of the software is not able to use points along facades, the 2.5D DSM cloud is selected as elevation data.
Further input is provided in the form of terrain data, i.e. an available LiDAR DTM (1 m spatial resolution) internally used by the software to assign ground heights to buildings. With regard to the modelling procedure, building footprints are first split and simplified according to geometric criteria. This is not a mandatory step; however, it can help in dealing with the complexity of building footprints and the limited geometric precision of cadastral maps. Secondly, the points that fall inside each decomposed footprint cell are compared to a library of template roof shapes (e.g. gable roof, hip roof, lean-to roof, pyramid roof, tent roof, dome roof, etc.). A cell is associated with the shape that best fits most of its points, based on a user-defined threshold. Thirdly, a building adjustment is carried out in order to adjust the heights of neighbouring buildings and improve the connection between them. Once the modelling is complete, building models can be edited in the "3D Editor" tool. The software allows the user to import oriented images, superimpose the generated building polygons over them and edit the geometry according to the 2D reference. Since oblique oriented images can also be used as reference, the complete Bergamo dataset (nadir and oblique imagery) is here adopted to support the manual editing of the generated building models. This allowed more convenient viewing directions to be exploited in checking and adjusting complex roof shapes and building outlines.
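The template-matching idea described above can be sketched as follows. This is a simplified illustration and not the algorithm implemented in "CityModeller", whose internals are not documented here: the template set, the inlier tolerance `tol`, the acceptance ratio `min_inlier_ratio` and the function name `classify_roof` are all assumptions. Each hypothetical template is a small linear model of the roof height over the footprint cell, fitted by least squares; the template explaining the largest share of points within the tolerance wins, and a cell is flagged as not recognized when no template reaches the threshold.

```python
import numpy as np

def _mid(v):
    """Centre of the cell extent along one axis."""
    return 0.5 * (v.min() + v.max())

# Hypothetical template library: each entry builds the design matrix A of a
# linear height model z = A @ params, ordered from simplest to most general.
TEMPLATES = {
    "flat": lambda x, y: np.column_stack([np.ones_like(x)]),
    "gable_ridge_x": lambda x, y: np.column_stack([np.ones_like(x), np.abs(y - _mid(y))]),
    "gable_ridge_y": lambda x, y: np.column_stack([np.ones_like(x), np.abs(x - _mid(x))]),
    "lean_to": lambda x, y: np.column_stack([np.ones_like(x), x, y]),
}

def classify_roof(points, tol=0.15, min_inlier_ratio=0.8):
    """Assign the points of one footprint cell to the best-fitting template.

    points: (N, 3) array of DSM points inside the cell (metres).
    Returns (template name or "not_recognized", inlier ratio of the best fit).
    """
    x, y, z = np.asarray(points, dtype=float).T
    best_name, best_ratio = "not_recognized", 0.0
    for name, design in TEMPLATES.items():
        A = design(x, y)
        params, *_ = np.linalg.lstsq(A, z, rcond=None)
        ratio = float(np.mean(np.abs(A @ params - z) <= tol))
        # Strict improvement, so simpler templates win ties.
        if ratio > best_ratio + 1e-9:
            best_name, best_ratio = name, ratio
    if best_ratio < min_inlier_ratio:
        return "not_recognized", best_ratio
    return best_name, best_ratio

# Synthetic 8 m x 6 m cell sampled on a regular grid (invented data).
gx, gy = np.meshgrid(np.linspace(0.0, 8.0, 17), np.linspace(0.0, 6.0, 13))
x, y = gx.ravel(), gy.ravel()
gable = np.column_stack([x, y, 10.0 - 0.5 * np.abs(y - 3.0)])
shed = np.column_stack([x, y, 8.0 + 0.3 * x])
irregular = np.column_stack([x, y, 10.0 + 2.0 * np.sin(x) * np.cos(y)])

for label, pts in [("gable", gable), ("shed", shed), ("irregular", irregular)]:
    print(label, classify_roof(pts))
```

The "not_recognized" outcome loosely mirrors the situation discussed for complex historical roofs, where no library shape fits enough points and manual editing becomes necessary.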

DATA INTEGRATION FOR FURTHER ANALYSES AND VISUALIZATION
Various authors have presented the use of terrestrial MMS to provide extremely dense 3D point coverage of building facades (Barber et al., 2008; Haala and Rothermel, 2015; Toschi et al., 2015). With regard to the test conducted in this paper, the integration is mainly performed with these aims:
(i) to integrate the point cloud derived from airborne imagery in order to provide a complete 3D reconstruction of the urban scene. Although oblique images potentially allow for capturing geometries also on building facades and footprints, this is practically limited by the significant occlusions occurring in densely-built urban scenarios;
(ii) to gather all pieces of information that are required for a survey at urban scale and need high spatial resolution to be represented and measured (e.g. aerial lines, road signs, sidewalks and cobblestones). The MMS point clouds, in fact, allow all architectural details on facades to be clearly detected (at scale 1:100-1:200), with a completeness, accuracy and density comparable in some cases to the ones achievable with a classic static scanning survey.
In order to visualize, navigate and analyse the aerial and terrestrial point clouds, data are loaded in the BIM3DSG system (Fassi et al., 2015). BIM3DSG was originally designed to be a BIM system for Cultural Heritage applications. In its current state, it provides a web solution able to manage huge, complex, dense and heterogeneous 3D data (both point clouds and polygons), thus offering an efficient means for data sharing (Rechichi et al., 2016). Data are loaded at full resolution and a semi-automatic procedure creates several levels of detail that the user can change depending on the adopted device. The web-based system works inside browsers (Firefox Nightly is recommended), but even mobile devices can be used by simply reducing the spatial resolution of the data. To speed up the navigation, the aerial and MMS point clouds are divided into sub-areas according to well-identified regions of the town. In this way, it is possible to visualize all the acquired data either individually or together.
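One common way to derive several levels of detail from a full-resolution point cloud is voxel-grid subsampling, sketched below. This is an assumption made for illustration: the source does not describe how BIM3DSG builds its levels of detail, and the function names `voxel_downsample` and `build_lods` as well as the voxel sizes are hypothetical.

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Replace all points falling in the same cubic voxel by their centroid."""
    points = np.asarray(points, dtype=float)
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()  # some NumPy versions return a 2D inverse here
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)  # accumulate coordinates per voxel
    return sums / counts[:, None]

def build_lods(points, voxel_sizes=(0.02, 0.10, 0.50)):
    """Return one subsampled cloud per voxel size (metres), finest first."""
    return {size: voxel_downsample(points, size) for size in voxel_sizes}

# Invented test cloud: 10,000 random points in a 1 m cube.
rng = np.random.default_rng(42)
cloud = rng.uniform(0.0, 1.0, size=(10_000, 3))
lods = build_lods(cloud)
for size, pts in lods.items():
    print(f"voxel {size:.2f} m -> {len(pts)} points")
```

A viewer could then serve the coarser clouds to mobile devices and the finest one to desktop browsers, in the spirit of the device-dependent resolution described above.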

LOD2 building models
Figure 7 shows two views of the LOD2-compliant building models generated in "Città Alta". The use of standardized parametric shapes for approximating building roofs performs well on the most common roof shapes and, if details like chimneys or dormers are not present, the automatic approach is sufficient for modelling non-complex areas. However, when it comes to the reconstruction of the old city centre, "CityModeller" fails to automatically define the correct shape of some complex roofs (Figure 8). An example is also shown in Figure 9, where building colours refer to the "quality" of the extracted roof shape (i.e. green, yellow and pink indicate high-quality, medium-quality or not-recognized roof shape, respectively).
Figure 9. Example of building roofs not correctly modelled by "CityModeller" due to unusual and complex roof shapes.
Although users can define their own primitives, we found it more convenient to manually edit the "partially" reconstructed shapes by using both nadir and oblique images as references ("3D Editor" tool). While nadir imagery is usually sufficient when editing buildings in sparsely built areas, the additional support given by oblique imagery becomes essential when dealing with complex historical areas. The densely-built city centre features "blocks" of buildings with multiple occlusions that can hardly be handled from a nadir point of view. Furthermore, the slanted view provided by oblique images allows for an improved intersection geometry of the 3D rays and, consequently, for a more accurate definition of the height.

Visualization of multi-sensors point clouds
Figure 10 shows the overall 3D point cloud generated integrating aerial and MMS data and visualized in BIM3DSG.
The completeness of the urban 3D scene and the impressive amount of detail reconstructed on building facades and at road level are evident. Although this is still a point-based 3D representation, modelling and meshing tools can be applied in order to retrieve surface models. Furthermore, besides the merged point clouds, the LOD2-compliant building models generated with the tridicon/Hexagon tools are included in the platform as well. Each geometric object can be associated with ancillary information, images and hotspots, thus connecting the reconstructed geometry of the town with different information systems (e.g. Toschi et al., 2017).
Finally, a visualization of the massive integrated 3D point cloud data (more than 5 billion points) is also provided through the web-based viewer Potree (www.potree.org) and shown in Figure 11.

CONCLUSIONS
Despite considerable efforts in the scientific community, the 3D reconstruction of complex urban scenarios, like historical city centres, is still a challenging task. This work investigated the use of dense point clouds from airborne multi-camera imagery and roof primitives fitting methods to derive LOD2-compliant building models. The use of parametrized common roof shapes, as implemented in a state-of-the-art commercial software, provided efficient modelling results in the case of standard roof shapes. However, when dealing with a medieval city centre and its building agglomerates with complex roof shapes, significant manual editing of the results is required and the use of oblique imagery is essential in supporting this task. The density and quality of the point cloud play only a partial role in the success of these types of approaches.
A fully automated approach is under development to automate the generation of LOD2 building models from 3D point clouds generated by multi-view dense image matching: it takes the dense point clouds as input and exploits both roof and façade segments to extract geometric features and construct the building models based on CGAL libraries (www.cgal.org). This eliminates the need for (i) strong building priors on verticality, symmetry and orthogonality and (ii) common pre-defined roof primitives. However, in its current state, the algorithm depends on a number of subsequent and mutually dependent geometric heuristics that need to be tuned case by case. Finally, the use of MMS data to complement dense point clouds derived from airborne platforms proved to be a promising solution to retrieve a dense and complete 3D reconstruction of a complex urban environment. The integration of these two types of data is mandatory to completely model historic city centres and produce LOD3 building models, although the management of such massive datasets is still problematic.

Figure 1. General methods for 3D building reconstruction from airborne data and specific approaches investigated in the paper (red text, bottom).

Figure 3. The same part of the historical city seen as 2.5D DSM cloud (left) and 3D point cloud (right).

Figure 4. Close view of the shaded 3D point cloud (1 GSD resolution) of the historic centre.

Figure 5. Configuration of the SINECO MMS (left) and the trajectory of the surveyed area (right).

Figure 6. Two views of the point cloud acquired with the MMS in the historic city centre of Bergamo "Città Alta".

Figure 7. View of the LOD2-compliant models reconstructed with the tridicon/Hexagon tools: an overview of the 3D scene within Google Earth (left) and a close-up view with colours corresponding to the building IDs provided by cadastral data (right).

Figure 8. Modelled 3D buildings (left) and real shapes (right, from Google Earth) in the heart of the historic centre with its complex roofs.

Figure 10.An overall view of the airborne and terrestrial merged point clouds within the BIM3DSG system (left) and close-up view (right).