INTEGRATING SMARTPHONE IMAGES AND AIRBORNE LIDAR DATA FOR COMPLETE URBAN BUILDING MODELLING

A complete building model reconstruction needs data collected from both the air and the ground. The former often has sparse coverage of building façades, while the latter is usually unable to observe building rooftops. To address the missing-data problem that arises when buildings are reconstructed from a single data source, we describe an approach for complete building reconstruction that integrates airborne LiDAR data and ground smartphone imagery. First, by taking advantage of the GPS and digital compass information embedded in the image metadata of smartphones, we find the airborne LiDAR point clouds of the buildings shown in the images. Next, Structure-from-Motion and dense multi-view stereo algorithms are applied to generate a building point cloud from multiple ground images. The third step extracts building outlines from the LiDAR point cloud and from the ground image point cloud. An automated correspondence between these two sets of building outlines allows a precise registration and combination of the two point clouds, which ultimately results in a complete, full-resolution building model. The developed approach overcomes the sparsity of points on building façades in airborne LiDAR and the lack of rooftops in ground images, so that the merits of both datasets are utilized.


INTRODUCTION
Building reconstruction is a popular topic, actively discussed not only in the research community (Brenner, 2005; Tack et al., 2012; Vosselman, 1999; Vosselman and Dijkman, 2001) but also in industry (Terrasolid, Smart3D Capture). As a technology capable of collecting dense and accurate 3D point clouds, airborne LiDAR has been widely used for building reconstruction in recent years. Although building rooftops extracted from airborne LiDAR point clouds meet the needs of a growing number of urban applications (Chen et al., 2012; Galvanin and Poz, 2012; Mongus et al., 2014), façade reconstruction remains a problem because façades are sparsely covered or even completely occluded. Yet building façades are important and informative components of a complete building model, especially in virtual reality and navigation applications.
Recently, smartphones with built-in high-quality cameras and other sensors, e.g., GPS and digital compass, have become increasingly prevalent. Images taken with them, along with their sensor information, can be acquired from image-based social media websites such as Flickr and Instagram. These photo collections, downloadable from the Internet, are abundant, free of cost, up to date, and of high resolution (Avrithis et al., 2010; Goesele et al., 2007; Snavely et al., 2010). State-of-the-art Structure-from-Motion (SfM) and Multi-View Stereo (MVS) reconstruction techniques (Agarwal et al., 2011; Crandall et al., 2013; Fonstad et al., 2013) allow us to process these images to reconstruct building façades in fine detail and with high precision. In contrast to airborne LiDAR point clouds, however, the building rooftops produced from ground images are coarse or even incomplete. As a result, the two sources are complementary, and their successful integration can lead to a complete building reconstruction.
This paper presents a solution for complete building reconstruction, covering both rooftops and façades, that combines airborne LiDAR point clouds with ground image point clouds generated from smartphone images. To start the procedure, images are first clustered by the geo-tags obtained from their metadata and then processed by SfM and MVS algorithms. The LiDAR point cloud corresponding to the target building captured in the image set is found using the GPS and compass information in the metadata. The two point clouds, one from airborne LiDAR data and one from ground smartphone images, are registered and combined based on the outline points of the target building that are visible in both datasets. A number of tests demonstrate that this approach achieves a satisfactory reconstruction of building models, with finer detail and higher completeness than either dataset alone.

RELATED WORK
The proposed method consists of the following major steps: 1. generating ground image point clouds using SfM and MVS algorithms; 2. extracting the corresponding buildings from the airborne LiDAR point clouds based on the ground image point clouds; 3. geo-registration of the ground image point clouds; 4. matching the building outlines extracted from the LiDAR point clouds and from the ground image point clouds. A wealth of research relevant to some of these individual steps has been conducted. Below is a brief summary.

Structure-from-Motion and Dense Multi-View Stereo
Structure-from-Motion (Snavely et al., 2006) is a range imaging technique that recovers camera parameters, pose estimates and sparse 3D scene geometry from 2D image sequences. SfM methods fall into two categories: incremental and global. Incremental SfM methods reconstruct target objects by recursively adding images to the point cloud generated from previous images (Agarwal et al., 2011; Goesele et al., 2007; Irschara et al., 2012). Global SfM approaches reconstruct the scene from all input images by estimating a global transform of all cameras, so that the time complexity is smaller (Crandall et al., 2012). A state-of-the-art method is able to reconstruct cities in fine detail from 150K images in less than a day on a cluster (Agarwal et al., 2011). MVS algorithms generate a much denser point cloud of the object from the SfM results. Many dense multi-view stereo methods focus on the reconstruction of small objects. At the same time, dense multi-view stereo methods have been shown to be well adapted to the reconstruction of larger scenes, e.g., outdoor buildings, given an initial sparse point cloud from SfM (Strecha et al., 2006; Gargallo and Sturm, 2005; Kolmogorov and Zabih, 2002). Furukawa and Ponce (2010) proposed an impressively accurate algorithm that generates a dense set of reconstruction patches.

Building Extraction from Airborne Lidar Point Clouds
In classical methods, building extraction from airborne LiDAR data has been addressed by classifying the LiDAR points into different object types such as terrain, buildings, vegetation or others. Rottensteiner and Briese (2002) proposed a method that removes terrain points using a skew error distribution function and then separates building points from other points by analysing height differences. Sampath and Shan (2007) clustered the building point cloud into planar segments, followed by their topologic reconstruction. Dorninger and Pfeifer (2008) proposed a comprehensive method for extracting rooftops from LiDAR point clouds. A target-based graph matching approach was used to handle both complete and incomplete laser data (Elberink and Vosselman, 2009). Rau (2012) proposed a TIN-Merging and Reshaping roof model reconstruction algorithm. Sohn et al. (2012) generalized the noisy polylines comprising a rooftop model by maximizing shape regularity.

Geo-registration of Ground Image Point Clouds
Registering ground image point clouds to a map was often completed manually when geo-tags and camera pointing information were not available (Snavely et al., 2006). Today, the emergence and prevalent use of mobile cameras make it easy to acquire images with embedded geo-tags and pointing information. Geo-tags and text labels are commonly used for geo-registration of models generated from ground images (Zamir and Shah, 2014; Zamir, 2014; Grzeszczuk et al., 2009), usually followed by an iterative refinement that aligns the models to building footprints. Furthermore, 3D models from Google Earth and building footprints from OpenStreetMap may also be used to improve geo-registration (Untzelmann et al., 2013; Wang et al., 2013).

Point Set Registration
Point set registration, which aims to find a spatial transform that aligns two point sets, is a key component in many computer vision and pattern recognition tasks. Because of its simplicity and low computational complexity, the Iterative Closest Point (ICP) algorithm is one of the most popular methods for rigid point set registration (Besl and McKay, 1992). It iteratively minimizes the least-squares distance between closest point pairs and converges only to a local optimum, so the registration result depends on the initial values of the transform. Various probabilistic methods were developed to overcome this limitation of ICP (Luo and Hancock, 2001; Rangarajan et al., 1996). The Coherent Point Drift (CPD) algorithm (Myronenko and Song, 2010) treats the registration of two point sets as a probability density estimation problem: one point set represents the Gaussian mixture model centroids and the other represents the data points. CPD yields accurate results with reasonable computational performance.

METHODOLOGY
The proposed approach for reconstructing both building rooftops and façades consists of the following steps, as shown in Figure 1. First, the smartphone images are clustered by the GPS data obtained from the image metadata. Every building is reconstructed by SfM and MVS algorithms from a clustered smartphone image set. The corresponding LiDAR point cloud is extracted using the GPS data and the image pointing information obtained from the clustered image set. Next, two top-view 2D outlines of the target building are extracted, one from the LiDAR point cloud and one from the ground image point cloud. The two point clouds are then matched using the building outlines. Finally, the target building is completely reconstructed, rooftops and façades alike, from the combined point cloud. This process can be repeated for every building within the study area to create a virtual scene.

Figure 1. Workflow for the integration of airborne LiDAR data and ground smartphone imagery.

Building Reconstruction from Smartphone Imagery
A collection of smartphone images can be approximately clustered by individual building using the geo-tags and image pointing information obtained from the Exchangeable Image File (Exif) format data. After that, each clustered image set is processed separately to generate the point cloud of an individual building.
Images captured by smartphones lack the camera intrinsic parameters, pose and sensor response characteristics that are usually provided by calibrated cameras. As a result, image quality is relatively poor and distortions are significant. Geometric and radiometric calibration of smartphone images is therefore necessary before using them to reconstruct buildings. For smartphone images, camera models and approximate lens data are available from the Exif tags of the JPG image files. Adobe Camera Raw, a commercial software tool, is used to correct geometric distortion using the camera models and lens data (Adobe Camera Raw, 2015). Then, linear stretching is used to minimize colour differences between neighbouring images. This step is necessary to ensure both the reliability and the precision of building reconstruction. Once much of the geometric and radiometric deformation of the images has been corrected, we proceed to full 3D reconstruction of the clustered smartphone imagery using the open-source SfM algorithm developed by Agarwal et al. (2011). Sparse 3D point clouds and camera matrices are generated by the SfM process. Then a patch-based multi-view stereo algorithm (Furukawa and Ponce, 2010) is used to create dense ground image point clouds.

Building Point Cloud Extraction from Airborne LiDAR
The method presented in this paper reconstructs individual buildings one at a time. Once the ground image point clouds have been generated, the corresponding airborne LiDAR point cloud of each target building is found and extracted using the GPS and image pointing information stored in the Exif tags when the images were taken. Although the precision of GPS readings from smartphones is generally about 10 metres in urban areas, it is sufficient for locating a candidate section of the LiDAR building point cloud.
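For each image with a GPS coordinate $g_i(x_i, y_i)$, a circular buffer zone $B_i$ of radius $r$ can be created at $g_i$:

$$B_i = \{\, q \in \mathbb{R}^2 : \lVert q - g_i \rVert \le r \,\}$$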
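The buffer-based selection can be illustrated with a minimal sketch. It assumes that the LiDAR points and camera positions are already expressed in the same planar, metric coordinate system; the function name `points_in_camera_buffers` and the radius value are hypothetical and are not taken from the original implementation.

```python
import numpy as np

def points_in_camera_buffers(lidar_xy, camera_xy, radius):
    """Boolean mask of LiDAR points inside the union of circular buffers
    of the given radius centred on the camera positions (all in 2D)."""
    lidar_xy = np.asarray(lidar_xy, dtype=float)    # shape (N, 2)
    camera_xy = np.asarray(camera_xy, dtype=float)  # shape (M, 2)
    # Pairwise squared distances between LiDAR points and cameras, shape (N, M).
    diff = lidar_xy[:, None, :] - camera_xy[None, :, :]
    d2 = np.einsum("nmk,nmk->nm", diff, diff)
    # A point is a candidate if it falls inside at least one buffer.
    return (d2 <= radius ** 2).any(axis=1)

# Toy data: two cameras and four LiDAR points (planar coordinates in metres).
cameras = np.array([[0.0, 0.0], [10.0, 0.0]])
lidar = np.array([[1.0, 1.0], [9.0, -2.0], [30.0, 30.0], [5.0, 0.0]])
mask = points_in_camera_buffers(lidar, cameras, radius=4.0)
print(mask)
```

The dense distance matrix is only practical for small tiles; for a full LiDAR block a spatial index would likely be a better choice.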

2D Building Outlines Extraction
Aligning Point Cloud to Up-right Direction. The LiDAR point cloud and the ground image point cloud share 2D building rooftop outlines, which can be used for their registration.
To generate precise up-right 2D building outlines, the ground image point cloud must first be aligned to the up-right vector. Let $n_i$ represent the normal of point $p_i$ in the façade ground image point cloud $\{P\}$. Similar to the method proposed by Untzelmann et al. (2013), we obtain the initial up-right direction $u_0$ by calculating the normal of a plane fitted to the camera positions. After removing noise points $p_i$ for which the angle between $n_i$ and $u_0$ is larger than a certain threshold, we apply a RANSAC-based approach to refine the up-right vector $u$ by iteratively selecting two points $p_i$ and $p_j$ from the façade points and estimating the cross product of $n_i$ and $n_j$:

$$u = \frac{n_i \times n_j}{\lVert n_i \times n_j \rVert}$$

The scale of the ground image point cloud is changed to be similar to that of the LiDAR point cloud after coarse integration (shown in Figure 5), so that the accurate registration in the next step can converge.
Accurate Integration by Registering Building Outlines. In the previous step, the ground image point cloud was approximately matched with the LiDAR point cloud and converted into the world coordinate system by registering the geo-tags of the images (global coordinates) to the camera positions from SfM (local coordinates). Because of the limited accuracy of smartphone GPS, however, this matching is not accurate enough to meet the needs of building reconstruction.
In order to accurately integrate the LiDAR point cloud and the ground image point cloud, we adopt the Coherent Point Drift (CPD) algorithm (Myronenko and Song, 2010) to match the building outlines extracted from the two point clouds. CPD represents one point set as the centroids of a Gaussian mixture model and aligns them with the other point set, which is treated as data points. It allows points to be displaced in all degrees of freedom, and makes the displacement vectors point in similar directions by regularizing the high-frequency components of the displacement field.
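As an illustration of how CPD aligns two outlines, the following minimal sketch implements the rigid (similarity) variant of CPD from Myronenko and Song (2010) with NumPy. This is a simplification chosen for brevity and is not necessarily the variant or implementation used in our experiments; the function name `cpd_rigid`, the synthetic L-shaped outline and all parameter values are assumptions made for the example.

```python
import numpy as np

def cpd_rigid(X, Y, w=0.0, max_iter=300, tol=1e-10):
    """Align the moving set Y (M, D) to the fixed set X (N, D) with a
    similarity transform x ~ s * R @ y + t using rigid CPD updates."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    (N, D), M = X.shape, Y.shape[0]
    s, R, t = 1.0, np.eye(D), np.zeros(D)
    # Initial variance: mean squared distance over all point pairs.
    sigma2 = ((X[None, :, :] - Y[:, None, :]) ** 2).sum() / (D * M * N)
    for _ in range(max_iter):
        TY = s * Y @ R.T + t
        # E-step: posterior P[m, n] that GMM centroid m generated data point n.
        d2 = ((X[None, :, :] - TY[:, None, :]) ** 2).sum(axis=2)
        num = np.exp(-d2 / (2.0 * sigma2))
        c = (2.0 * np.pi * sigma2) ** (D / 2.0) * w / (1.0 - w) * M / N
        P = num / (num.sum(axis=0, keepdims=True) + c + 1e-300)
        # M-step: closed-form update of scale, rotation and translation.
        Np = P.sum()
        mu_x = P.sum(axis=0) @ X / Np
        mu_y = P.sum(axis=1) @ Y / Np
        Xc, Yc = X - mu_x, Y - mu_y
        A = Xc.T @ P.T @ Yc
        U, _, Vt = np.linalg.svd(A)
        C = np.eye(D)
        C[-1, -1] = np.linalg.det(U @ Vt)   # guard against reflections
        R = U @ C @ Vt
        tr_AR = np.trace(A.T @ R)
        s = tr_AR / (P.sum(axis=1) @ (Yc ** 2).sum(axis=1))
        t = mu_x - s * R @ mu_y
        sigma2 = (P.sum(axis=0) @ (Xc ** 2).sum(axis=1) - s * tr_AR) / (Np * D)
        if sigma2 < tol:   # variance has collapsed: the sets are aligned
            break
    return s, R, t

# Synthetic L-shaped outline (moving set) and a transformed copy (fixed set).
corners = np.array([[0, 0], [20, 0], [20, 8], [12, 8], [12, 15], [0, 15], [0, 0]], float)
outline = np.vstack([np.linspace(a, b, 8, endpoint=False)
                     for a, b in zip(corners[:-1], corners[1:])])
theta = np.deg2rad(15.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
t_true = np.array([5.0, -3.0])
target = 1.2 * outline @ R_true.T + t_true
s_est, R_est, t_est = cpd_rigid(target, outline)
print(s_est, t_est)
```

The outlier weight `w` is set to zero here because the synthetic outlines are noise-free; real outlines extracted from two different sensors would normally call for a non-zero value.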
After the 2D outlines have been matched with the CPD algorithm, the two point clouds are accurately registered in 2D space; elevation, however, is not recovered by this process. In order to register the LiDAR point cloud and the ground image point cloud accurately in 3D space, the median building height is calculated for each of the two point clouds. The two point clouds are then matched along the z axis by aligning the median points. Because the point clouds are incomplete, the calculated median points may contain errors, and the resulting mismatch along the z axis may not be negligible. When this happens, we manually move the ground image point cloud along the z axis so that the two point clouds are integrated at a proper height. After that, the LiDAR point cloud and the ground image point cloud are finally integrated, as illustrated in Figure 6.
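The median-based height alignment can be sketched as follows. The function name `align_height_by_median` and the `manual_dz` parameter, which stands in for the manual correction described above, are hypothetical.

```python
import numpy as np

def align_height_by_median(ground_pts, lidar_pts, manual_dz=0.0):
    """Shift ground_pts along z so that its median height matches that of
    lidar_pts; manual_dz is an optional operator-chosen extra offset."""
    shifted = np.asarray(ground_pts, dtype=float).copy()
    lidar_pts = np.asarray(lidar_pts, dtype=float)
    dz = np.median(lidar_pts[:, 2]) - np.median(shifted[:, 2])
    shifted[:, 2] += dz + manual_dz
    return shifted, dz

# Toy example: three points per cloud, columns are x, y, z.
ground = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 3.0], [1.0, 0.0, 5.0]])
lidar = np.array([[0.0, 0.0, 21.0], [1.0, 1.0, 24.0], [2.0, 2.0, 30.0]])
shifted, dz = align_height_by_median(ground, lidar)
print(dz, shifted[:, 2])
```

Because the ground image point cloud mostly samples façades and the LiDAR point cloud mostly samples rooftops, the two medians need not describe the same physical height, which is why a manual offset is still kept as an input.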

EXPERIMENTS AND RESULTS
The test datasets were processed with the methods described in the previous section. Several buildings on the Wuhan University campus are reconstructed, and the integrated results for two of them are presented here. The original airborne LiDAR data, which are provided with global coordinate information, cover about 4 square kilometres of the campus with a point density of 10 pts/m². A total of 582 smartphone images with GPS coordinates and digital compass data are captured in the test area. Through clustering analysis, 78 images are assigned to a student dorm and 93 to an instruction building. Ground image point clouds are generated from the clustered image sets after geometric and radiometric calibration, as shown in Figure 7. A RANSAC-based method is used to acquire the up-right direction vector of the ground image point cloud from the normals of the façade points. Then, 2D building outlines are obtained by projecting the adjusted point cloud onto the horizontal plane. Using the image geo-tags, the individual LiDAR point clouds of the student dorm and the instruction building are extracted by the method described in section 3.2, as illustrated in Figure 8. In the next step, the building LiDAR point cloud is classified into rooftops and façades. Visible 2D building outlines are extracted by projecting the façade point cloud onto the horizontal plane and checking the visibility of the façade points using the camera pointing data and camera FOV information, as shown in Figure 9.
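The RANSAC-based estimation of the up-right direction mentioned above can be sketched as follows. The sketch assumes that façade normals are roughly perpendicular to the up-right direction, so that the cross product of two non-parallel façade normals is a candidate for it. The function name `ransac_up_vector`, the inlier tolerance, the iteration count and the synthetic normals are assumptions for illustration, and the prior removal of noise points is omitted.

```python
import numpy as np

def ransac_up_vector(normals, u0, n_iter=500, inlier_deg=1.0, seed=0):
    """Estimate the up-right direction from facade normals with RANSAC.
    u0 is the initial direction, used only to fix the sign of the result."""
    rng = np.random.default_rng(seed)
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    u0 = u0 / np.linalg.norm(u0)
    sin_tol = np.sin(np.deg2rad(inlier_deg))
    best_u, best_count = u0, -1
    for _ in range(n_iter):
        i, j = rng.choice(len(normals), size=2, replace=False)
        u = np.cross(normals[i], normals[j])
        norm = np.linalg.norm(u)
        if norm < 0.1:      # nearly parallel normals give an unstable direction
            continue
        u /= norm
        if u @ u0 < 0:      # keep the candidate on the same side as u0
            u = -u
        # A normal is an inlier if it is nearly perpendicular to the candidate.
        count = int((np.abs(normals @ u) < sin_tol).sum())
        if count > best_count:
            best_u, best_count = u, count
    return best_u

# Synthetic test: two perpendicular facade directions, small noise, 20 outliers.
rng = np.random.default_rng(1)
up_true = np.array([0.10, -0.05, 1.0])
up_true /= np.linalg.norm(up_true)
a = np.cross(up_true, [1.0, 0.0, 0.0])
a /= np.linalg.norm(a)
b = np.cross(up_true, a)
facade = np.vstack([np.tile(a, (100, 1)), np.tile(b, (100, 1))])
facade += rng.normal(scale=0.003, size=facade.shape)
outliers = rng.normal(size=(20, 3))
u_est = ransac_up_vector(np.vstack([facade, outliers]), u0=np.array([0.0, 0.0, 1.0]))
angle_deg = np.degrees(np.arccos(np.clip(u_est @ up_true, -1.0, 1.0)))
print(angle_deg)
```

Once the up-right vector is known, the point cloud can be rotated so that this vector coincides with the z axis before the outline is obtained by projection onto the horizontal plane.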

CONCLUSIONS
This paper has presented a novel method for complete reconstruction of both the façades and the rooftops of buildings by integrating ground smartphone images and airborne LiDAR data. Building outlines extracted from the LiDAR point cloud and from the ground image point cloud are used to match the two point clouds, leading to a complete, full-resolution building model. Taking full advantage of the geo-tags and digital compass information obtained from smartphone images, the developed approach, which is highly efficient and low-cost, overcomes the sparsity of points on façades in airborne LiDAR data and the lack of rooftops in ground images. Test buildings reconstructed by our method are much more complete and finer in detail than those reconstructed from airborne LiDAR data alone. Most steps in the proposed method can be carried out automatically. Since buildings are reconstructed one by one, building reconstruction for a large area can be achieved by applying the same process to every individual building. In future work, we intend to acquire more images of buildings in the test area and achieve large-scale reconstruction.
Our method achieves the desired results on buildings with complex outlines. It may not be effective, however, for compact buildings with matchbox-shaped outlines, which lack the junction points needed for outline matching. Because no building ground truth is available, we have not yet found an effective way to accurately evaluate the integrated results. Most steps in our method are automated, except for the recovery of building elevation. In future work, we plan to recover elevation automatically by matching line features extracted from the LiDAR data and from the images. Building model reconstruction from the integrated point cloud is a further issue to be addressed in the next step.

Figure 2. Extracting the LiDAR building point cloud corresponding to a ground image point cloud. (a) Buffer zone of possible building LiDAR points around the GPS position of a camera. (b) Top view of the elevation-coloured LiDAR point cloud in the background; the grey translucent region around the target building (yellow region at the centre) shows the overlaid buffer zones of all cameras. (c) Extracted LiDAR building point cloud.
Figure 3. Extracting building outlines from the ground image point cloud (top) and the LiDAR point cloud (bottom).

Extracting Visible Part of 2D Building Outlines. The ground image point cloud covers only the part of the building façades that is visible from the camera positions. If the entire 2D building outline from the LiDAR point cloud is involved in the matching process, the outlines extracted from the two point clouds may be mismatched because of repetitive building structures. To work around this problem, the image pointing information, the normals of the façade points and the Field of View (FOV) are used to extract the visible part of the 2D building outlines from the LiDAR data. First, potentially visible outline points are extracted by searching the area within the FOV of each camera position. Then, the visibility of each façade point is checked using the angle between its normal vector and the image pointing vector, as illustrated in Figure 4.
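The two-step visibility test can be sketched in 2D as follows. The sketch assumes planar coordinates with x pointing east and y pointing north, a compass heading measured clockwise from north, and outward-pointing façade normals. The function name `visible_outline_points` and the incidence threshold `max_incidence_deg` are hypothetical and are not values from our experiments.

```python
import numpy as np

def visible_outline_points(points, normals, cam_xy, heading_deg, fov_deg,
                           max_incidence_deg=80.0):
    """Boolean mask of 2D outline points visible from one camera."""
    points = np.asarray(points, dtype=float)
    normals = np.asarray(normals, dtype=float)
    # Unit pointing vector from the compass heading (clockwise from north).
    h = np.deg2rad(heading_deg)
    d = np.array([np.sin(h), np.cos(h)])
    # Step 1: keep points that fall inside the horizontal field of view.
    to_pt = points - np.asarray(cam_xy, dtype=float)
    dist = np.linalg.norm(to_pt, axis=1)
    cos_offset = (to_pt @ d) / np.maximum(dist, 1e-12)
    in_fov = cos_offset >= np.cos(np.deg2rad(fov_deg / 2.0))
    # Step 2: keep points whose outward normal faces back toward the camera.
    n_unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    facing = (n_unit @ (-d)) >= np.cos(np.deg2rad(max_incidence_deg))
    return in_fov & facing

# Camera at the origin looking north with a 60 degree field of view.
pts = np.array([[0.0, 10.0], [0.0, 12.0], [10.0, 2.0], [2.0, 10.0]])
nrm = np.array([[0.0, -1.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
visible = visible_outline_points(pts, nrm, cam_xy=(0.0, 0.0),
                                 heading_deg=0.0, fov_deg=60.0)
print(visible)
```

Here the first and last points lie on a wall facing the camera, the second lies on a back wall, and the third is outside the field of view. The visible outline for a building would be the union of the masks over all cameras in its image cluster.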

Figure 4. Extraction of the visible part of the building outlines using image pointing information, normals and FOV. Red points show the part visible from the cameras and blue points show the invisible part.

Integration of LiDAR and Ground Image Point Clouds
Once the building outlines of the two point clouds are matched, the LiDAR point cloud and the ground image point cloud can be matched by the same transform. In order to precisely register the two 2D building outlines, which are generated in different coordinate systems, we propose a two-step solution.
Coarse Integration by Matching Camera Positions. Two sets of camera position data have been acquired: A. GPS camera positions from the Exif tags; B. camera positions from the SfM result. Although the former have errors of about 10 metres, they are sufficient for coarse registration of the two point sets. The two sets can be related by a 7-parameter similarity transform $T = (s, R, t)$, with scale $s$, rotation $R$ and translation $t$. An iterative procedure is adopted to minimize the errors between the two point sets:

$$\min_{s, R, t} \sum_i \lVert g_i - (s R c_i + t) \rVert^2$$

where $g_i$ and $c_i$ denote the GPS and SfM positions of camera $i$.
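For illustration, the 7-parameter similarity transform between the SfM and GPS camera positions can also be estimated in closed form. The sketch below uses the least-squares solution of Umeyama (1991) in place of the iterative procedure described above, so it is a substitute technique and not our implementation; the function name `estimate_similarity` and the synthetic camera positions are assumptions.

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares s, R, t such that dst ~ s * R @ src + t, given
    corresponding rows in src and dst (closed-form Umeyama solution)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    sc, dc = src - mu_s, dst - mu_d
    cov = dc.T @ sc / len(src)              # cross-covariance of the two sets
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(src.shape[1])
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[-1, -1] = -1.0                    # avoid returning a reflection
    R = U @ D @ Vt
    var_s = (sc ** 2).sum() / len(src)      # variance of the source set
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t

# Synthetic SfM camera positions (local frame) and their "GPS" counterparts.
sfm_cams = np.array([[0, 0, 0], [1, 0, 0.1], [2, 1, 0], [3, 1, 0.2], [4, 3, 0.1]], float)
ang = np.deg2rad(30.0)
R_sim = np.array([[np.cos(ang), -np.sin(ang), 0.0],
                  [np.sin(ang), np.cos(ang), 0.0],
                  [0.0, 0.0, 1.0]])
t_sim = np.array([100.0, 200.0, 30.0])
gps_cams = 2.5 * sfm_cams @ R_sim.T + t_sim
s_hat, R_hat, t_hat = estimate_similarity(sfm_cams, gps_cams)
print(s_hat, t_hat)
```

With real smartphone GPS readings the residuals would be on the order of the GPS error, so the result is only a coarse initial alignment for the outline-based registration that follows.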

Figure 5. Coarse integration of the LiDAR point cloud (elevation coloured, in the background) and the ground image point cloud (blue dots) by matching camera positions from the SfM result (black dots) and from GPS data (red dots).
Figure 6. Accurate integration of LiDAR and ground image point clouds by matching building outlines. (a) LiDAR point cloud. (b) Ground image point cloud. (c) Top view of the integrated result. (d) Front view of the integrated result.
Figure 7. Clustered building images and ground image point clouds for a student dorm (top, 93 images) and an instruction building (bottom, 78 images). (a) Clustered building images. (b) Ground image point clouds.
Figure 8. Airborne LiDAR data of the test area. (a) Part of the study area. (b) Point clouds of the two buildings. Top: the student dorm; bottom: the instruction building.

To resolve the different scales of the two datasets, a 7-parameter similarity transform is used to approximately convert them into the same coordinate system by matching the camera positions from the geo-tags and from the SfM results. Next, the outlines are accurately registered using the CPD algorithm. Using the transform matrix generated by the CPD algorithm, the LiDAR point cloud and the ground image point cloud can be matched in 2D space. After matching the median height points of the two point clouds and manually moving the ground image point cloud along the z axis to a proper height, the point clouds are finally integrated in 3D space. Integrated results for the test buildings are shown in Figure 10.
Figure 9. Two visible building outlines extracted from the airborne LiDAR (top) and ground image (bottom) point clouds.
Figure 10. Integrated point clouds. (a) The two buildings within the airborne LiDAR point cloud. (b) The student dorm (top) and the instruction building (bottom).