QUALITY ASSESSMENT AND COMPARISON OF SMARTPHONE AND LEICA C10 LASER SCANNER BASED POINT CLOUDS

3D urban models are valuable for urban map generation, environment monitoring, safety planning and educational purposes. For 3D measurement of urban structures, airborne laser scanning sensors or multi-view satellite images are generally used as the data source. However, close-range sensors (such as terrestrial laser scanners) and low-cost cameras (which can generate point clouds based on photogrammetry) can provide denser sampling of 3D surface geometry. Unfortunately, terrestrial laser scanning sensors are expensive, and trained persons are needed to operate them for point cloud acquisition. A low-cost smartphone sensor offers a potentially effective alternative for 3D modelling. Herein, we show examples of using smartphone camera images to generate 3D models of urban structures. We compare a smartphone based 3D model of an example structure with a terrestrial laser scanning point cloud of the same structure. This comparison gives us the opportunity to discuss the differences in terms of geometrical correctness, as well as the advantages, disadvantages and limitations in data acquisition and processing. We also discuss how smartphone based point clouds can help to solve further problems in 3D urban model generation in a practical way. We show that terrestrial laser scanning point clouds which do not have color information can be colored using smartphone data. The experiments, discussions and scientific findings might be insightful for future studies in the field of fast, easy and low-cost 3D urban model generation.


INTRODUCTION
Low-cost sensors are potentially an important source for automatic and instant generation of 3D models, which can be useful for quick 3D urban model updating. However, the quality of the resulting models is questionable, and users may not know how best to collect images for good 3D modelling results.
In this paper, we evaluate the reliability of point clouds generated automatically by multi-view photogrammetry applied to smartphone camera images. Next, we show how to align uncalibrated smartphone based point clouds with laser scanning point clouds for comparison. We also discuss further applications where smartphone based point clouds can be useful in terms of time, budget and manual effort.

RELATED LITERATURE
Modelling 3D urban structures has gained popularity in urban monitoring, safety, planning, entertainment and commercial applications. 3D models are valuable especially for simulations. Most of the time, models are generated from airborne or satellite sensor data and the representations are improved by texture mapping. This mapping is mostly done using optical aerial or satellite images, and the texture is applied onto 3D models of the scene. One of the traditional solutions for local 3D data capturing is the use of a Terrestrial Laser Scanner (TLS). Unfortunately, these devices are often very expensive, require careful handling by experts and complex calibration procedures, and are designed for a restricted depth range only. On the other hand, high sampling rates with millimetre accuracy in depth and location make TLS data a quite reliable source for acquiring measurements. Therefore, herein we use TLS data as reference to evaluate the accuracy of the iPhone point cloud. An overview of TLS and multi-view 3D model generation technologies, and the major differences between them, is given by Baltsavias (1999).
In recent years, there has been a considerable amount of research on 3D modelling of urban structures. Liu et al. (2006) applied structure-from-motion (SfM) to a collection of photographs to infer a sparse set of 3D points, and furthermore they performed 2D to 3D registration by using camera parameters and photogrammetry techniques. Another work by Zhao et al. (2004) introduced stereo vision techniques to infer 3D structure from video sequences, followed by 3D-3D registration with the iterative closest point (ICP) algorithm. Some of the significant studies in this field focused on the alignment work of Huttenlocher and Ullman (1990) and the viewpoint consistency constraint of Lowe (2004). Those traditional methods assume a clean, correct 3D model with known contours that produce edges when projected. 2D shape to image matching is another well-explored topic in the literature. The most popular methods include chamfer matching, Hausdorff matching introduced by Huttenlocher et al. (1993) and shape context matching as introduced by Belongie and Malik (2002). Koch et al. (1998) reconstructed outdoor objects in 3D by using multi-view images without calibrating the camera. Wang (2012) proposed a semi-automatic algorithm to reconstruct 3D building models by using images taken from smartphones with GPS and g-sensor (accelerometer) information. Fritsch et al. (2011) used a similar idea for 3D reconstruction of historical buildings. They used multi-view smartphone images with 3D position and g-sensor information to reconstruct building facades. Bach and Daniel (2011) also used multi-view iPhone images to generate 3D models. They extracted building corners and edges, which are used for registration and depth estimation between images.
After estimating the 3D building model, they chose for each facade the image with the best viewing angle and registered that image on the 3D model to texture it. They also allowed the user to select the accurate image acquisition positions on a satellite map, since iPhone GPS data does not always provide very accurate positioning information. Heidari et al. (2013) proposed an object tracking method using the iPhone 4 camera sensor. These studies show the usability of iPhone images for feature extraction and matching, which is also one of the important steps of 3D depth measurement from multi-view images. On the other hand, there are some disadvantages. Videos generally have stronger compression effects, and they might contain errors because of the rolling shutter. In order to cope with these physical challenges, Klein and Murray (2009) applied a rolling shutter compensation algorithm at the bundle adjustment stage. In order to test the effect of the camera sensor on the point cloud reconstruction accuracy, Thoeni et al. (2014) reconstructed a rock wall using five cameras of different quality and compared the reconstructed point clouds with a TLS scan of the same wall. Besides discussing the performance of the different cameras, they also concluded that having multi-view images orthogonal to the object of interest increases the accuracy of the point cloud generation process.
When point clouds are generated from multi-view images based on photogrammetry, illumination conditions play a crucial role in dense and accurate reconstruction. In order to make the reconstruction approaches independent of illumination, Laffont et al. (2013) proposed a 3D multi-view imaging based point cloud reconstruction method using images separated into reflectance and illumination components. The method was also successful in removing self-shadows of urban structures, which reduced the reconstruction error.
In order to increase the quality of urban structure point clouds, researchers have developed intelligent post-processing techniques to apply to the generated point cloud. Friedman and Stamos (2012) introduced a method to fill the gaps in point clouds which are caused by occluding objects. The proposed approach fills the point cloud gaps successfully if the building facades have regular repetitive patterns. Turner and Zakhor (2012) proposed a sharp hole filling method under the assumption that buildings are composed of axis-aligned 3D rectilinear structures. They separated the point clouds into planar surfaces and fitted a plane to each point cloud group. Then, the missing points are obtained by interpolation and sharpened by resampling the new points with respect to the planar surfaces.
We see that smartphone camera based point clouds of the urban structures need to be validated by comparing their geometrical properties with airborne and terrestrial laser scanning point clouds. Therefore, our paper focuses on this comparison, as well as the other extra benefits that smartphone camera based point clouds can provide.

POINT CLOUD ACQUISITION
For smartphone point cloud generation and laser point cloud acquisition, we have selected a windmill as an example urban structure. A windmill poses specific challenges compared to many regular buildings. The windmill structure does not have flat walls, and the top part of the mill rotates while it is in use. The example windmill is a historical structure in Delft, the Netherlands. Generating its 3D model is important for documentation and also for monitoring 3D changes such as possible deformations and damage to this old structure.
The Leica C10 Scan Station scanner is a time-of-flight scanner with an effective operating range of approximately 1-200 m (up to 300 m with 90% reflectivity). Its motorized head allows scanning of a complete 360 degree by 270 degree area. Data is acquired at a rate of 50,000 points/second and can be stored on-board or on a wireless or wired laptop. The C10 has a number of features which make it particularly effective. It has an on-board camera which can provide images to be automatically aligned with the scans to texture the point clouds. The published datasheet (2011) specifications indicate that the accuracy of a single measurement is 6 mm in position and 4 mm in depth (at ranges up to 50 m). The windmill was scanned with the Leica C10 laser scanner in order to assess the accuracy of the smartphone based point cloud. The Leica C10 laser scanner and the point cloud generated from a single scan station are presented in Figure 1.
Although the Leica C10 laser scanner provides very high density and very high quality point clouds, the practical difficulties of transporting the scanner to different stations make it challenging to use in everyday life, especially when the point clouds need to be generated urgently. Besides, the device must be used by a trained person who has experience with target installation, data acquisition and post-registration steps. A smartphone based point cloud generation process provides practical advantages that overcome many of those difficulties. There are of course newer laser scanning devices, such as hand-held laser scanners, which might be physically more practical to use. However, as reported by Sirmacek et al. (2016), point clouds acquired by these scanners need to be sent to a paid online service for post-processing before they are ready to be used.
Another issue with the Leica C10 laser scanner point clouds is that the acquired point cloud does not contain color information for the points. Some laser scanners do not have an option for registering color on the point clouds. For the Leica C10 laser scanner this is not the case; the scanner is capable of registering color information to points. However, choosing this coloring option increases the point cloud acquisition time immensely. When the exteriors of urban structures need to be scanned with a laser scanner, the scan station must usually be set up in a public place, which is most of the time a busy area that cannot be occupied for more than a few minutes. This makes it challenging to generate point clouds with registered color information. The easiest and fastest way is to generate point clouds without color information, as was the case for the windmill point cloud presented in Figure 1.
After cropping the windmill from the original point cloud, the laser scanning point cloud (which can be seen in Figure 2 together with the iPhone point cloud) consists of 160,501 points, where each point has only (x, y, z) attributes.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLI-B5, 2016 XXIII ISPRS Congress, 12-19 July 2016, Prague, Czech Republic
The iPhone point cloud is generated using multi-view images taken by an iPhone 3GS smartphone sensor. The algorithm starts by extracting local features of each input image, as introduced by Lowe (2004). Local features of overlapping input images are matched by taking the smallest Euclidean distances between their descriptor vectors. After the local feature matching process, the relative rotation, translation and position matrices are calculated, and these matrices are used as input for the structure from motion (SfM) algorithm presented by Hartley (1993), in order to estimate the internal and external camera parameters. These are used for initializing the bundle adjustment algorithm, which calculates the complete 3D point cloud. Figure 2 shows some of the input images and the resulting point cloud. In total, 50 iPhone images are used for point cloud generation. The resulting point cloud consists of 161,524 points, where each point is represented by (x, y, z, r, g, b) attributes.
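The descriptor matching step described above can be illustrated with a minimal sketch. It assumes the local features are already available as rows of fixed-length descriptor vectors (random 128-dimensional vectors stand in for real descriptors here), and the function name `match_descriptors` is hypothetical. It shows only the smallest-Euclidean-distance rule, not the full pipeline used in the paper.

```python
import numpy as np

def match_descriptors(desc_a, desc_b):
    """For every descriptor in desc_a, find the closest descriptor in desc_b."""
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    nearest = np.argmin(dist, axis=1)
    return nearest, dist[np.arange(len(desc_a)), nearest]

# Toy data: image A sees slightly perturbed copies of features 3, 0 and 4 of image B.
rng = np.random.default_rng(0)
desc_b = rng.random((5, 128))
desc_a = desc_b[[3, 0, 4]] + 0.001 * rng.random((3, 128))

matches, distances = match_descriptors(desc_a, desc_b)
print(matches)
```

In practice a ratio test or a cross-check between the two images is often added to reject ambiguous matches; that refinement is not described in the text and is left out here.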

ASSESSMENT OF THE SMARTPHONE AND LASER SCANNING POINT CLOUDS
In order to compare the smartphone based point cloud with the terrestrial laser scanning point cloud, the point clouds must be aligned. The major problem with smartphone based point clouds is that they are not calibrated. Their scale, rotation and location in 3D space do not reflect real-world values. On the other hand, our terrestrial laser scanning point cloud represents the 3D distance relationships between the sampled points of the scanned object with very high accuracy.
The alignment of the smartphone point cloud to the laser scanning point cloud is performed in two steps: (1) a coarse re-scaling and alignment is performed by applying a transformation function to the smartphone point cloud, based on manual tie point selection between the smartphone and laser scanning point clouds; (2) fine alignment of the smartphone point cloud to the laser scanning point cloud is performed using the Iterative Closest Point (ICP) approach.
One challenge with the example structure is that, since it is a functioning windmill, its top mill rotates. Therefore, the time difference between the smartphone image acquisition and the laser scanning causes the moving parts to appear in different positions. One part which is reliably stable in both data sets is the balcony of the windmill structure. Therefore, for the smartphone point cloud accuracy assessment, we compare the geometry of the point clouds only for the balcony area. Figure 3 shows the balcony of the smartphone point cloud and its fine alignment to the laser scanning point cloud.
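The two alignment steps can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the paper: the coarse step is assumed to be a least-squares similarity transform (scale, rotation, translation) estimated from the tie points in closed form, and the fine step is a basic point-to-point ICP with brute-force nearest-neighbour search. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_transform(src, dst, with_scale=True):
    """Least-squares s, R, t such that dst is approximately s * R @ src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(xd.T @ xs / len(src))
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # avoid returning a reflection
        D[2, 2] = -1.0
    R = U @ D @ Vt
    s = (S * np.diag(D)).sum() / (xs ** 2).sum(axis=1).mean() if with_scale else 1.0
    t = mu_d - s * R @ mu_s
    return s, R, t

def apply_transform(points, s, R, t):
    return s * points @ R.T + t

def rms_nearest(points, reference):
    """RMS distance from each point to its nearest reference point."""
    d2 = ((points[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2.min(axis=1).mean())

def icp(src, dst, iterations=20):
    """Point-to-point ICP: match nearest neighbours, refit a rigid transform, repeat."""
    cur = src.copy()
    for _ in range(iterations):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matched = dst[d2.argmin(axis=1)]
        _, R, t = fit_transform(cur, matched, with_scale=False)
        cur = apply_transform(cur, 1.0, R, t)
    return cur

# Synthetic stand-ins: "laser" is the reference, "phone" an unscaled, rotated copy.
rng = np.random.default_rng(1)
laser = rng.random((200, 3)) * 10.0
c, sn = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -sn, 0.0], [sn, c, 0.0], [0.0, 0.0, 1.0]])
s_true, t_true = 2.5, np.array([1.0, -2.0, 0.5])
phone = (laser - t_true) @ R_true / s_true

tie = [0, 1, 2, 3, 4]                        # manually selected tie points (assumed)
s, R, t = fit_transform(phone[tie], laser[tie])
coarse = apply_transform(phone, s, R, t) + 0.05  # small offset mimics tie point error
fine = icp(coarse, laser)
print(rms_nearest(coarse, laser), rms_nearest(fine, laser))
```

The brute-force distance matrix is only practical for small clouds; for clouds of the size reported here, a spatial index such as a k-d tree would normally replace it.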
In Figure 4, we show some screenshots of the 3D geometry comparison of the smartphone and laser scanner point clouds. The top row shows the comparison of the 3D distance between the same pair of reference points. Taking the laser scanning measurements as reference, a distance of 4.489 meters in the laser scanning point cloud is measured as 4.317 meters in the smartphone based point cloud. This corresponds to a 3.8% error in the smartphone measurement. The bottom row of Figure 4 shows that the balcony edge angle, which is 146.833 degrees in the laser scanning data, is measured as 149.345 degrees in the smartphone point cloud. This corresponds to a 1.7% error in the smartphone angle measurement. These results indicate that the 3D geometrical error of the smartphone based point cloud is low.
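The two percentages follow from the relative error with respect to the laser scanning value. A short check of the arithmetic, using the numbers quoted above (the helper name is hypothetical):

```python
def relative_error_percent(reference, measured):
    """Absolute deviation from the reference, as a percentage of the reference."""
    return abs(measured - reference) / abs(reference) * 100.0

distance_error = relative_error_percent(4.489, 4.317)   # meters
angle_error = relative_error_percent(146.833, 149.345)  # degrees

print(f"distance error: {distance_error:.1f} %")  # 3.8 %
print(f"angle error:    {angle_error:.1f} %")     # 1.7 %
```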
After the two alignment steps are performed, the error of the smartphone point cloud over the intersection area of the two point clouds is shown in Figure 5 (based on a comparison of 19,163 points in the intersection area). The mean error of the smartphone based point cloud is calculated as 0.16 meters. This is mainly because of the thin, line-like structural details of the balcony, which could not be modelled with the smartphone.
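The text does not state how the per-point error is defined. One common choice, assumed here purely for illustration, is the distance from each smartphone point to its nearest laser scanning point, averaged over the intersection area. The function name and the toy coordinates below are hypothetical.

```python
import numpy as np

def nearest_distances(points, reference, chunk=1024):
    """Distance from each point to its nearest reference point (brute force, chunked)."""
    out = np.empty(len(points))
    for start in range(0, len(points), chunk):
        block = points[start:start + chunk]
        d2 = ((block[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
        out[start:start + chunk] = np.sqrt(d2.min(axis=1))
    return out

# Toy example: three reference points and three slightly displaced measurements.
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
points = np.array([[0.0, 0.0, 0.1], [1.0, 0.2, 0.0], [0.0, 1.0, 0.3]])

errors = nearest_distances(points, reference)
mean_error = errors.mean()
print(mean_error)  # 0.2 up to floating point rounding (errors are 0.1, 0.2, 0.3)
```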

USING SMARTPHONE DATA FOR COLORING LASER SCANNING POINT CLOUD
As we mentioned earlier, registering color to laser scanning point clouds might require a very long data acquisition time, which is undesirable in most cases. For this reason, point clouds are most often acquired without color information. However, coloring point clouds is very important for realistic and more informative 3D rendering. Here, smartphone point clouds can be very helpful for assigning color information to laser scanning point clouds. To do so, we propose a new technique for coloring laser scanning point clouds based on smartphone data.
The point cloud coloring approach works as follows. For each point in the laser scanning point cloud, the three closest points of the smartphone point cloud are determined. The means of the red, green and blue attributes of these three smartphone points are computed. Finally, they are assigned as attribute values to the laser scanning point. Figure 6 shows the laser scanning point cloud after it is colored with smartphone data.
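The coloring rule can be sketched in a few lines. The sketch assumes both clouds are already aligned and stored as NumPy arrays; the function name and the toy data are hypothetical, and the neighbour search is brute force for clarity.

```python
import numpy as np

def color_from_neighbours(laser_xyz, phone_xyz, phone_rgb, k=3):
    """Assign each laser point the mean color of its k nearest smartphone points."""
    colors = np.empty((len(laser_xyz), 3))
    for i, p in enumerate(laser_xyz):
        d2 = ((phone_xyz - p) ** 2).sum(axis=1)
        nearest = np.argpartition(d2, k - 1)[:k]  # indices of the k smallest distances
        colors[i] = phone_rgb[nearest].mean(axis=0)
    return colors

# Toy smartphone cloud: three points near the origin and one far away.
phone_xyz = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [5.0, 5.0, 5.0]])
phone_rgb = np.array([[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [0.0, 0.0, 255.0], [9.0, 9.0, 9.0]])
laser_xyz = np.array([[0.0, 0.0, 0.0]])

colored = color_from_neighbours(laser_xyz, phone_xyz, phone_rgb)
print(colored)  # [[85. 85. 85.]]
```

For the point counts reported in this paper, a k-d tree query would be the more practical way to find the three nearest smartphone points.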

CONCLUSION
3D reconstruction of urban structures is crucial for many fields, such as 3D urban map generation, observing changes in three dimensions, developing 3D entertainment, engineering applications, disaster preparation simulations, and storing detailed information about cultural heritage. Point clouds are developing towards a standard product for 3D model reconstruction in the urban management field. Still, outdoor point cloud acquisition with active sensors is a relatively expensive and involved process. Generation of point clouds using smartphone sensors could be a rapid, cheap and less involved alternative for local point cloud generation that could be applied for 3D archive updating. Besides the accuracy assessment of this practical point cloud generation approach, we present a method to add color information to existing point clouds using smartphone sensors.
Figure 6: Leica C10 laser scanner point cloud colored using smartphone data.