EXTRACTION OF HOUSES FROM POINT CLOUD LIDAR: PROBLEMS AND CHALLENGE

ABSTRACT: Although many efforts have been made to extract houses from LiDAR (Light Detection and Ranging) data, aerial imagery, or their fusion, little investigation has used the co-registration between the orthoimage map and LiDAR data, on the basis of geodetic coordinates, as an element of house extraction. For this reason, this paper first overviews the advances of LiDAR and examines the advantages and disadvantages of LiDAR systems versus traditional photogrammetry, and then indicates that LiDAR technology has not yet resolved all the problems left by traditional photogrammetry, such as the lack of texture information and the limited density of the LiDAR point cloud. A comprehensive comparison of house extraction (feature information) from LiDAR data and from aerial imagery is also presented. It is widely accepted that fully automatic extraction of houses (feature information in city areas) from LiDAR point clouds remains difficult. Therefore, this paper proposes a human-computer interactive operation for house extraction that combines the LiDAR point cloud with orthorectified high-resolution aerial imagery. Real data are used to validate the proposed method.


INTRODUCTION
Extraction of houses from airborne aerial imagery or spaceborne high-resolution satellite imagery to create a digital building model (DBM) has been investigated for more than 100 years. I recall that when I was a first-year undergraduate student in the 1980s, university lecturers told us that the AUTOMATIC extraction of houses/buildings from high-resolution aerial images was a very hot research topic worldwide. This is because of an increasing need to continuously update urban three-dimensional (3D) DBMs in rapidly changing city areas, especially in the current digital/smart city construction in China, and because of a variety of applications such as microclimate investigation of city streets, transmitter placement for telecommunication in high-density city areas, noise simulation for industrial city areas using 3D DBMs, heat and exhaust spreading with different materials in big cities, traffic monitoring using real GPS-based car track data, and security surveillance at night. However, traditional photogrammetric methods have encountered many challenges in generating DBMs, digital surface models (DSMs), or digital terrain models (DTMs) in complicated city scenes, such as those with dense or very high buildings. The degradation in the performance of automatic photogrammetric processes is mainly due to failures of image matching, which are primarily caused by, for example, occlusions, depth discontinuities, shadows, poor or repeated textures, poor image quality, foreshortening and motion artifacts, and the lack of models of man-made objects. Therefore, human-guided interactive operations, such as on-screen stereo compilation in traditional photogrammetry, are still applicable.
However, LiDAR (Light Detection And Ranging), which emerged at the end of the 1980s and the beginning of the 1990s, has been considered a revolution relative to photogrammetric stereo matching, since LiDAR directly obtains 3D point cloud data without stereo matching. LiDAR technology thereby became very practical for 3D DBM generation in city areas with highly dense buildings. Consequently, a variety of methods and algorithms have been proposed in the photogrammetric community for the extraction of houses over the last decades. For example, early investigations in the 1990s include, but are not limited to, Axelsson (1999), Baltsavias et al. (1999), Hug (1997), Haala et al. (1998), Lindenberger (1993), and Wehr and Lohr (1999); investigations from the beginning of the 2000s onward include Morgan and Tempfli (2000), Morgan and Habib (2002), Vosselman (2000), Wang et al. (2009), Axelsson (2000), Sithole and Vosselman (2004), and Yoon et al. (2002). These methods and algorithms can be categorized into two groups (Zhou et al. 2004): the classification approach and the adjustment approach. The classification approach detects the ground points using various operators, typically mathematical morphology (e.g., Morgan and Tempfli 2000), terrain slope (Axelsson 1999), or local elevation difference. The adjustment approach essentially uses a mathematical function to approximate the ground surface. The function is determined with an iterative least-squares process, in which the outlying non-ground points are eliminated; typical examples include Pu and Vosselman (2009), Rutzinger et al. (2009), Zhang et al. (2006), and Sampath and Shan (2007). Further discussion can be found in Zhou et al. (2014).
In addition, the combination of LiDAR data and high-resolution aerial images has also been studied for the extraction of houses in the past decades, due to the complementary properties of the two data sources. For instance, Zhou et al. (2004; 2014), Hermosilla et al. (2011), Kabolizade et al. (2010), Hu et al. (2007), Gamba et al. (2002) and Yu et al. (2009) presented methods for the extraction of urban houses and road networks through the integration of LiDAR and aerial images. Schenk and Csatho (2002) and Habib et al. (2005) proposed feature-based data fusion of LiDAR point cloud data and high-resolution aerial imagery. Sohn and Dowman (2007) focused on the exploitation of IKONOS multi-spectral imagery in combination with a LiDAR DEM for house extraction. Rottensteiner et al. (2007) proposed a method to detect building roofs and determine the roof boundaries. Fujii and Arikawa (2002) proposed integrating LiDAR, aerial images and ground images for modeling urban buildings. O'Donohue et al. (2008) combined thermal and LiDAR imagery for the extraction of urban man-made objects. Zabuawala et al. (2009), Wang and Neumann (2009), Mastin et al. (2009), and Dong et al. (2008) suggested automatic registration of LiDAR and optical imagery for the extraction of urban houses and roads.
Because the original aerial images have no geodetic coordinates and contain various types of geometric distortions, the LiDAR point cloud data are difficult to co-register with them. Thus, this paper proposes that the original aerial images first be orthorectified to a given geodetic coordinate system and then co-registered with the LiDAR point cloud data. After that, houses are extracted from the combined data set.

Traditional Photogrammetric Technologic Challenges
As mentioned above, traditional photogrammetry is an important tool for the generation of digital building models (DBM), digital surface models (DSM), and digital terrain models (DTM). Moreover, the performance of these systems is very good, efficient and cost effective in smooth terrain with small- and medium-scale imagery (Baillard, 1999; Simonetto et al., 2005). However, it decreases rapidly in complicated urban areas with dense buildings. This is because the traditional photogrammetric method relies on stereo matching, which is exposed to the following challenges (see Figure 1):
• Occlusions
• Depth discontinuities
• Shadows
• Poor or repeated textures
• Poor image quality
• Foreshortening and motion artifacts
• Lack of models of man-made objects
These challenges cannot be resolved by traditional stereo matching unless human-computer interaction is involved. Thus, new and even revolutionary technologies have to be sought.

LiDAR Technologic Challenges
LiDAR technology, which emerged at the end of the 1980s, has mainly been applied to 3D point cloud data collection. From the viewpoint of photogrammetry, LiDAR systems did resolve a few inherent problems that photogrammetry had encountered but could not effectively resolve for decades. The two typical terrestrial scanning LiDAR systems use the "Z" and "Cone" scanning styles (see Figure 2). Traditional scanning LiDAR systems are not suitable for rapid, real-time 3D terrain generation, i.e., DTM generation, due to their heavy weight, large volume, and the delay caused by post-processing of the 3D point cloud data (Zhou et al. 2015; 2014). For this reason, the 3D flash laser sensor (also called the "array LiDAR system") has been developed in recent years. The array LiDAR sensor is analogous to a camera with a flashbulb (flood illumination) (see Fig. 3), but with the flash provided by laser illumination and with an array detector, such as an APD array detector, equipped with a clock to measure the time between emitting and receiving the pulse. Flash LiDAR systems are being widely applied in fields such as terrain mapping, autonomous safe landing, drogue tracking, obstacle avoidance, range navigation, ocean wave reconstruction, and environmental monitoring (Zhou et al. 2015).
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W10, 2020 International Conference on Geomatics in the Big Data Era (ICGBD), 15-17 November 2019, Guilin, Guangxi, China

Comparison Analysis between LiDAR point cloud and aerial image
Although LiDAR systems have successfully resolved a few inherent problems that photogrammetry had encountered, they cannot effectively provide rich texture information; that is, new problems have emerged with LiDAR data. Firstly, the LiDAR point cloud cannot reach the same density as aerial photogrammetry (e.g., 25 cm ground sample distance, GSD) (see Figure 4); secondly, LiDAR cannot provide texture information as rich as that of aerial photogrammetry (see Figure 5); and lastly, the LiDAR system sometimes receives multiple echoes in forested areas (e.g., trees), which makes post-processing in near real time difficult (Baltsavias, 1999). In addition to the differences between the two data sets themselves, their post-processing also differs. For example, DSM, DTM and DBM generation from both aerial imagery and LiDAR point clouds needs human-computer interaction, which means that fully automatic production of DBMs is rather difficult (see Table 1). Likewise, the extraction of feature information from both aerial imagery and LiDAR point clouds needs human-computer interaction, which means that fully automatic extraction of feature information in city areas is rather difficult (see Table 2).

True Orthoimage Generation
In order to correct the relief displacement caused by the height of buildings, so that they appear in their correct, upright, true positions, using the differential orthorectification method, the buildings must be modelled in 3D. In addition, the relief displacement caused by terrain must also be orthorectified, which implies that the terrestrial surface must be represented by a digital terrain model (DTM) that does not contain buildings and vegetation. Therefore, the basic steps of orthoimagery generation in city areas include both DTM-based and DBM-based orthoimagery generation, their merging, and other tasks such as occlusion detection and compensation, and shadow detection and removal (Zhou et al. 2004; 2014).

True Orthoimagery Generation
• DBM-based orthoimagery generation: The generation of DBM-based orthoimagery uses DBM data to orthorectify only the displacement caused by buildings. The detailed method can be found in Zhou et al. (2004). Because of building occlusion in city areas, the areas occluded by objects need to be filled using other orthoimagery; this orthorectified imagery is therefore called slave orthoimagery.
• DTM-based orthoimagery generation: The generation of DTM-based orthoimagery orthorectifies the relief displacement caused only by terrain, without considering the displacement caused by buildings. The detailed method can be found in Zhou et al. (2004).
• Occlusion detection and compensation: In high-density urban areas, tall buildings usually occlude low buildings and terrain. To solve this problem, Zhou et al. (2004) suggested a method using cross-strip and along-strip imagery with over 75% overlap. Occlusion compensation requires finding conjugate areas in adjacent slave orthoimages and then filling the occluded area with orthoimagery patches. The detailed method can be found in Zhou et al. (2004; 2017).

Co-registration between Orthoimagery and LiDAR Point Cloud
The orthoimage is an orthorectified image in a given geodetic coordinate system, whereas the LiDAR point cloud is referenced to the WGS84 coordinate system. This means that the datums of the two data sets are different. For this reason, WGS84 is selected as the common datum and the orthoimages are transformed to it. The two data sets are linked through their XY coordinates. The co-registration is validated by visual check. The details of this method can be found in Zhou et al. (2004).
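As a minimal sketch of linking the two data sets through XY coordinates, the following Python fragment maps the planimetric coordinates of a LiDAR point to the row and column of a north-up orthoimage. The function names, the upper-left origin convention, and the numerical values are illustrative assumptions, not parameters taken from this paper.

```python
def xy_to_pixel(x, y, x_origin, y_origin, gsd):
    """Map geodetic (x, y) to (row, col) of a north-up orthoimage.

    x_origin, y_origin: coordinates of the upper-left corner of the
    upper-left pixel (hypothetical convention for this sketch).
    gsd: ground sample distance, in the same units as x and y.
    """
    col = int((x - x_origin) // gsd)
    row = int((y_origin - y) // gsd)
    return row, col


def attach_pixels(points, x_origin, y_origin, gsd, n_rows, n_cols):
    """Pair each LiDAR point (x, y, z) with its orthoimage pixel.

    Points that fall outside the image extent are skipped.
    """
    linked = []
    for x, y, z in points:
        row, col = xy_to_pixel(x, y, x_origin, y_origin, gsd)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            linked.append(((x, y, z), (row, col)))
    return linked


# Hypothetical example: a 10 x 10 image with 0.5 m GSD.
lidar_points = [(101.0, 199.0, 5.0), (99.0, 199.0, 1.0)]
linked_points = attach_pixels(lidar_points, 100.0, 200.0, 0.5, 10, 10)
```

In this example the second point lies west of the image extent and is dropped, so only the first point is paired with a pixel.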

Extraction of Initial Edges of Houses from LiDAR Point Cloud
This paper applies a segmentation method to the LiDAR point cloud. The detailed steps are as follows. Firstly, separating the ground from objects above the earth surface using the watershed algorithm: the watershed algorithm is a type of mathematical morphology algorithm, so named because it resembles a simulated water immersion process. The method visualizes the LiDAR point cloud data as gray values in a raster image format, where the brightness of each pixel corresponds to the height. The gray values are enhanced and filtered to enlarge the difference between the ground and the objects on the earth surface.
Secondly, distinguishing trees from houses: a fourth-order polynomial fitting method is used to distinguish trees from houses. First, a dynamic-size circular filter is used to detect the crown apex, and then a fourth-order polynomial is fitted to each tree on the basis of the tree shape structure.
Thirdly, extracting vegetation using the normalized difference (ND) of elevation. The two echoes, which correspond to two elevations, are normalized to the range [-1, 1]. The ND values can be calculated by

$$ ND = \frac{F_{DSM} - L_{DSM}}{F_{DSM} + L_{DSM}} $$

where $F_{DSM}$ is the elevation from the first echo, and $L_{DSM}$ is the elevation from the last echo. DSM, standing for Digital Surface Model, means that the elevations at the first and last echoes are taken from the DSM.
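The rasterization that precedes the watershed step can be sketched as follows: LiDAR points are gridded, the highest elevation per cell is kept, and heights are scaled linearly to gray values. The cell size, the extent, the keep-the-highest rule and the function name are assumptions made for illustration; the enhancement, filtering and watershed segmentation themselves are not implemented here.

```python
def rasterize_to_gray(points, x_min, y_max, cell, n_rows, n_cols):
    """Grid LiDAR points (x, y, z) and scale heights to 0..255 gray values.

    Each cell keeps the highest elevation that falls in it (an assumed
    rule for this sketch); cells without points stay None.
    """
    grid = [[None] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        col = int((x - x_min) // cell)
        row = int((y_max - y) // cell)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            if grid[row][col] is None or z > grid[row][col]:
                grid[row][col] = z

    heights = [z for line in grid for z in line if z is not None]
    if not heights:
        return grid  # nothing to scale
    z_lo, z_hi = min(heights), max(heights)
    span = (z_hi - z_lo) or 1.0  # avoid division by zero on flat data

    # Brightness is proportional to height: lowest cell 0, highest 255.
    return [
        [None if z is None else round(255 * (z - z_lo) / span) for z in line]
        for line in grid
    ]


# Hypothetical 2 x 2 grid with 1 m cells.
sample = [(0.5, 1.5, 10.0), (1.5, 1.5, 20.0), (0.5, 0.5, 15.0), (0.5, 1.5, 12.0)]
gray = rasterize_to_gray(sample, 0.0, 2.0, 1.0, 2, 2)
```

A raster of this kind could then be passed to a morphological watershed implementation to separate ground from raised objects.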
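As a hedged illustration of the ND computation, the sketch below assumes the usual normalized-difference form, (first - last) / (first + last), applied to the first-echo and last-echo elevations. The function name and the handling of a zero denominator are assumptions of this sketch.

```python
def normalized_difference(first_dsm, last_dsm):
    """Normalized difference of first- and last-echo elevations.

    Assumes the standard form (F - L) / (F + L); for non-negative
    elevations the result lies in [-1, 1].
    """
    total = first_dsm + last_dsm
    if total == 0:
        return 0.0  # assumed convention when both elevations are zero
    return (first_dsm - last_dsm) / total


# A cell where the first echo is well above the last echo (e.g., canopy
# over ground) yields a clearly positive ND; a solid surface yields 0.
nd_canopy = normalized_difference(30.0, 10.0)
nd_solid = normalized_difference(10.0, 10.0)
```

Cells whose two echoes coincide produce an ND of zero, whereas penetrable objects such as tree crowns produce larger values.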
Fourthly, discriminating houses from vegetation. Since the edges of both buildings and vegetation may return two echoes, producing a height difference within the same object, the slope information and intensity information are combined to distinguish vegetation from building edges. The rule is a binary indicator

$$ t = \begin{cases} 0, & \text{the conditions on Slope (against } T_1\text{) and Intensity (against } T_2\text{) both hold} \\ 1, & \text{others} \end{cases} $$

where Slope represents the terrain slope at the point of the DSM, Intensity is the gray value, and $T_1$ and $T_2$ are the thresholds corresponding to Slope and Intensity, respectively.

High-accuracy House Extraction for DBM Creation
With the initial edges of the houses, the four corner coordinates of a house can be determined. After the straight-line equation of each edge is determined, whether each LiDAR point lies inside or outside the house boundary can be determined as well.
The above procedure is then repeated for each building until all houses are finished.
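The inside/outside decision can be sketched as follows, assuming a convex house outline whose four corners are listed in order around the boundary. Each edge is written as a straight line a*x + b*y + c = 0, and a point is taken to be inside when it lies on the same side of all four lines. The function names and the sample coordinates are hypothetical.

```python
def edge_line(p, q):
    """Coefficients (a, b, c) of the line a*x + b*y + c = 0 through p and q."""
    (x1, y1), (x2, y2) = p, q
    a = y1 - y2
    b = x2 - x1
    c = x1 * y2 - x2 * y1
    return a, b, c


def inside_house(point, corners):
    """True if point is inside or on a convex outline given by corners.

    corners must be ordered around the boundary (either direction);
    this sketch does not handle concave outlines.
    """
    x, y = point
    n = len(corners)
    sides = []
    for i in range(n):
        a, b, c = edge_line(corners[i], corners[(i + 1) % n])
        sides.append(a * x + b * y + c)
    return all(s >= 0 for s in sides) or all(s <= 0 for s in sides)


# Hypothetical 10 m x 6 m house and three LiDAR points (x, y, z).
house = [(0.0, 0.0), (10.0, 0.0), (10.0, 6.0), (0.0, 6.0)]
cloud = [(5.0, 3.0, 12.1), (12.0, 3.0, 2.0), (1.0, 5.5, 12.4)]
roof_points = [p for p in cloud if inside_house(p[:2], house)]
```

Only the points retained in `roof_points` would then be passed on to the roof-surface fitting.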
After each roof is extracted, the LiDAR footprints within the building roof are obtained, together with the boundary information. Zhou et al. (2004) suggested an innovative method for high-accuracy house extraction by fitting a planar equation to the LiDAR footprints within the roof boundary. In the planar equation, Eq. (4), A, B and C are unknown parameters, and X, Y and Z are the coordinates of the LiDAR data. As observed from Eq. (4), only three LiDAR points are needed to determine the unknown parameters A, B and C, which determine the surface of the building. The details can be found in Zhou et al. (2004).
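A minimal sketch of such a plane fit is given below. It assumes the planar form Z = A*X + B*Y + C, which is one common three-parameter choice and is not confirmed by the text above, and it solves the least-squares normal equations by Cramer's rule. With exactly three non-collinear points the plane passes through them; with more points the fit is a least-squares estimate. All names are hypothetical.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_roof_plane(points):
    """Least-squares A, B, C for the assumed form Z = A*X + B*Y + C."""
    n = len(points)
    if n < 3:
        raise ValueError("at least three LiDAR points are required")
    # Centre X and Y so that large geodetic coordinates stay well conditioned.
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sxy = syy = sx = sy = sxz = syz = sz = 0.0
    for x, y, z in points:
        u, v = x - mx, y - my
        sxx += u * u; sxy += u * v; syy += v * v
        sx += u; sy += v
        sxz += u * z; syz += v * z; sz += z
    normal = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [sxz, syz, sz]
    d = det3(normal)
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; the plane is undetermined")
    solution = []
    for k in range(3):  # Cramer's rule: replace column k by the right-hand side
        mk = [row[:] for row in normal]
        for i in range(3):
            mk[i][k] = rhs[i]
        solution.append(det3(mk) / d)
    a, b, c_centred = solution
    return a, b, c_centred - a * mx - b * my  # undo the centring for C


# Hypothetical roof points lying on Z = 0.5*X - 0.25*Y + 100.
roof = [(0.0, 0.0, 100.0), (10.0, 0.0, 105.0), (0.0, 8.0, 98.0), (10.0, 8.0, 103.0)]
A, B, C = fit_roof_plane(roof)
```

Using all footprints inside the roof boundary, rather than only three, averages out the ranging noise of individual points.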

Data Set
The details of the experimental field can be found in Zhou et al. (2004). In summary, the Virginia Department of Transportation (VDOT), contracting with Woolpert L.L.C. in Richmond, Virginia, has established a high-accuracy test field in Wytheville, Virginia, for the accuracy evaluation of LiDAR systems. The field is approximately 11.4 miles (west-east) × 4.5 miles (north-south) (Figure 6). The accuracy of the 19 ground control points (GCPs) averages 0.02 m, 0.02 m, and 0.01 m in X, Y and Z, respectively.

Extraction of Houses
Step 1: Co-registration between the orthoimagery and the LiDAR point cloud data. Figure 7 shows the result of co-registering the orthoimagery and the LiDAR point data using the XY geodetic coordinates of 6 conjugate points.
Step 2: Extraction of the initial edges of houses from the LiDAR point cloud. The coarse boundaries of houses can be extracted from the LiDAR point cloud using the algorithm described in Section 3.2.2. The result is depicted in Figure 8. For comparison, a human-computer interaction is also used for house extraction, and the result is shown in Figure 9. That is, Figure 8 is the result of automatic detection of building edges, and Figure 9 shows the buildings detected after the human-computer interactive operation.
Step 3: Creation of accurate 3D models of houses. With the accurate boundary of a house, its 3D model can be created using the method proposed in Section 3.2.3. The major difference between this step and Step 2 is the boundary information, which is used to fit Eq. (4) for the solution of the three unknown parameters. The boundary of the 3D model is then trimmed to create an accurate DBM. By repeating the above steps, all houses are modelled (Figure 10).

CONCLUSION
This paper first overviews the advantages and disadvantages of LiDAR and photogrammetric technologies in the creation of DSMs, DBMs and DTMs. It is widely accepted that a human-computer interactive operation is necessary for house extraction, whether from LiDAR point cloud data or from high-resolution aerial imagery. The main contribution of this paper is the proposed combination of orthorectified aerial imagery (high-resolution orthoimagery) and the LiDAR point cloud for DBM generation. In this algorithm, the roof types, the LiDAR footprints on the roof surfaces, and related elements are described, including the boundaries of the roof surfaces and their planar equations.
The experimental field, located in Wytheville, Virginia, USA, is used to evaluate the proposed method. The experimental results demonstrate that the proposed method is capable of effectively extracting houses.