AIRBORNE LIDAR POINT CLOUD CLASSIFICATION FUSION WITH DIM POINT CLOUD

Airborne Light Detection and Ranging (LiDAR) point clouds have been widely fused with image data. With recent developments in photogrammetric technology, however, images can now also provide dense image matching (DIM) point clouds. To make use of such DIM points, a sample selection framework is introduced. First, geometric features are extracted from both the LiDAR points and the DIM points; each feature vector per point is treated as a sample. The binary TrAdaboost classifier is then extended to a multi-class one and trained on all the samples. The classifier automatically assigns weights to the DIM samples: useful samples receive large weights and therefore strongly influence the classification result, and vice versa. As a result, the useful DIM samples are retained to improve the classification of the LiDAR points. Because only samples are used, no registration between the DIM points and the LiDAR points is needed; DIM points capturing similar classes but not the same scene as the LiDAR points can also be used, so existing aerial images can be fully exploited. To test the generalization ability of the framework, it is also applied to a super-voxel-based classification approach by replacing the point-based features with super-voxel-based features. In the experiments, whether or not the DIM points cover the same area as the LiDAR points, the LiDAR points classification performance improves after fusion; moreover, the better the quality of the DIM points, the better the classification performance.


INTRODUCTION
Airborne LiDAR can directly acquire high-precision, high-density 3D coordinates and is widely used in applications such as 3D reconstruction and scene understanding (Sohn et al., 2008; Sampath et al., 2010), in which point cloud classification plays a critical role.
Many methods have been proposed for point cloud classification, ranging from point-based to object-based classification. For point-based classification, Sun et al. (2014) extracted geometric features from point clouds and classified them using a random forest classifier (Fan et al., 2013). Niemeyer et al. (2013) integrated a Random Forest (RF) classifier and a Conditional Random Field (CRF) (Niemeyer et al., 2012) for multi-class classification. Due to the high level of noise in point clouds, point-based classification results often show a "salt and pepper" phenomenon. Therefore, many studies clustered the point clouds into objects. Kang et al. (2018a) voxelized a scene of point clouds and extracted features from the voxels to recognize pole-like objects. Huang et al. (2016) turned the point clouds into 3D voxels and used a 3D Convolutional Neural Network (CNN) for classification. Ramiya et al. (2016) used super-voxels for point cloud classification and showed that super-voxels can improve the computational efficiency on dense point clouds. Ahmad et al. (2014) segmented point clouds into super-voxels by a link-chain method and used geometrical models and local descriptors of the super-voxels for classification. Zhang et al. (2016) divided the point cloud into hierarchical point clusters and used the sparse coding model (Yang et al., 2009) and the latent Dirichlet allocation model (Blei et al., 2003) to extract and encode the shape features of the hierarchical point clusters for classification.
To further improve classification performance, the fusion of optical images and point clouds has received wide attention. Haala et al. (1999) combined multispectral images and laser altimeter data in an integrated classification for the extraction of buildings, trees, and grass-covered areas. Cao et al. (2012) fused point clouds with co-registered images (i.e., aerial color images containing red, green, and blue (RGB) bands, and near-infrared (NIR) images) and other derived features for accurate urban land-cover classification. Guo et al. (2011) proposed a multi-source framework combining multi-echo LiDAR data, full-waveform LiDAR data, and multispectral image data to classify dense urban scenes.
Although much research has been done on point cloud classification by fusing images and points, there is still room for improvement. Most of these studies focus on the spectral information in the images. However, a large number of airborne images, and especially unmanned aerial vehicle (UAV) images, can now provide DIM point clouds via photogrammetric means (Rosnell et al., 2012). DIM points have also been used for ground object classification in several studies (Thiel et al., 2017; He et al., 2018; Zhao et al., 2018). In other words, images can provide not only spectral but also spatial information. To use this spatial information, this study presents a sample selection framework that classifies LiDAR points using the DIM points as auxiliary data. The contributions of the framework are twofold.
(1) In the framework, a multi-class TrAdaboost algorithm is introduced to automatically select samples from the DIM points, based on the extracted features, to improve the classification performance of the LiDAR points. Two widely used types of classification approaches, point-based classification and super-voxel-based classification, are used to test the generalization of the framework. The results show that the framework can fuse DIM points with LiDAR points to improve the LiDAR points classification performance.
(2) The framework selects only those features of the DIM points that are useful for the LiDAR points classification, so no registration of the DIM points and LiDAR points is needed. Any images containing the classes present in the LiDAR point clouds can be used. This decreases the data required during the fusion process compared with other studies and makes full use of existing aerial images.

METHOD
The feature-based dense matching method can generate point clouds from airborne images or UAV images. However, generating DIM points requires image matching, and a lack of image texture leads to missing DIM points (Feng, 2014). Figure 1 shows an example of DIM points and LiDAR points. It can be seen that the two kinds of point clouds do not look the same. The LiDAR points are more regular and have less missing data than the DIM points. In particular, the DIM points lack data on the sides of buildings and at the bottom of vegetation, as shown in the yellow boxes in Figure 1.
To effectively mine useful information from the DIM points, a sample selection framework is proposed; its flow chart is shown in Figure 2. First, features are extracted individually from the LiDAR points and the DIM points. They are then fed into the TrAdaboost algorithm, which selects useful information from the DIM points to classify the LiDAR points.

Feature Extraction
This study focuses on highlighting the use of the DIM points, so only geometric features are used and the spectral information of the images is not considered here. First, the support region is delineated. Let q be a point in the point cloud and let Nq = {p | p is one of the k closest points of q} be the support region of q; in this study, k = 90. Let p̄ be the centroid of all points in Nq. The geometric features of q used in this paper are introduced as follows.
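As a sketch, the k = 90 support region Nq and its centroid can be gathered efficiently with a k-d tree; the function and variable names below are ours, not from the paper:

```python
import numpy as np
from scipy.spatial import cKDTree

def support_regions(points, k=90):
    """For every point q, return the indices of its k nearest
    neighbours (the support region N_q) and the region centroid."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)   # each point is its own 1st neighbour
    centroids = points[idx].mean(axis=1)
    return idx, centroids

# toy cloud of 200 random points
rng = np.random.default_rng(0)
pts = rng.random((200, 3))
idx, cen = support_regions(pts)
```

The same index array can then be reused for all neighbourhood-based features, so the tree is built only once per point cloud.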
1) Height-based features: The normalized height (Niemeyer et al., 2013) is the difference between the digital surface model (DSM) and the digital terrain model (DTM), which eliminates the effect of topographic relief.
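A minimal sketch of the normalized height per point, assuming the terrain model is available as a regular grid; the grid-indexing scheme and helper name are ours:

```python
import numpy as np

def normalized_height(points, dtm, origin, cell):
    """Height of each point above the terrain: z minus the DTM value
    under (x, y). Assumes dtm is a 2-D grid with the given origin
    and cell size (row = y index, col = x index)."""
    col = ((points[:, 0] - origin[0]) // cell).astype(int)
    row = ((points[:, 1] - origin[1]) // cell).astype(int)
    return points[:, 2] - dtm[row, col]

dtm = np.full((10, 10), 2.0)                   # terrain at 2 m everywhere
pts = np.array([[2.5, 2.5, 5.0], [7.1, 3.3, 12.0]])
nh = normalized_height(pts, dtm, origin=(0.0, 0.0), cell=1.0)
```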
2) Eigenvalue-based features: The eigenvalues λ1, λ2, and λ3 (λ1 ≥ λ2 ≥ λ3) are obtained from the covariance matrix Cp of Nq (Weinmann et al., 2014):

Cp = (1/k) Σ (pi − p̄)(pi − p̄)^T, pi ∈ Nq,

where T is the transpose operator. The ranges of the eigenvalues computed for the Cp of different points differ, so the eigenvalues of each covariance matrix are normalized by their sum, ei = λi / (λ1 + λ2 + λ3), to make them comparable. Based on the eigenvalues, features that distinguish planes, edges, lines, and more can be derived, that is, anisotropy A = (λ1 − λ3)/λ1, planarity P = (λ2 − λ3)/λ1, sphericity S = λ3/λ1, and linearity L = (λ1 − λ2)/λ1. (3)

3) The normal vector: A plane can be fitted based on the points in Nq as nx·x + ny·y + nz·z = d (7), where d is the distance of the plane from the origin of the coordinates. The normal direction (n) of the plane is taken as the normal vector of q, expressed in three components (nx, ny, nz).

4) Curvature and roughness (Zhao et al., 2018): Curvature (C = λ3/(λ1 + λ2 + λ3)) reflects the shape and plane characteristics of the point cloud, which can effectively distinguish building points from vegetation points. Roughness (R) is the average distance from all points in the support region to the local geometric plane.
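The eigenvalue-based features above can be sketched for a single support region as follows; the toy planar patch and function name are ours, and curvature is taken as the surface variation λ3/(λ1 + λ2 + λ3):

```python
import numpy as np

def eigen_features(neigh):
    """Eigenvalue features of one support region N_q (neigh: (k, 3))."""
    centered = neigh - neigh.mean(axis=0)
    cov = centered.T @ centered / len(neigh)     # covariance matrix C_p
    lam = np.linalg.eigvalsh(cov)[::-1]          # sorted λ1 >= λ2 >= λ3
    l1, l2, l3 = lam
    e = lam / lam.sum()                          # normalized eigenvalues
    aniso = (l1 - l3) / l1                       # anisotropy
    planar = (l2 - l3) / l1                      # planarity
    spher = l3 / l1                              # sphericity
    linear = (l1 - l2) / l1                      # linearity
    curv = l3 / lam.sum()                        # surface variation
    return e, aniso, planar, spher, linear, curv

# a nearly flat patch: planarity is high, sphericity near zero
rng = np.random.default_rng(1)
patch = np.column_stack([rng.random(90), rng.random(90), 1e-4 * rng.random(90)])
e, a, p, s, l, c = eigen_features(patch)
```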
In this paper, the geometric feature variables above are combined into a 13-dimensional feature vector Fv:

Fv = (H, e1, e2, e3, A, P, S, L, nx, ny, nz, C, R), (8)

where H is the normalized height.

The Multi-class TrAdaboost Algorithm
After the Fv of all the points in the DIM points and the LiDAR points are calculated, a classifier is used to classify the LiDAR points. Here, RF, which has been widely used and shows good performance for classification (Chehata et al., 2009), is used. However, RF cannot integrate the information in the DIM points into the training.
Thus the TrAdaboost algorithm (Dai et al., 2007), which can automatically select useful samples from the DIM points, is employed. However, the classical TrAdaboost algorithm is applicable only to binary classification problems (Chen et al., 2019), so in this paper the binary TrAdaboost is extended into a multi-class TrAdaboost. The steps of the algorithm are as follows: 1) Initialize the weights (w) of all samples to 1.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020
Table 3. Precision/recall/F1 score and accuracy using the LiDAR points alone and using both the LiDAR points and the DIM points, based on the point-based classification.
2) Normalize the weight of each sample (each sample here is a point).
3) Train a classifier ht on all the weighted samples (an RF in this paper).
4) Calculate the error εt of ht on the samples of the LiDAR points. In the error formula, k represents the label of sample xi, and ht^k(xi) is the probability that xi has label k, as calculated by ht.

5) Set βt = εt/(1 − εt) and β = 1/(1 + √(2 ln n / T)) as the weight factors of ht for the samples in the LiDAR points and the DIM points, respectively, where n is the number of samples in the DIM points. To avoid overfitting, only the classifiers ht of the later iterations t are combined in the output; T is the total number of iterations, and T = 20 in this paper.
6) Update the sample weights: the weights of the DIM samples misclassified by ht are multiplied by β raised to a positive power, so they decrease, while the weights of the misclassified LiDAR samples are multiplied by βt raised to a negative power, so they increase.
7) The output is the final probability of each class, and a sample is labeled with the class of the highest probability.
From steps 4 and 7, it can be seen that our method can classify the data into multiple categories. From step 6, it can be seen that if some samples of the DIM points are misclassified, their weights will be smaller in the next iteration. After multiple iterations, the weights of the samples that cannot help improve the LiDAR points classification accuracy become smaller and smaller, while the samples that can help are retained. Therefore, the multi-class TrAdaboost algorithm can be used to fuse the two kinds of point clouds.
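The loop above can be sketched as follows, assuming the classical TrAdaboost weight factors of Dai et al. (2007); the clamping of εt and all names here are ours, not the paper's exact implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def multiclass_tradaboost(Xd, yd, Xl, yl, T=20):
    """DIM (source) samples come first, LiDAR (target) samples second.
    Misclassified DIM samples lose weight over the iterations, while
    misclassified LiDAR samples gain weight."""
    nd = len(Xd)
    X = np.vstack([Xd, Xl])
    y = np.concatenate([yd, yl])
    w = np.ones(len(y))
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(nd) / T))  # source factor
    learners, betas = [], []
    for t in range(T):
        p = w / w.sum()                                  # step 2: normalize
        h = RandomForestClassifier(n_estimators=50, random_state=t)
        h.fit(X, y, sample_weight=p)                     # step 3: train ht
        prob = h.predict_proba(X)
        # miss_i = 1 - probability assigned to the true label of x_i
        miss = 1.0 - prob[np.arange(len(y)), np.searchsorted(h.classes_, y)]
        eps = np.sum(p[nd:] * miss[nd:]) / np.sum(p[nd:])  # step 4: error
        eps = min(max(eps, 1e-3), 0.49)                    # keep factors sane
        beta_t = eps / (1.0 - eps)                         # step 5: target factor
        w[:nd] *= beta ** miss[:nd]        # step 6: shrink unhelpful DIM samples
        w[nd:] *= beta_t ** -miss[nd:]     # step 6: boost hard LiDAR samples
        learners.append(h)
        betas.append(beta_t)
    return learners, betas

rng = np.random.default_rng(0)
Xd, yd = rng.random((30, 5)), rng.integers(0, 3, 30)
Xl, yl = rng.random((30, 5)), rng.integers(0, 3, 30)
learners, betas = multiclass_tradaboost(Xd, yd, Xl, yl, T=3)
```

A final label (step 7) could be obtained by summing the class probabilities of the later learners weighted by log(1/βt) and taking the maximum.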

Super-voxel-based classification
Super-voxel-based approaches have been proposed in many studies (Akwensi et al., 2020) and show better results than point-based classification. To test the generalization of our framework, the super-voxel approach is also considered here. The construction of super-voxels is based on the study of Kang et al. (2018b).
First, the bounding box of the whole point cloud is obtained and divided into small voxels of 0.4 m × 0.4 m × 0.4 m. After the voxels have been generated, super-voxels are created by a three-dimensional simple linear iterative clustering algorithm. First, all voxels containing points are found, and in each such voxel the point closest to the voxel center is used as a seed. Points whose distance to a seed is smaller than 0.8 m are regarded as its neighboring points; if a point is a neighbor of more than one seed, it is assigned to the closest seed. After all points have been assigned, each seed position is updated: the point closest to the center of a seed's neighboring points becomes the new seed. If two seeds are closer than 0.2 m, they are merged into one. After ten iterations, the point cloud is segmented into super-voxels.
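The seeding and clustering steps above can be sketched as follows. As simplifications that are ours, the first point of each occupied voxel stands in for the point closest to the voxel center, and the seed update uses the cluster centroid directly:

```python
import numpy as np
from scipy.spatial import cKDTree

def supervoxels(points, voxel=0.4, radius=0.8, merge=0.2, n_iter=10):
    """Seed-based clustering: voxel seeds, 0.8 m assignment radius,
    seed update, merging of seeds closer than 0.2 m."""
    # one seed per occupied 0.4 m voxel
    keys = np.floor(points / voxel).astype(int)
    _, first = np.unique(keys, axis=0, return_index=True)
    seeds = points[first]
    for _ in range(n_iter):
        dist, labels = cKDTree(seeds).query(points)    # nearest seed
        labels = np.where(dist <= radius, labels, -1)  # outside radius
        # move each seed toward the centre of its assigned points
        new = np.array([points[labels == s].mean(axis=0)
                        if np.any(labels == s) else seeds[s]
                        for s in range(len(seeds))])
        # merge seeds that drifted closer than the merge distance
        keep = np.ones(len(new), bool)
        for i, j in cKDTree(new).query_pairs(merge):
            keep[max(i, j)] = False
        seeds = new[keep]
    return cKDTree(seeds).query(points)[1], seeds

# two well-separated blobs end up in disjoint super-voxels
rng = np.random.default_rng(2)
pts = np.vstack([rng.normal(0, 0.1, (50, 3)), rng.normal(5, 0.1, (50, 3))])
labels, seeds = supervoxels(pts)
```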
After super-voxels have been generated for the DIM points and the LiDAR points, the features of each super-voxel are extracted. The points in a super-voxel are used as the support region to calculate Fv, except for the normalized height, for which the average normalized height of the points in the super-voxel is used. The resulting Fv is the feature vector of the super-voxel, and each such feature vector is a sample in the multi-class TrAdaboost algorithm. After the super-voxels of the LiDAR points have been classified, all points in each super-voxel are given the same label as the super-voxel.

Dataset Description
The LiDAR points and DIM points used in this paper were obtained by the WHU Kylin Cloud-I system, which was developed by Professor Yang Bisheng's team at Wuhan University. The system was equipped with a Velodyne 16-line laser scanner and a consumer video camera. The scene is the playground of Wuhan University. The datasets obtained for this classification task are shown in Figure 3. The three datasets contain ground, buildings of different sizes, isolated trees, and clusters of vegetation. The detailed statistics of the two kinds of point clouds are shown in Table 1.
In this paper, we divide the dataset into three groups for the experiment. Table 2 shows the train set and the test set we used for each experiment.

Results and Analysis
To evaluate the classification performance, four quantitative indexes are used: precision, recall, accuracy, and F1 score. The experiments were performed with the point-based and super-voxel-based classification approaches, and the results are shown in Figures 4 and 5, respectively. The points on trees, buildings, and ground are colored red, blue, and green, respectively. The quantitative results are listed in Tables 3 and 4. It can be found in Tables 3 and 4 that the performance obtained by using both the LiDAR points and the DIM points is good, and the accuracy attained by our method is above 80% in all cases. Among the three categories, the accuracy of ground and trees is high, mainly because the geometric characteristics of these two classes are more distinctive. However, the accuracy of points on buildings is lower. The main error areas in the three datasets are the tops of buildings as well as the building edges. It can be seen from Figures 4 and 5 that many of the building points are misclassified as vegetation. The main reason is that the point distributions in these places lie between clutter and regular planes, which are easily confused.
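The four indexes can be computed, for example, with scikit-learn; the labels below are purely hypothetical and serve only to illustrate the per-class metrics:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# hypothetical reference and predicted labels for the three classes
y_true = ["ground", "tree", "building", "ground", "tree", "building"]
y_pred = ["ground", "tree", "tree",     "ground", "tree", "building"]

prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=["ground", "tree", "building"], zero_division=0)
acc = accuracy_score(y_true, y_pred)
# one building misclassified as tree lowers building recall to 0.5
```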
In comparison with the results of using only the LiDAR points, most of the F1 scores and accuracies of each class using both the LiDAR points and the DIM points, listed in Tables 3 and 4, are better. The main misclassified areas are highlighted by yellow boxes. The results show that our framework can effectively fuse the two kinds of point clouds and that the DIM points are helpful for the LiDAR points classification. It was observed that, after fusion, the overall precision of the vegetation class increased but that of the building class decreased. This is because the laser can penetrate the vegetation, so there are non-vegetation points under the canopies of trees, whereas the DIM points are located only on the canopies; the two distributions are quite different. For vegetation, the DIM points provide more complete surface morphological information for the LiDAR points, so they are a good supplement.
Table 5. Precision/recall/F1 score and accuracy of different classifiers based on the super-voxel-based classification.
The classification performance of the ground is significantly improved. This is because, in scene III, the LiDAR points of the ground have a missing part due to building occlusion, and the DIM points make up for this part. Fusion with this kind of DIM points improves the classification performance.
To further evaluate the performance of the proposed model, frequently used classifiers (Adaboost (Wang et al., 2015) and SVM (Ramiya et al., 2016)) were compared with it on the three datasets. Table 5 lists the comparison results and Table 6 the processing times. Although the proposed method required more processing time than the other classifiers, it achieved the highest overall accuracies. Furthermore, compared with the other classifiers, the proposed model has better precision in the classification of buildings.
For the fusion of optical images and point clouds, the two data sources need to be registered. In our framework, however, only samples are used, which means no registration is needed. A big advantage of requiring no registration is that DIM points from other regions can be used to improve the LiDAR points classification in a given region, which decreases the requirement for images of the same area during fusion. To demonstrate this, DIM points from another region are used as auxiliary data. Figure 6 shows the DIM points of a village in Zhengding City, Hebei Province, obtained by a UAV with a Nikon D300 camera. The DIM points also contain buildings, vegetation, and ground. As the super-voxel classification approach obtained the best results in the previous experiments, only this approach is used here. Table 4 shows the results. After fusion with the DIM points from a different region, the classification performance is still better than that using the LiDAR points alone, but lower than that obtained by fusing DIM points from the same region. Because the environments differ, the DIM points from Hebei cannot assist as well.
However, in scene I, the precision of vegetation is higher than when using DIM points from the same region. This is because in scene III the DIM points of vegetation are partly missing, while the vegetation points from Hebei are more complete and can better assist the classification of the LiDAR points. This shows that the better the auxiliary point clouds are, the better the classification performance achieved.

CONCLUSION
In this paper, we show that DIM points are helpful for LiDAR points classification, which offers another way of fusing point clouds and images. A sample selection framework is proposed to fuse the DIM points. In the framework, the features of points and super-voxels are extracted, and a multi-class TrAdaboost algorithm is proposed to automatically select samples from the DIM points to improve the classification performance of the LiDAR points. The results show that the classification performance improves after using the DIM points, and that the improvement depends on the quality of the DIM points. At the same time, the framework does not need to register the LiDAR points and the DIM points, so DIM points from other regions can also help the LiDAR points classification. This advantage makes full use of existing aerial images.
In the future, we will use our framework to classify more object categories and study which categories can be improved by the DIM points. We will also use more DIM points for classification to further study which DIM points can improve the LiDAR points.