EIGENENTROPY-BASED CONVOLUTIONAL NEURAL NETWORK METHOD FOR ALS POINT CLOUD CLASSIFICATION

The classification of point clouds is the first step in extracting various types of geo-information from point clouds. Recently, the ISPRS WG II/4 provided a benchmark on 3D semantic labelling, in which a convolutional neural network based method achieved the best overall accuracy among all participants who used only geometrical and waveform-based features extracted from the ALS data. In that method, the features of each point are calculated at several scales to achieve the best performance, which is inefficient for future use. In this paper, we improve the method with an eigenentropy-based scale selection strategy. The scale selection strategy improves the average F1 score and makes the classification method simpler and more efficient.


INTRODUCTION
The classification of 3D point clouds has attracted great attention in computer vision, remote sensing, and photogrammetry. In recent decades, airborne laser scanning (ALS) has become an important means of acquiring 3D point clouds. ALS point clouds allow an automated analysis of large areas in terms of assigning a (semantic) class label to each point of the considered 3D point clouds (Blomley et al., 2016; Chehata et al., 2009; Mallet et al., 2011; Niemeyer et al., 2014; Shapovalov et al., 2010). However, the relatively low point density, the irregular point distribution and the complexity of the observed scenes make the accuracy of the classification result hard to improve. In our previous work (Yang et al., 2017), we presented a convolutional neural network based method to handle this problem. Although the result achieves a good performance, some questions remain. In order to find the best scale for calculating the features of the point clouds, many repetitive experiments are needed, which is resource intensive and time consuming. A principled way of finding the best scale is needed to solve this problem.
Many approaches have focused on finding an optimal neighbourhood size for each individual 3D point and thus increasing the distinctiveness of derived features. For densely sampled and continuous surfaces, Mitra and Nguyen (Mitra et al., 2004) find the optimal size using iterative schemes relating to curvature, point density and the noise of the normal estimation. For more cluttered surface representations, Mark (Mark et al., 2010) uses a surface-variation based size selection method, Demantké (Demantké et al., 2011) uses a dimensionality based size selection method, and Weinmann (Weinmann et al., 2015) uses an eigenentropy based size selection method.
In this paper, we use the eigenentropy based method to select the best scale. The task of finding a suitable neighbourhood corresponds to minimizing an eigenvalue-based measure of unpredictability given by the Shannon entropy (Weinmann et al., 2015). Although the optimal neighbourhood size selection method causes additional computational effort, it removes the need for repetitive experiments and has a significantly positive impact on 3D scene analysis. In order to verify the capability of the algorithm, we compare the eigenentropy based result with our previous result on the ISPRS WG II/4 benchmark dataset.
The rest of this paper is organized as follows. Section 2 introduces our methodology. Section 3 presents the benchmark dataset and the experimental results, and Section 4 gives concluding remarks and suggestions for future work.

Feature Extraction
For each point in the ALS point clouds, we extract four types of features: intensity, eigenvalue-based features, a normal vector based feature and the height above the DTM (Yang et al., 2017).

Intensity:
Apart from the X, Y and Z coordinates, the LiDAR system also records the intensity of the returns, which is a measure of the amount of energy reflected back to the sensor (García et al., 2010).
In the ISPRS benchmark dataset both impervious surfaces and low vegetation are flat objects, so it is hard to distinguish them by geometric features alone. The intensity values are high on impervious surfaces and low on low vegetation, which makes these objects easy to distinguish.

Eigenvalue-Based Features:
These features are calculated from the normalized eigenvalues λ1 ≥ λ2 ≥ λ3 extracted from the covariance matrix of the local neighbourhood. The planarity P_λ and sphericity S_λ are defined as

P_λ = (λ2 - λ3) / λ1 (1)
S_λ = λ3 / λ1 (2)

Normal Vector Based Feature:
This feature may help us identify planar objects such as roofs and roads. The local plane is estimated via a robust M-estimator (Xu and Zhang, 1996) in our experiment. The normal vector is derived from the local plane, and we use the variance of the normal vector angle from the vertical direction to discriminate planar surfaces from vegetation.
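These eigenvalue-based features can be sketched as follows; this is an illustrative NumPy snippet, not the authors' implementation, and the function name and array layout are assumptions:

```python
import numpy as np

def eigenvalue_features(neighbors):
    """Linearity, planarity and sphericity from a local neighbourhood
    of 3D points given as an (n, 3) array."""
    cov = np.cov(neighbors.T)                 # 3x3 covariance matrix
    ev = np.linalg.eigvalsh(cov)[::-1]        # eigenvalues, descending
    ev = ev / ev.sum()                        # normalize so they sum to 1
    e1, e2, e3 = ev
    linearity = (e1 - e2) / e1
    planarity = (e2 - e3) / e1
    sphericity = e3 / e1
    return linearity, planarity, sphericity
```

For points sampled from a planar patch, the planarity dominates while the sphericity is close to zero, which is what makes these features discriminative for roofs and roads.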

Height Above DTM:
The DTM we use is generated by the commercial software package SCOP++, which implements the robust filtering theory of (Kraus and Pfeifer, 1998). Based on the analysis by Niemeyer (Niemeyer et al., 2014), the height above the DTM represents the global vertical position of a point, which helps a lot in the classification work, and this feature may be the most important one. It is the strongest and most discerning feature for all classes and relations; for instance, it has a great influence in distinguishing between points on a roof and points at road level.

Neighborhood Selection
To obtain an appropriate local neighbourhood as the basis of feature extraction, we focus on the use of an eigenentropy based scale selection method.
The linearity L_λ = (λ1 - λ2) / λ1, planarity P_λ and sphericity S_λ can be derived from the eigenvalues and represent the 1-dimensional, 2-dimensional and 3-dimensional characteristics of the local neighbourhood. The three features sum to 1 and thus satisfy two of the three probability axioms. Finding a suitable neighbourhood scale can therefore be reduced to finding the most prominent dimensionality, which corresponds to minimizing a measure of unpredictability given by the Shannon entropy (Shannon, 1948):

E_λ = -L_λ ln(L_λ) - P_λ ln(P_λ) - S_λ ln(S_λ) (3)

The eigenvalues can be directly exploited to estimate the order/disorder of the 3D points within the local 3D neighbourhood (Weinmann et al., 2015), so we normalize the three eigenvalues by their sum. In our experiments, the neighbourhood is defined as a sphere of radius r. In order to find the best scale, the interval [r_min, r_max] is sampled into 20 different scales, and the optimal neighbourhood size is determined as the radius that minimizes the eigenentropy.
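This scale selection can be sketched as a brute-force radius search over a small cloud; the function names, default radius bounds and the linear distance query below are assumptions for illustration (a real implementation would use a spatial index):

```python
import numpy as np

def eigenentropy(neighbors):
    """Shannon entropy of the normalized covariance eigenvalues."""
    cov = np.cov(neighbors.T)
    ev = np.linalg.eigvalsh(cov)[::-1]
    ev = np.clip(ev / ev.sum(), 1e-12, None)  # guard against ln(0)
    return float(-np.sum(ev * np.log(ev)))

def optimal_radius(point, cloud, r_min=0.5, r_max=2.0, n_scales=20):
    """Pick, from n_scales radii in [r_min, r_max], the one whose
    spherical neighbourhood minimizes the eigenentropy."""
    dist = np.linalg.norm(cloud - point, axis=1)
    best_r, best_e = r_min, np.inf
    for r in np.linspace(r_min, r_max, n_scales):
        nbrs = cloud[dist <= r]
        if len(nbrs) < 3:                     # covariance needs >= 3 points
            continue
        e = eigenentropy(nbrs)
        if e < best_e:
            best_e, best_r = e, r
    return best_r
```

A neighbourhood dominated by a single dimensionality (e.g. points along a line or on a plane) yields a much lower eigenentropy than a volumetric cluster, which is what drives the selection towards the most "ordered" scale.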

Image Generation:
A feature image generation method is used to convert the 3D point clouds into 2D images. The features we use are the same as in (Yang et al., 2017).
For each point in the ALS point clouds, a square window is set up and divided into 128*128 cells. For each cell centre, we find the nearest point in the point clouds. The best scale and the features are calculated for this point, and the features are converted into three integers in the range 0 to 255 as follows, where P_λ is the planarity, S_λ is the sphericity, σ²_z is the variance of the normal vector angle from the vertical direction and H_above is the height above the DTM. These features are normalized between 0 and 1, and the intensity denotes the echo intensity values normalized between 0 and 255.
These three integers are mapped to the cell as red, green and blue values, so that the 128*128 cell window is transformed into a 128*128 RGB image.
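As an illustration of the cell colouring step, the sketch below maps three normalized features to one RGB triple and fills an image; the exact feature-to-channel combination of (Yang et al., 2017), which also involves the intensity and the normal vector variance, is not reproduced here, so the channel assignment is an assumption:

```python
import numpy as np

def cell_color(planarity, sphericity, height):
    """Map three features normalized to [0, 1] onto one RGB cell as
    integers in [0, 255]; the channel assignment is illustrative."""
    to_byte = lambda v: int(round(np.clip(v, 0.0, 1.0) * 255))
    return to_byte(planarity), to_byte(sphericity), to_byte(height)

def feature_image(cells, size=128):
    """Assemble a size x size RGB image from a {(row, col): rgb} dict."""
    img = np.zeros((size, size, 3), dtype=np.uint8)
    for (i, j), rgb in cells.items():
        img[i, j] = rgb
    return img
```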

Convolutional Neural Networks
The architecture of the CNN model we use is shown in Figure 1. The model is implemented with the Caffe deep learning framework (Jia et al., 2014). As shown in Figure 1, "conv" denotes the convolutional layers and "pool" denotes the pooling layers. Hierarchical features are extracted from the original input image by these layers. "FC" denotes the fully connected layers that follow the convolutional and pooling layers and perform the classification. The input data is usually followed by the convolutional layers.
These convolutional layers can detect features regardless of their position. A pointwise nonlinear activation operation g(·) is usually performed subsequently, followed by a pooling layer that selects the dominant features. Pooling layers improve the robustness to translation and reduce the number of network parameters. The whole process is given by

H_k = g(W_k * H_{k-1} + b_k)

where H_k denotes the k-th output data, W_k the k-th convolutional operation kernels and b_k the k-th bias values. Rectified linear units are used as our nonlinear activation operation:

g(x) = max(0, x)
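The layer operations above can be sketched in NumPy for a single-channel input. This is a didactic, assumed implementation of one "valid" convolution with ReLU activation and a non-overlapping max pooling, not the Caffe layers actually used:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: g(x) = max(0, x)."""
    return np.maximum(0.0, x)

def conv_layer(h_prev, kernel, bias):
    """One layer H_k = g(W_k * H_{k-1} + b_k) as a 'valid'
    cross-correlation on a single-channel 2D input."""
    kh, kw = kernel.shape
    oh = h_prev.shape[0] - kh + 1
    ow = h_prev.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(h_prev[i:i + kh, j:j + kw] * kernel) + bias
    return relu(out)

def max_pool(h, size=2):
    """Non-overlapping max pooling that keeps the dominant response
    in each size x size window."""
    H, W = h.shape
    h = h[:H - H % size, :W - W % size]
    return h.reshape(H // size, size, W // size, size).max(axis=(1, 3))
```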

Dataset
The ISPRS WG II/4 benchmark test provides the ALS data from Vaihingen used for the 3D reconstruction and 3D labelling challenge. We use this benchmark dataset to evaluate our method. The dataset was acquired with a Leica ALS50 system over Vaihingen, a small village in Germany (Cramer, 2010). The mean flying height is 500 m and the field of view is 45 degrees. The point density of the dataset is around 8 points/m². The points have not been rasterized or post-processed. In total 9 classes have been defined and each point in the training dataset is labelled accordingly. The reference labels are provided by the authors of (Niemeyer et al., 2014).
As shown in Figure 2, the training area contains 753,876 labelled points with XYZ coordinates, intensity values, number of returns and reference labels. The 9 defined classes are Power-line, Low Vegetation, Impervious Surfaces, Car, Fence/Hedge, Roof, Façade, Shrub and Tree. The testing area, shown in Figure 3, contains 411,722 unlabelled points with XYZ coordinates, intensity values and the number of returns.

Accuracy Evaluation
For each class, we compute the precision and recall values as

precision = tp / (tp + fp)
recall = tp / (tp + fn)

where tp denotes the true positives, fp the false positives and fn the false negatives. The F1 score for each class is calculated as

F1 = 2 · precision · recall / (precision + recall)

The overall accuracy and the average F1 score are used to evaluate the performance of our experimental results.
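These metrics follow directly from the confusion counts; the small sketch below uses the standard definitions and is not code from the paper:

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall and F1 score from true positives,
    false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```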

Experiments & Discussion
For the convolutional neural network parameters, the batch size is 128, the base learning rate is 0.01, the momentum is 0.9 and the weight decay is 0.0005. For the feature image generation, the width of each cell is 0.05 m and the neighbourhood size is calculated with the eigenentropy based scale selection method. The classification result is shown in Figure 4. We compare our experimental result with Yang's (Yang et al., 2017) at four different neighbourhood scales, as shown in Table 1. For the neighbourhood scale selection, we choose four different radii from 0.5 m to 2.0 m to compare with the eigenentropy method. In general, the overall accuracy is highest when the radius is 1 m, and the average F1 score is highest at the eigenentropy based optimal scale. More specifically, since the point density varies considerably between categories, the highest accuracy for each class may occur at a different neighbourhood scale.
The eigenentropy based scale selection method solves this problem. Although the accuracy of each individual class may not be the highest, the overall accuracy ranks 2nd among all experiments, with only a 0.1% difference to the first place, and the average F1 score ranks 1st. The eigenentropy based method may require a large amount of computation, but it removes the need to search for the optimal neighbourhood size by trial and error, so the time spent in the feature generation process is saved.

CONCLUSION
In summary, our work reveals that the eigenentropy based scale selection method has a positive influence on the point based feature image generation and thus improves the average F1 score of the CNN based semantic labelling task. In future work, we will take more features, such as echo features and point density, into consideration to improve the overall accuracy, and try to further improve the efficiency of the algorithm.

Figure 1. The architecture of the used deep CNN

Figure 2. Training set: the color encoding shows the assigned semantic class labels

Figure 4. The classification result of the eigenentropy based method; the color encoding shows the assigned semantic class labels

Table 1. The recall value of each class, the overall accuracy and the average F1 score for different neighbourhood scales, in [%]. Bold numbers show the highest values among the different scales.