LAND COVER CLASSIFICATION FROM FULL-WAVEFORM LIDAR DATA BASED ON SUPPORT VECTOR MACHINES

In this study, a land cover classification method based on multi-class Support Vector Machines (SVM) is presented to predict the types of land cover in the Miyun area. The acquired backscattered full waveforms were processed following a workflow of waveform pre-processing, waveform decomposition and feature extraction. The extracted features, which consist of distance, intensity, Full Width at Half Maximum (FWHM) and backscattering cross-section, were corrected and used as attributes of the training data to generate the SVM prediction model. The SVM prediction model was applied to classify the land cover in the Miyun area as ground, trees, buildings or farmland. The classification results for these four land cover types were evaluated against ground truth information derived from the CCD image data of the Miyun area. The proposed classification algorithm achieved an overall classification accuracy of 90.63%. To better interpret the SVM classification results, they were compared with those of the Artificial Neural Networks (ANNs) method; the comparison showed that the SVM method achieved better classification results.


INTRODUCTION
In the last decade, Light Detection And Ranging (LiDAR) has become an important source of 3D information about targets. It has been widely applied in many fields of remote sensing, such as environmental monitoring, disaster assessment and land cover classification. For land cover classification, traditional LiDAR usually relies on the 3D coordinates of targets, since it records only a few echoes and therefore obtains limited information about the targets. In contrast, full-waveform LiDAR systems record the entire backscattered waveform of the targets. Waveform features that reflect the properties of the targets can be retrieved from the waveforms and are now extensively used for a large variety of land cover classification tasks (Mallet and Bretar, 2009; Heinzel and Koch, 2011). This paper studies land cover classification using full-waveform LiDAR data.

Some scholars have studied land cover classification based on full-waveform features. In 2008, Straub et al. presented a processing procedure for automated delineation and classification of forest and non-forest vegetation that used solely full-waveform laser scanner data as input. An overall accuracy of 97.73% was reached. However, only forest and non-forest vegetation were classified (Straub et al., 2008). In 2008, Reitberger et al. described an unsupervised species classification method based on features derived by waveform decomposition of full-waveform LiDAR data. The classification grouped the data into two clusters (deciduous, coniferous), which led to an overall accuracy of 80% in a leaf-on situation. The presented results clearly showed the potential of full-waveform data for the comprehensive analysis of tree structures (Reitberger et al., 2008). In 2010, various statistical waveform parameters, such as standard deviation, skewness, kurtosis and amplitude, were used as inputs to an unsupervised classification method, Kohonen's Self-Organizing Map (SOM), to separate vegetation (trees and grass) and non-vegetation (pavement and roof) surfaces. However, there was no quantitative evaluation of the classification results (Zaletnyik et al., 2010).

These studies were based on unsupervised classification methods, but supervised classifiers are preferred since they offer higher flexibility. In the remote sensing field, Support Vector Machines (SVM), a supervised classifier, has been used for classification in different applications, such as multispectral measurements, DEM generation from aerial LiDAR data and Synthetic Aperture Radar (SAR) images. Thus it has played a major role in classification problems (Yang and Lunetta, 2012). Some scholars have investigated the potential of full-waveform data for land cover classification using the SVM classification method. In 2009, Bretar et al. showed that LiDAR amplitude and width contained enough discriminative information for badlands to be classified into land, road, rock and vegetation. A 3D land cover classification was performed using an SVM classifier. However, the classification accuracy was only 79.1% when the amplitude, width and Digital Terrain Model (DTM) information were combined (Bretar et al., 2009). In 2011, Mallet et al. used an SVM classifier to label the point cloud according to various scenarios based on the rank of the features. The results showed that echo amplitude, cross section and backscatter coefficient contributed significantly to the high classification accuracies (around 95%). However, only three land cover types (building, ground and vegetation) were classified. Moreover, including redundant features in the same set made it impossible to draw conclusions about the contribution of each feature (Mallet et al., 2011). In 2015, Tseng et al. combined LiDAR waveform data, orthoimage data and the spatial features of waveform data with SVM to classify land cover point clouds. However, the highest overall accuracy could be achieved only by fusing waveform and orthoimage information (Tseng et al., 2015).

In this paper, a classification method based on multi-class SVM using full-waveform features, i.e. distance, intensity, FWHM and backscattering cross section, is presented to classify the land cover in the Miyun area as ground, trees, buildings or farmland, and the method is compared to ANNs. The remainder of this paper is organized as follows. Section 2 introduces the waveform processing methodology, including waveform decomposition and feature extraction, and then presents the theory of the multi-class SVM classifier. Section 3 introduces the workflow of the SVM classification method, classifies four land cover types in the Miyun area based on SVM, and compares the results to those of ANNs. The conclusions are given in Section 4.

Waveform processing methodology
A full-waveform LiDAR system records the entire backscattered waveform signal from targets, which is in fact a sum of partial scattering response signals convolved with the scanner's system waveform. Thus it provides not only 3D point clouds but also abundant additional information about the targets. In the workflow of processing full-waveform data, waveform decomposition is the most important step.

Waveform decomposition
Waveform decomposition comprises the following parts: pre-processing of the waveform data, waveform decomposition and component detection. Before waveform decomposition, the noise in the waveforms needs to be removed. Widely used filtering methods include the Wiener filter and Gaussian smoothing. However, the Wiener filter is very sensitive to noise (Jutzi and Stilla, 2006). For Gaussian smoothing, it is difficult to select an appropriate kernel width for each echo pulse reflected from complex terrain. Based on an analysis of the characteristics of the waveform intensity, the Median Absolute Deviation (MAD) method was used for waveform filtering and proved effective on the original waveforms (Persson and Mallet, 2005). Figure 1 shows the raw waveform of an echo and the waveform filtered by MAD. It can be seen that the MAD method has a certain smoothing effect on the raw echo waveform.

Since the transmitted laser pulse is modulated as a Gaussian pulse, and the scattering of the laser pulse by most targets can be approximated by a Gaussian reflection, the backscattered waveform component can be modeled as a Gaussian function. Indeed, most waveforms are very similar to an ideal Gaussian function, whereas other laser impulse responses are slightly asymmetric. Consequently, depending on the targets, a sum of Gaussians may not represent the waveforms accurately. Therefore, a generalized Gaussian function, which can better represent the backscattered patterns from different targets, was used for waveform modeling in this paper. In this way, the fitting of asymmetric, peaked or flattened echoes located in different areas can be improved (Chauve et al., 2007).
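The filtering and modelling ideas above can be illustrated with a minimal sketch. The paper does not spell out how MAD is applied, so the thresholding rule in `suppress_noise` (median baseline, cut-off at `k` robust standard deviations) is an assumption made only for illustration. The generalized Gaussian is written in one common parameterisation, a * exp(-|t - mu|^(alpha^2) / (2 sigma^2)), which is also assumed here; with alpha = sqrt(2) it reduces to the ordinary Gaussian. All function and parameter names are hypothetical.

```python
import math
from statistics import median


def mad(samples):
    """Median absolute deviation of the waveform samples."""
    centre = median(samples)
    return median(abs(s - centre) for s in samples)


def suppress_noise(samples, k=3.0):
    """Remove the median baseline and zero samples below k robust sigmas.

    Assumed rule for illustration only; 1.4826 * MAD estimates the
    standard deviation of Gaussian noise.
    """
    baseline = median(samples)
    threshold = k * 1.4826 * mad(samples)
    return [s - baseline if s - baseline > threshold else 0.0 for s in samples]


def generalized_gaussian(t, amplitude, mu, sigma, alpha):
    """Generalized Gaussian echo model; alpha = sqrt(2) gives a Gaussian."""
    return amplitude * math.exp(-abs(t - mu) ** (alpha ** 2) / (2.0 * sigma ** 2))


def fwhm(sigma, alpha):
    """Full width at half maximum of the generalized Gaussian above."""
    return 2.0 * (2.0 * sigma ** 2 * math.log(2.0)) ** (1.0 / alpha ** 2)
```

For example, `fwhm(1.0, math.sqrt(2))` gives the familiar Gaussian value of about 2.355 times sigma, while other values of `alpha` yield flatter or more peaked echoes.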

Waveform features extraction
The waveform features can be determined from the component parameters. In this paper, the extracted waveform features include distance, intensity, FWHM and backscattering cross section. The distance is the range from the laser transmitter to the target, which is determined by estimating the position of the waveform component. Ideally, the peak position is taken as the component position and the time lag is used to calculate the distance (Mallet and Bretar, 2009). Intensity is a combination of emitted energy, distance, atmospheric attenuation and the reflective capability of the illuminated targets. In practice, the echo amplitude is most commonly regarded as the intensity (Wagner et al., 2008). The FWHM denotes the extension of the waveform in the incident direction, as shown in Figure 2. It is closely related to the geometry of the targets, the terrain slope and the target material (Wagner et al., 2006). The backscattering cross section describes the backscattering ability of the targets and is a comprehensive indicator of distance, intensity and FWHM (Wagner, 2010).

Several factors, such as the angle of incidence, atmospheric conditions, range and surface characteristics, influence the waveform features. Therefore, these features can hardly be used without radiometric calibration (Lehner and Briese, 2010). To reduce this influence and further improve the effectiveness of the waveform features for land cover classification, a comprehensive correction was applied to the extracted waveform features. The detailed methodology is described in a published article (Zhou et al., 2015).
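As a rough illustration of how the four features follow from the parameters of one decomposed component, the sketch below assumes a plain Gaussian echo (not the generalized Gaussian used in this paper) and no radiometric correction. The cross section uses the commonly cited proportionality to R^4 times amplitude times echo width; the calibration constant `cal_const` is a placeholder, and all names and units are hypothetical.

```python
import math
from dataclasses import dataclass

SPEED_OF_LIGHT = 299_792_458.0  # m/s


@dataclass
class EchoFeatures:
    distance_m: float      # range from transmitter to target
    intensity: float       # echo amplitude, taken as intensity
    fwhm_ns: float         # full width at half maximum
    cross_section: float   # uncalibrated backscattering cross section


def extract_features(peak_time_ns, amplitude, sigma_ns, cal_const=1.0):
    """Derive the four features from one Gaussian component.

    peak_time_ns is the two-way travel time of the component peak;
    cal_const stands in for a calibration constant that is not known here.
    """
    distance = SPEED_OF_LIGHT * peak_time_ns * 1e-9 / 2.0
    fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma_ns
    cross_section = cal_const * distance ** 4 * amplitude * sigma_ns
    return EchoFeatures(distance, amplitude, fwhm, cross_section)
```

A two-way travel time of about 4670 ns corresponds to a range of roughly 700 m, so `extract_features(4670.0, 100.0, 2.0)` yields a distance of that order; the amplitude and width values are made up.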

Multi-class SVM Classifier
SVM is a supervised classifier. In supervised classification, the data are usually separated into a training set and a testing set. Every instance in the training set comprises one "target value" (i.e. the class label) and several "attributes" (i.e. the features or observed variables). In this paper, the attributes are the extracted waveform features, including distance, intensity, FWHM of the waveform and backscattering cross section, as described in Section 2.1.2. Based on the training data, the goal of SVM is to produce a model that predicts the target values (class labels) of the test data given only the test data attributes. In our experiments, the model was used to discriminate the four classes of interest: buildings, trees, farmland and ground.

For SVM-based binary classification, given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, 2, \ldots, l$, where $x_i \in R^n$ and $y_i \in \{1, -1\}$, SVM requires the solution of the following optimization problem:

$$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \tag{2}$$

$$\text{subject to} \quad y_i \left( w^T \phi(x_i) + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0 \tag{3}$$

Here the training vectors $x_i$ are mapped into a higher dimensional space by the function $\phi$. SVM searches for a linear separating hyperplane in this higher dimensional space. $C > 0$ can be regarded as the penalty parameter of the error term. Because the vector variable $w$ may have a high dimensionality, the following dual problem is usually solved instead:

$$\min_{\alpha} \; \frac{1}{2} \alpha^T Q \alpha - e^T \alpha \quad \text{subject to} \quad y^T \alpha = 0, \quad 0 \leq \alpha_i \leq C, \quad i = 1, \ldots, l$$

where $e$ is a vector of all ones, $Q$ is an $l$ by $l$ positive semidefinite matrix with $Q_{ij} = y_i y_j K(x_i, x_j)$, and $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ is the kernel function. For the RBF kernel, $K(x_i, x_j) = \exp(-\gamma \| x_i - x_j \|^2)$ with $\gamma > 0$.

There are two parameters, $C$ and $\gamma$, for SVM using the RBF kernel. To find the best $C$ and $\gamma$ for a given problem, a parameter search is required. A grid search using cross-validation is applied here. In x-fold cross-validation, the training set is divided into x subsets of equal size. Sequentially, each subset is tested using the model trained on the remaining x-1 subsets. Various pairs of ($C$, $\gamma$) values growing exponentially (grid search) are tried, and the pair with the best cross-validation accuracy is selected for the model.

SVMs are designed to solve binary problems. When there are n ≥ 3 classes of interest, various approaches are possible to address the problem, usually by combining a set of binary classifiers. In this paper, we use the "one-against-one" approach, in which a voting strategy is used to determine the class: for each instance, k(k-1)/2 binary classifiers are invoked (k: number of classes), each classifier votes for one class, and the final label is taken to be the class with the most votes (Hsu and Lin, 2002). If several classes receive the same number of votes, we simply select the one with the smallest index, although this is not an ideal strategy.
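The one-against-one voting rule, including the smallest-index tie-break, can be sketched as follows. The pairwise classifiers here are hypothetical nearest-prototype rules on a single made-up feature, standing in for trained binary SVMs; only the voting logic reflects the strategy described in the text.

```python
from itertools import combinations

CLASSES = ["ground", "trees", "buildings", "farmland"]


def one_against_one_predict(x, n_classes, pairwise):
    """Return the index of the class with the most pairwise votes.

    pairwise maps each index pair (i, j), i < j, to a callable that
    returns the winning class index for instance x. Ties go to the
    smallest index because list.index finds the first maximum.
    """
    votes = [0] * n_classes
    for i, j in combinations(range(n_classes), 2):
        votes[pairwise[(i, j)](x)] += 1
    return votes.index(max(votes))


def make_pairwise(i, j, prototypes):
    """Stand-in binary classifier: pick the class with the nearer prototype."""
    def classify(x):
        return i if abs(x - prototypes[i]) <= abs(x - prototypes[j]) else j
    return classify


# Made-up one-dimensional prototypes, one per class (illustration only).
PROTOTYPES = [0.0, 10.0, 20.0, 30.0]
PAIRWISE = {(i, j): make_pairwise(i, j, PROTOTYPES)
            for i, j in combinations(range(len(CLASSES)), 2)}
```

With four classes, `PAIRWISE` holds 4 * 3 / 2 = 6 binary classifiers, and `CLASSES[one_against_one_predict(11.0, 4, PAIRWISE)]` returns the label whose class collects the most votes.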

Experiment data
Data captured over the Miyun area, in Beijing, were used in this paper. The full-waveform LiDAR data were acquired by the LiteMapper 5600 airborne LiDAR system and the CCD images were acquired by a DigiCAM-H/22 Hasselblad camera. The experimental area of the Miyun data set was about 14 km², the flying height was about 700 m, the average density of the point clouds was 4 points/m², and the typical land covers were buildings, trees, farmland, ground, etc. In this paper, a part of the experimental area was selected to study SVM classification using the extracted waveform features. The size of the selected area was about 330 m × 390 m, containing about 338174 points. The CCD image of the selected area is shown in Figure 3 (a).

Experiment procedure
The flow chart of the experiment is shown in Figure 4. Firstly, the returned waveforms were filtered by MAD as mentioned above. The waveforms were then decomposed using the enhanced component detection algorithm. Features including distance, intensity, FWHM and backscattering cross section were extracted and corrected. These features serve as the attributes of an instance representing the waveform reflected from a type of land cover. Secondly, a multi-class SVM model was generated to classify the land cover types. The received waveforms reflected from typical land cover were divided into ground, trees, buildings and farmland. In this paper, 1000 feature vectors for each typical land cover type were selected, according to the CCD image of the experiment region, as the training data to generate the SVM model. Following the SVM procedure described in Section 2.2, training was repeated ten times on the selected data, and the SVM model with the highest cross-validation accuracy was selected to predict the land cover types of the Miyun area. Finally, the pseudo-color classification image depicting the land cover types of the Miyun area was generated and the results were evaluated.
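Selecting the model with the highest cross-validation accuracy over an exponentially growing (C, γ) grid can be sketched as below. This is a generic outline under stated assumptions, not the procedure actually run in the experiment: `toy_train` is a hypothetical RBF-weighted voting classifier that stands in for an SVM trainer (it ignores C), and the fold layout is an arbitrary choice.

```python
import math


def exponential_grid(exp_lo, exp_hi, step=2, base=2.0):
    """Values base**exp_lo, base**(exp_lo + step), ... up to base**exp_hi."""
    return [base ** e for e in range(exp_lo, exp_hi + 1, step)]


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds of near-equal size."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_val_accuracy(samples, labels, k, train_fn, C, gamma):
    """Fraction of held-out instances predicted correctly over k folds."""
    correct = 0
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_x = [s for i, s in enumerate(samples) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        predict = train_fn(train_x, train_y, C, gamma)
        correct += sum(predict(samples[i]) == labels[i] for i in held_out)
    return correct / len(samples)


def grid_search(samples, labels, k, train_fn, c_grid, gamma_grid):
    """Return (accuracy, C, gamma) for the first pair with the best accuracy."""
    best = None
    for C in c_grid:
        for gamma in gamma_grid:
            acc = cross_val_accuracy(samples, labels, k, train_fn, C, gamma)
            if best is None or acc > best[0]:
                best = (acc, C, gamma)
    return best


def toy_train(train_x, train_y, C, gamma):
    """Hypothetical stand-in for an SVM trainer: RBF-weighted class voting."""
    def predict(x):
        scores = {}
        for xi, yi in zip(train_x, train_y):
            dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
            scores[yi] = scores.get(yi, 0.0) + math.exp(-gamma * dist2)
        return max(scores, key=scores.get)
    return predict
```

A call such as `grid_search(samples, labels, 4, toy_train, exponential_grid(-5, 15), exponential_grid(-15, 3))` tries every pair on the grid; the exponent ranges are illustrative and are not taken from the paper.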

Experiment results
The classification result for the Miyun area obtained with the SVM method using corrected features is given in Figure 5. The corresponding CCD image is shown in Figure 3 (a). The brown, yellow, red and green areas represent farmland, ground, buildings and trees, respectively. It can be seen that different types of land cover can be effectively distinguished using full-waveform features. To calculate the classification accuracy, 5248 instances of these four land cover types, excluding the training data, were selected. The ground truth information was acquired manually from the CCD image data of the experiment region. The resulting confusion matrix for the four land cover types of the Miyun area is shown in Table 1.
The overall classification accuracy using corrected features reached 90.63% and the classification Kappa was 0.8741. To better interpret the SVM classification results, nonlinear Artificial Neural Networks (ANNs) were also applied to classify the land cover types in the Miyun area, and the classification results of the two methods were compared. ANNs imitate the brain's model of an interconnected system of neurons, enabling computers to detect patterns and to learn complex relationships within data (Anderson, 1995). ANNs essentially provide a "black box" model. The ANN used in this paper consisted of a single hidden layer and was trained for 500 cycles by back propagation with a learning rate of 0.2. The classification result is shown in Figure 6, where the brown, yellow, red and green areas represent farmland, ground, buildings and trees, respectively. The confusion matrix for the classification results is shown in Table 2.

Analysis
The overall classification accuracy of the SVM method was 90.63%, while that of the ANNs was 87.69%. The SVM classification method thus produced the higher accuracy. From Figure 5 and Figure 6, it can be seen that some areas on the left side of the figure that were in fact "ground" were classified as "farmland" by the ANNs, as the black ellipse shows. Some areas on the right side of the figure that were in fact "ground" were also classified as "farmland" by the ANNs, as the blue ellipses show. Additionally, most of the confusion in the SVM predictions was between "building" and "tree" and between "farmland" and "ground", as shown in the fourth row and the second column, and the third row and the fifth column of Table 1. This possibly resulted from the similar distance values of "building" and "tree", and of "farmland" and "ground". Prediction errors also occurred between "tree" and "farmland", as shown in the fifth row and the fourth column of Table 1, because "tree" and "farmland" have similar properties.
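The overall accuracy and Kappa values compared above are both derived from a confusion matrix. A minimal sketch of the standard formulas follows; the function names are hypothetical, and any matrix passed in is the caller's own data, since the entries of Table 1 and Table 2 are not reproduced here.

```python
def overall_accuracy(cm):
    """Share of instances on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    # Chance agreement from the products of row and column marginals.
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1.0 - expected)
```

Rows are taken as reference classes and columns as predicted classes, although both measures give the same value if the matrix is transposed.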

CONCLUSIONS
In this paper, the returned waveforms were filtered by MAD and waveform decomposition was implemented using the enhanced component detection algorithm. Waveform features including distance, intensity, FWHM and backscattering cross section were then extracted and corrected. The classification ability of the corrected features was also analysed. A multi-class SVM model was generated to classify the land cover in the Miyun area as ground, trees, buildings or farmland. The classification results showed that the classification accuracy reached 90.63% and the classification Kappa was 0.8741. Furthermore, the SVM classification was compared to classical ANNs, and the comparison showed that the SVM method achieved better classification results.
In future work, further improvement in land cover classification may be achieved by using more waveform features. The weights of the features will also be studied.
Figure 1. Raw waveform and the filtered waveform

Figure 2. The diagram of intensity and FWHM


which are binary indicators of the instances and become support vectors. Also, in the dual formulation, explicit knowledge of the function $\phi$ is not necessary and the kernel $K$ may be applied instead, which is not possible in the primal problem. Here the kernel function used

Figure 4. Flow chart for land cover classification of Miyun area based on full-waveform LiDAR data

Figure 5. Land cover classification results of Miyun area based on SVM method

Table 1. Confusion matrix of the classification results based on SVM method

The overall classification accuracy of the ANNs method reached 87.69% and the classification Kappa was 0.8349.

Figure 6. Land cover classification results of Miyun area based on ANNs method

Table 2. Confusion matrix of the classification results based on ANNs method