USING FULL WAVEFORM DATA IN URBAN AREAS

In this paper, the use of waveform data in urban areas is studied. Full waveform data is generally used in non-urban areas, where it can describe the vertical structure of vegetation better than discrete return systems. However, waveform data could also be useful in urban areas, where classification methods can be extended to include parameters derived from waveform analysis. Besides common properties that are also sensed by multi-echo systems (intensity, number of returns), the shape of the waveform depends on physical properties of the reflecting surface, such as material and angle of incidence. The main goal of this investigation is to identify relevant parameters, derived from the waveform, that are related to surface material or object class. This paper uses two waveform parameterization approaches: Gaussian shape fitting and discrete wavelet transformation. Two classification methods are tested: supervised Bayes classification and unsupervised Self-Organizing Map (SOM) classification. The results of these methods were compared to each other and to a manual classification. The initial conclusion is that, although waveform data contains classification information, the waveform shape by itself is not enough to perform classification in urban regions and should therefore be combined with the point cloud geometry.


INTRODUCTION
Most modern LiDAR systems are capable of acquiring full waveform data in addition to discrete returns with intensity data. Waveform data is quite useful for distinguishing tree species or for providing a better biomass description. The question is whether the waveform can also provide useful information for object classification in urban regions. Since waveform parameters, such as the commonly used intensity, depend strongly on the properties of the surface, classification can potentially be based on them. First, the waveform properties that are typical for each feature should be identified. In this study, two parameterizations are used: Gaussian shape fitting and discrete wavelet transformation. The methods, with special extensions, are described in Section 2. An important part of the research is to find typical class-specific parameters that are independent of the commonly used parameters, such as intensity or number of returns. Next, classification performance should be evaluated on actual LiDAR data. In this investigation, two classification methods were tested: the naive Bayes supervised classification method (Green, 1995) and the unsupervised Self-Organizing Map (SOM) (Kohonen, 1990), detailed in Section 3. The four combinations of parameter extraction and classification algorithms were tested on a LiDAR dataset acquired by an Optech ALTM 3100 sensor over an area near Dayton, Ohio, USA, shown in Figure 1. The selected area represents a typical suburban environment, including a mix of vegetated areas and man-made objects. Four object classes were defined: grass, tree, roof and pavement. The four methods were compared to each other with respect to classification performance, as described in Section 6. Finally, a validation against manually classified points was performed. All data processing and analyses were carried out in the GNU Octave open source software environment.

WAVEFORM PARAMETERS
Classification methods generally require discrete, well-structured input values. Since the waveforms differ considerably from one reflection to the next, they cannot be used directly as input for a classification procedure. Ideally, waveforms should be described by parameters without any loss of the material-specific information. There are many ways to model waveforms (Duong et al., 2006); in this study, one typical procedure and a new method were tested. Note that only the return signal was used for the analysis (the shape of the outgoing pulse was not considered).

Generalized Gaussian Fitting
To determine the shape-specific waveform parameters to be used in the classification, a two-step peak detection and peak parameter extraction method was used. In the first step, the number of peaks is determined by inspecting the second derivative (the curvature) of a cubic smoothing spline fitted to the waveform data. In the second step, generalized Gaussian functions are fitted to the waveform data using the Levenberg-Marquardt algorithm (Chauve et al., 2007); the number of fitted functions depends on the number of peaks detected in the first step. The generalized Gaussian function used here can adjust to the translation, magnitude, pulse width, flattening and skewness of the waveform, see Figure 2. The translation represents only the location of the pulse in the LiDAR data and is therefore not used for classification.
In addition to the four shape parameters, two more parameters were selected. The motivation is that, although the four shape parameters describe the waveform reasonably well, feature-specific properties are not very prominent in any of them other than the magnitude, as seen in Figure 3. The magnitude corresponds to the intensity value recorded by traditional scanners and is therefore of less interest in this investigation. The additional parameters are expected to provide additional information for the classification, as they rely more on the waveform shape. The "penetration" parameter is calculated as the number of discrete samples over a previously defined threshold. In our case, the threshold was chosen to be 33. The waveforms returned from pavement typically have a lower intensity value than penetration value. This parameter also separates the vegetation (grass and tree) from the pavement and roof better, as shown in Figure 4.
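The peak counting and the "penetration" parameter can be illustrated with a minimal sketch. It is not the procedure used in the study: a moving average stands in for the cubic smoothing spline, and a local-maximum test stands in for the curvature inspection. The function names, the `window` size and the `min_height` argument are hypothetical; only the threshold value of 33 comes from the text.

```python
def penetration(waveform, threshold=33):
    """Count the discrete samples whose amplitude is over the threshold."""
    return sum(1 for sample in waveform if sample > threshold)


def count_peaks(waveform, window=3, min_height=33):
    """Count local maxima of a moving-average-smoothed waveform.

    Stand-in for the spline curvature test (an assumption): a sample counts
    as a peak when the smoothed signal rises into it and does not rise after
    it, so the second difference is negative there, and when the smoothed
    value exceeds min_height.
    """
    half = window // 2
    smooth = []
    for i in range(len(waveform)):
        lo, hi = max(0, i - half), min(len(waveform), i + half + 1)
        smooth.append(sum(waveform[lo:hi]) / (hi - lo))
    peaks = 0
    for i in range(1, len(smooth) - 1):
        if smooth[i] > min_height and smooth[i - 1] < smooth[i] >= smooth[i + 1]:
            peaks += 1
    return peaks


# Synthetic single-echo and two-echo waveforms (made-up amplitudes).
single_echo = [10, 10, 12, 20, 40, 80, 120, 80, 40, 20, 12, 10, 10]
double_echo = [10, 10, 40, 90, 40, 10, 10, 10, 35, 70, 35, 10, 10]
print(count_peaks(single_echo), penetration(single_echo))
print(count_peaks(double_echo), penetration(double_echo))
```

The peak count would then decide how many generalized Gaussian functions are fitted; the fitting itself is not sketched here.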

Discrete Wavelet Transformation
The waveform signal can also be transformed by Discrete Wavelet Transformation (DWT), resulting in a well compressed and structured dataset. Since the waveform has local correlation, the higher order DWT coefficients can usually be discarded. The two-pulse waveform example in Figure 6 shows that the first 18 wavelet coefficients are sufficient to preserve the waveform and can potentially be used for classification.
The CDF 3/9 wavelet transformation provides a good representation of the waveform with good compression performance (Laky et al., 2010). In our investigation, the WaveLab toolbox was used (Buckheit and Donoho, 1995).
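The idea of keeping only the leading wavelet coefficients can be sketched as follows. This sketch uses the orthonormal Haar wavelet as a simple stand-in, not the CDF 3/9 wavelet or the WaveLab toolbox used in the study, so the quality of an 18-coefficient reconstruction will differ from Figure 6. The function names are hypothetical, and the signal length is assumed to be a power of two.

```python
import math


def haar_dwt(signal):
    """Full multi-level orthonormal Haar transform (length: power of two).

    Output order: [approximation, coarsest detail, ..., finest details].
    """
    coeffs = list(signal)
    details = []
    s = math.sqrt(2.0)
    n = len(coeffs)
    while n > 1:
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / s for i in range(n // 2)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / s for i in range(n // 2)]
        details = detail + details  # coarser levels go in front
        coeffs = approx
        n //= 2
    return coeffs + details


def haar_idwt(coeffs):
    """Inverse of haar_dwt."""
    out = list(coeffs[:1])
    pos = 1
    s = math.sqrt(2.0)
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(out)]
        pos += len(out)
        nxt = []
        for a, d in zip(out, detail):
            nxt.extend([(a + d) / s, (a - d) / s])
        out = nxt
    return out


def keep_first(coeffs, k=18):
    """Zero all but the first k coefficients (the low-order ones)."""
    return list(coeffs[:k]) + [0.0] * (len(coeffs) - k)


# Synthetic 64-sample waveform with one Gaussian-like pulse (made-up values).
waveform = [10.0 + 100.0 * math.exp(-((i - 30) / 4.0) ** 2) for i in range(64)]
features = haar_dwt(waveform)[:18]               # classification input
approximation = haar_idwt(keep_first(haar_dwt(waveform)))
```

Here `features` plays the role of the 18 coefficients passed to a classifier, and `approximation` shows what those coefficients still retain of the waveform.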

Self-Organizing Map
Automatic classification can be performed by Kohonen's Self-Organizing Map algorithm (SOM) (Kohonen, 1990). SOM is an unsupervised method and has very flexible parameterization with good performance in handling non-linear mapping problems (Zaletnyik et al., 2010). Our implementation used the SOM_PAK package (Kohonen et al., 1996).
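For readers unfamiliar with the algorithm, a minimal SOM with a rectangular 2x2 grid can be sketched in a few lines. This is not SOM_PAK and does not reproduce its interface; the function names, the initialization from evenly spaced samples, and the linear decay of the learning rate and neighborhood width are all assumptions made for illustration.

```python
import math


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    dists = [sum((w - xi) ** 2 for w, xi in zip(unit, x)) for unit in weights]
    return dists.index(min(dists))


def train_som(samples, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0):
    """Train a small rectangular SOM and return one weight vector per unit."""
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumed initialization: copy evenly spaced training samples.
    step = max(1, (len(samples) - 1) // max(1, len(grid) - 1))
    weights = [list(samples[min(i * step, len(samples) - 1)])
               for i in range(len(grid))]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.01      # shrinking neighborhood
        for x in samples:
            b = best_matching_unit(weights, x)
            for u, (r, c) in enumerate(grid):
                d2 = (r - grid[b][0]) ** 2 + (c - grid[b][1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                weights[u] = [w + lr * h * (xi - w)
                              for w, xi in zip(weights[u], x)]
    return weights


# Made-up two-dimensional parameter vectors; real inputs would be the
# fitting parameters or the wavelet coefficients of each waveform.
samples = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
           (9.8, 10.0), (10.0, 9.9), (10.1, 10.2)]
weights = train_som(samples)
labels = [best_matching_unit(weights, x) for x in samples]
```

Each of the four units then corresponds to one cluster; which unit matches which object class has to be decided afterwards, since the method is unsupervised.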

Bayes Classifier
The second classifier tested was a naive Bayes classifier. Using a training set, the relative frequencies of the parameters falling into specified intervals are calculated for each class; i.e., the continuous parameters are discretized by binning, and then the empirical histograms of the parameters are calculated for each class. The relative frequency of the categories occurring among the training waveforms is also calculated. The class for a specific waveform is then selected to be the class that maximizes the probability
$$P(C \mid F_1, \dots, F_n) \propto P(C) \prod_{i=1}^{n} P(F_i \mid C),$$
where $C$ is the class, $F_i$ are the classification parameters, $P(C)$ is the relative frequency of the class among the training waveforms, and $P(F_i \mid C)$ is the probability of a given classification parameter to be in a given bin for a given class.
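The binning-based naive Bayes procedure described above can be sketched as follows. The function names, the number of bins, the parameter ranges and the sample values are hypothetical. The add-one (Laplace) smoothing is an addition that the text does not mention; it is included only so that an empty bin does not lead to the logarithm of zero.

```python
import math
from collections import Counter, defaultdict


def bin_index(value, lo, hi, n_bins):
    """Map a continuous value to one of n_bins equal-width bins on [lo, hi]."""
    if value <= lo:
        return 0
    if value >= hi:
        return n_bins - 1
    return int((value - lo) / (hi - lo) * n_bins)


def train(features, labels, ranges, n_bins=4):
    """Collect class counts and per-class, per-parameter bin histograms."""
    class_counts = Counter(labels)
    hist = defaultdict(Counter)  # (class, parameter index) -> bin counts
    for row, label in zip(features, labels):
        for i, value in enumerate(row):
            lo, hi = ranges[i]
            hist[(label, i)][bin_index(value, lo, hi, n_bins)] += 1
    return class_counts, hist


def classify(row, model, ranges, n_bins=4):
    """Return the class maximizing log P(C) + sum_i log P(F_i | C)."""
    class_counts, hist = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            lo, hi = ranges[i]
            b = bin_index(value, lo, hi, n_bins)
            # Add-one smoothing (an assumption, not from the text).
            score += math.log((hist[(label, i)][b] + 1) / (count + n_bins))
        if score > best_score:
            best, best_score = label, score
    return best


# Made-up training waveforms described by two parameters each.
ranges = [(0.0, 1.0), (0.0, 16.0)]
train_x = [(0.2, 10), (0.3, 12), (0.25, 11), (0.8, 3), (0.9, 2), (0.85, 4)]
train_y = ["grass"] * 3 + ["pavement"] * 3
model = train(train_x, train_y, ranges)
print(classify((0.22, 10.5), model, ranges))
```

Working with sums of logarithms instead of the product of probabilities gives the same maximizing class and avoids numerical underflow when many parameters are used, such as 18 wavelet coefficients.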

CLASSIFICATION TESTS
For a comparative performance evaluation, the introduced algorithms were tested using a LiDAR dataset acquired over Dayton, Ohio. Note that all classification methods included a post-processing step with mode filtering to avoid class speckle.
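The mode filtering step can be sketched as a majority vote among neighboring points. The text does not specify the neighborhood, so the use of the k nearest points in the horizontal plane, the value of k and the function name are assumptions; the brute-force neighbor search is only suitable for small examples.

```python
from collections import Counter


def mode_filter(points, labels, k=5):
    """Replace each label by the most common label among the k nearest
    points (the point itself included), using brute-force 2-D distances."""
    filtered = []
    for px, py in points:
        order = sorted(
            range(len(points)),
            key=lambda j: (points[j][0] - px) ** 2 + (points[j][1] - py) ** 2,
        )
        votes = Counter(labels[j] for j in order[:k])
        filtered.append(votes.most_common(1)[0][0])
    return filtered


# One isolated "pavement" point inside a row of "grass" points.
points = [(float(i), 0.0) for i in range(7)]
labels = ["grass"] * 3 + ["pavement"] + ["grass"] * 3
print(mode_filter(points, labels, k=3))
```

The votes are always taken from the unfiltered labels, so the result does not depend on the order in which the points are processed.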

Fitting and SOM
This method is based on the fitting parameters, in particular the pulse width, flattening, skewness, fitting error and penetration. SOM classification with a rectangular topology and 2x2 dimensions was applied to the parameter set. The parameter calculation assumes that only single-peak echoes are processed; the multi-echoes were added to the tree class during post-processing. Furthermore, the range differences were calculated and local high points were classified as roof. This step improves the separation of roof and pavement; however, it means that the classification does not rely on the waveform alone. The crucial area is the sidewalk and the grass belt along the street. Fitting and SOM classifies this area as pavement, so the pavement area is larger than in reality (Figure 7).

Fitting and Bayes classification
The waveforms with one peak are processed by the shape fitting algorithm. In addition to the parameters introduced in the previous section, the range differences were also used to improve the separation of roof and pavement. Multi-echoes are classified as trees and local high points as roof, similar to the method in section 4.1. This algorithm produces somewhat different results on the sidewalk; however, this area is still assigned to the wrong class (Figure 8). The runtime of the Bayes classification is about the same as that of the SOM.
Figure 8: Fitting and Bayes

Wavelet and SOM
The benefit of the wavelet approach is that single-echo and multi-echo waveforms can be processed in the same classification step. The first 18 wavelet coefficients were used as the input to the SOM. A rectangular topology with 2x2 dimensions was used, as in section 4.1. Figure 7 and Figure 9 show that the SOM does not recognize the sidewalk and the grass strip. The separation of pavement and roofs gives less reliable results: in the roof area, there are both pavement and roof points. The easiest way to improve this classification is to use range differences or to examine the height distribution in the area.

Wavelet and Bayes classification
In this case, the classifier was also applied to the first 18 wavelet coefficients. The training set for the Bayes classification had 427 points, with about the same class distribution as the final classes. The sidewalk has some usable information (Figure 10). Experience suggests that the SOM and Bayes classifiers differ substantially in this area, while there are no significant differences in other regions. This result is very similar to that of the "Fitting and Bayes classifier" method of Section 4.2. This suggests that both sets of input parameters carry the same information and that the choice of classifier has a higher impact. Another reason is the impact of the classification components that are not based on the waveform.

Methods summary
Table 1 shows a summary of the input parameters used in the four methods.
Table 1: Used parameters for methods

Improving classifications
The classification based solely on waveform parameters seems to give insufficient results. The method seems to be good at separating pavement and grass; however, there are some difficulties in differentiating between pavement and roof. The source of this difficulty is the different incidence angle on the roofs. This is why the range difference was used in all methods for post-processing.
Figure 11: Change in the classification result caused by using range differences and adding multi-echo waveforms

Result Differences
To assess the comparative performance of the four procedures, the results were compared to each other. There were 15617 points in the selected area, and for the Bayesian classifiers, 427 training points were used. The distribution of classes was the following: 53% grass, 26% tree, 7% roof and 14% pavement (Table 2), which is typical for a sparsely populated suburban area. A major source of differences is the side of the street, where grass and a sidewalk are present (Figure 12). The crucial points are in the grass class (711) and in the pavement class (453). This suggests that the problematic area is around the sidewalk. The other two classes have fewer than 20 problematic points.
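The point-by-point comparison of the four results, i.e. how many different classes each point received (Figure 12), can be sketched as follows. The function name and the label lists are hypothetical and much shorter than the real dataset.

```python
from collections import Counter


def agreement_histogram(*method_labels):
    """Map 'number of distinct classes assigned to a point' -> point count.

    Each argument is the list of class labels produced by one method,
    with the points in the same order in every list.
    """
    return Counter(len(set(per_point)) for per_point in zip(*method_labels))


# Made-up labels from four methods for five points.
fit_som = ["grass", "grass", "tree", "roof", "pavement"]
fit_bayes = ["grass", "pavement", "tree", "roof", "pavement"]
wav_som = ["grass", "grass", "tree", "roof", "grass"]
wav_bayes = ["grass", "pavement", "tree", "pavement", "tree"]

hist = agreement_histogram(fit_som, fit_bayes, wav_som, wav_bayes)
print(dict(hist))
```

A value of 1 means that all methods agree on a point, while larger values mark the points where the methods disagree.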

Validation
The validation is based on a manually classified dataset. The results were compared over an area that includes 910 points. The results of the four methods are very similar, with and without the additional parameters. As described above, the difficult part of the classification is the area around the sidewalk. In this area, the two classification methods differ: SOM classifies the whole area as pavement (even with different input parameters), while Bayes classification assigns incorrect classes, which affects the number of points per class.
In terms of numerical results, SOM with fitting has the best performance with 95% of the points correctly classified, and wavelets with Bayes classification gave the worst with 91%.
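The share of correctly classified points can be computed against the manual reference as in the following sketch. The function name and the labels are hypothetical; the example does not reproduce the 910 validation points.

```python
from collections import Counter


def validate(predicted, reference):
    """Return the share of correctly classified points and a confusion
    count keyed by (reference class, predicted class)."""
    confusion = Counter(zip(reference, predicted))
    correct = sum(n for (ref, pred), n in confusion.items() if ref == pred)
    return correct / len(reference), confusion


# Made-up manual classification and one method's result for ten points.
reference = ["grass"] * 5 + ["pavement"] * 3 + ["roof"] * 2
predicted = ["grass"] * 4 + ["pavement"] * 4 + ["roof"] * 2

accuracy, confusion = validate(predicted, reference)
print(accuracy, confusion[("grass", "pavement")])
```

The confusion counts show which pairs of classes are mixed up, for example grass points labeled as pavement along the sidewalk.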

CONCLUSION
The goal of the investigation was to find out whether classification based solely on waveform data is feasible in urban areas. For this purpose, different methods were tested and special parameters were added. All four methods gave reasonable results, but the commonly used parameters (intensity, range differences, number of returns) cannot be overlooked. The pavement and roof can be separated well by all of the methods. For trees, however, the multi-echo based classification is needed, and the proper separation of roof and pavement requires a range difference (or height difference) calculation. There were no significant differences between the parameters derived from generalized Gaussian fitting and those derived from wavelet transformation in terms of classification performance. The SOM and Bayes classifiers showed significant differences in the sidewalk areas.
In summary, waveform data can be used for classification purposes over an urban region, but it does not always provide consistent performance. The incidence angle has a high impact on the shape of the waveform signal and, as in the case of roofs, it depends on the slope direction of the roof, the actual flight direction and the scan angle. In this case, the classification based on waveforms has lower accuracy. The wide range of roofing materials also has a negative effect on the accuracy of the classification.

Figure 2: Four parameters of the generalized Gaussian curve

Figure 3: Magnitude, width, flattening, and skewness parameters in the test area

Figure 4: Penetration parameter in the test area
The classification based only on the three parameters has difficulty in separating roofs and pavement. The source of the problem is likely the different surface normals for roofs. The second additional parameter describes the residuals of the Gaussian fitting. The standard deviation of the fitting error is typical for the selected four main classes, see Figure 5.

Figure 5: Fitting error in the test area

Figure 12: Number of classes assigned to each point (blue: same class for all methods, green: two different classes)

Table 2: Number of points in classes
The four methods produced quite similar numbers of points in all classes. The next step is to compare the results of the methods point by point.

Table 3: Number of classes assigned to each point
88% of the points were assigned the same class by all methods; the other 12% had two different classes, and 10 points had three different classes (Table 3).

Table 4: Validation results