LANDSLIDES IDENTIFICATION USING AIRBORNE LASER SCANNING DATA DERIVED TOPOGRAPHIC TERRAIN ATTRIBUTES AND SUPPORT VECTOR MACHINE CLASSIFICATION

Since high-resolution Airborne Laser Scanning (ALS) data became available, substantial progress has been made in geomorphological research, especially in landslide analysis. First- and second-order derivatives of the Digital Terrain Model (DTM) have become a popular and powerful tool in landslide inventory mapping. Nevertheless, automatic landslide mapping based on sophisticated classifiers such as Support Vector Machine (SVM), Artificial Neural Network or Random Forests is often computationally time consuming. The objective of this research is to explore in depth the topographic information provided by ALS data and to overcome the computational time limitation. For this reason, an extended set of topographic features was used, and Principal Component Analysis (PCA) was applied to reduce redundant information. The proposed novel approach was tested on a susceptible area affected by more than 50 landslides located on Rożnów Lake in the Carpathian Mountains, Poland. The initial seven PCA components, with 90% of the total variability in the original topographic attributes, were used for SVM classification. By comparing the classification results with the landslide inventory map, the average user's accuracy (UA), producer's accuracy (PA), and overall accuracy (OA) were calculated for two models. For the PCA-feature-reduced model, UA, PA, and OA were found to be 72%, 76%, and 72%, respectively. Similarly, UA, PA, and OA for the non-reduced original topographic model were 74%, 77% and 74%, respectively. Using the initial seven PCA components instead of the twenty original topographic attributes does not significantly change identification accuracy but reduces computational time.


INTRODUCTION
Landslides are a natural hazard causing significant damage to the environment in many countries. Landslides can be fatal and can also destroy or damage natural landforms. Moreover, landslides have a disruptive impact on man-made structures such as buildings, on agricultural and forest lands, and on water in rivers and streams (Akgun and Erkan, 2016; Schuster and Fleming, 1986). Because of the increasing social awareness of the impact of landslides on the environment, efficient landslide assessment is required (Aleotti and Chowdhury, 1999). Therefore, many countries have created or are creating their own national or regional landslide databases (LDBs). These are a fundamental source for quantitative zoning of landslide-susceptible areas (Van Den Eeckhaut and Hervás, 2012).
Currently, landslide mapping methods involve field inventories, which are time consuming. Alternative landslide mapping methods apply photogrammetric approaches and analysis of the Digital Elevation Model (DEM). Here, airborne laser scanning (ALS) has become very popular by providing high-resolution topographic information that surpasses traditional surveying techniques (Tarolli, 2014). Nowadays, applying ALS data in landslide mapping seems to be standard practice. Nevertheless, many studies rely, besides the ALS data, on expert knowledge, experience and very often familiarity with the study sites. Unfortunately, very few researchers have attempted to automate the process of landslide mapping using computer-aided methods (Van Den Eeckhaut et al., 2012). In automatic approaches, the DEM and its first- and second-order derivatives, such as slope, aspect, curvature and topographic roughness, are very beneficial in landslide recognition. Very promising results have been obtained from surface roughness investigation.
McKean and Roering (2004) applied local variability of unit vector orientations of slope and aspect. Their results indicate that contrast analysis of surface roughness can be used to identify bedrock landslides, outline their spatial extent and even examine landslide internal kinematics. Furthermore, impressive results were obtained by Booth et al. (2009) using spectral analysis and high-resolution topographic data. They compared the results with independent landslide inventory maps and correctly classified an average of 82% of the terrain in five study areas in Washington and Portland Hills, Oregon. Chen et al. (2014) used DEM-derived features and the random forests algorithm for semi-automatic landslide mapping. They achieved an overall accuracy of 78.24%.
Another approach, using object-oriented image analysis (OOA), was proposed by Van Den Eeckhaut et al. (2012). Their results show that OOA using DEM-derivatives allowed them to recognize more than 90% of the main scarps and 70% of the landslide bodies. Recently, another approach was presented by Leshchinsky et al. (2015). They proposed a novel algorithm for automatic and consistent mapping of landslide deposits. The authors applied the Contour Connection Method (CCM) and achieved high agreement with manually delineated landslide deposits.
Applying many DEM-derivatives and sophisticated classifiers such as Support Vector Machine (SVM), Artificial Neural Network or Random Forests is often computationally time consuming. Moreover, high-resolution DEM data and their derivatives for large areas are often big data sets, which require powerful processing platforms to handle them.
Therefore, the objective of this research is to explore in depth the topographic information provided by the ALS data and to decrease the computational time in semi-automatic and computer-aided landslide mapping. For this reason, an extended set of topographic features was used, and Principal Component Analysis (PCA) was applied to reduce redundant information and to accelerate the classification process.

DEM AND ITS DERIVATIVES
The ALS data were obtained in the framework of the ISOK project (Pawłuszek et al., 2014). A point cloud with a density of 4-6 points/m² and an overall accuracy better than 15 cm was used. According to Pawłuszek et al. (2014)

GENERAL SETTINGS OF THE STUDY AREA
The study area is located in the central part of the Outer West Carpathians in Poland and covers an area of approximately 2.8 km² (figure 2). The area extends from 49°44'N to 49°45'N latitude and from 20°40'E to 20°43'E longitude. The altitude of the study area ranges from 267.48 m to 477.77 m. The maximum slope angle in the study area is 58.63°. From the geological point of view, the study area is situated within the Ciężkowice Foothills, close to the bank of Rożnów Lake (Starkel, 1972).
According to the hydrological data, precipitation occurs frequently in the form of rain and snow throughout the winter. The annual mean precipitation of this area over the period 1981-2010 is 800 mm (Woźniak et al., 2013). The main causes of landslide occurrence within the study area are the sedimentary rocks and rainfall. Moreover, landslide activity is mostly associated with abundant rainfall, fluctuation of the water level in Rożnów Lake and the flysch type of rocks (Borkowski et al., 2011). As shown in figure 2, three different land uses can be observed: forest, agricultural and urban areas. As can be seen, landslides mainly occur in forested areas and cropland. Therefore, it is worth emphasizing the usefulness of ALS for landslide mapping within the study area, where traditional field inventories are challenging.

Principal Component Analysis
PCA is a well-known method for reducing redundant information among highly correlated variables. It is widely used in hyperspectral analysis, where bands are highly correlated with each other. PCA reduces the number of elements needed to describe a large number of inter-correlated variables (Abdi and Williams, 2010). Many classification methods from machine learning theory are time consuming; therefore, using the full data set of DEM-derivatives for large study areas is inefficient. For this reason, the PCA was performed for the DEM and the 19 DEM-derivatives. The DEM-layers had to be normalized before performing the PCA so that no single layer would dominate the others. The initial seven PCA components provide 93.2% of the information contained in the 20 DEM-layers. Figure 4 shows an RGB composition of the three initial PCA components.
In the new feature space, the classes can be separated by means of a decision hyperplane (figure 5). Our SVM classification was performed in ENVI using a polynomial kernel function of degree four, a bias term equal to three and a kernel bandwidth γ = p⁻¹, where p is the number of DEM-derivatives (Hsu et al., 2003).
Comparing the results with the landslide inventory map, the average user's accuracy (UA), the producer's accuracy (PA) and the overall accuracy (OA) were calculated for two models. For the PCA-feature-reduced model, the UA, the PA and the OA were found to be 72%, 76%, and 72%, respectively. Similarly, the UA, the PA and the OA for the non-reduced original topographic model were 74%, 77% and 74%, respectively. Table 2 and table 3 present the detailed accuracy assessment as confusion matrices.
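The normalization, PCA and kernel settings described above can be illustrated with a minimal NumPy sketch. It is not the ArcGIS/ENVI workflow used in this study: the input is synthetic random data standing in for 20 DEM-layers, the function names (`standardize`, `pca`, `n_components_for`, `polynomial_kernel`) are hypothetical, z-score scaling is only one possible normalization, and the kernel form (γ·x·y + r)^d is assumed to be the usual polynomial kernel.

```python
import numpy as np

def standardize(layers):
    """Scale each column (one DEM-layer) to zero mean and unit variance."""
    mean = layers.mean(axis=0)
    std = layers.std(axis=0)
    std[std == 0] = 1.0  # guard against constant layers
    return (layers - mean) / std

def pca(layers):
    """Return component scores and the explained-variance ratio per component."""
    z = standardize(layers)
    # SVD of the standardized data; rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return z @ vt.T, explained

def n_components_for(explained, threshold=0.90):
    """Smallest number of leading components reaching the variance threshold."""
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(explained))

def polynomial_kernel(x, y, degree=4, bias=3.0):
    """Assumed kernel form: (gamma * <x, y> + bias) ** degree, gamma = 1 / p."""
    gamma = 1.0 / x.shape[0]
    return (gamma * np.dot(x, y) + bias) ** degree

# Synthetic stand-in: 1000 pixels, 20 correlated layers built from 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 3))
layers = latent @ rng.normal(size=(3, 20)) + 0.3 * rng.normal(size=(1000, 20))

scores, explained = pca(layers)
k = n_components_for(explained, threshold=0.90)
reduced = scores[:, :k]  # feature set that would be passed to the classifier
print(k, float(np.cumsum(explained)[k - 1]))
```

The number of retained components depends entirely on the data; the value of seven reported in this study applies to the real DEM-layers, not to this synthetic example.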

Computational time processing
The SVM classifications were performed using the 32-bit version of ENVI 5.2. The computations were performed on two Intel(R) Xeon(R) E5649 CPUs at 2.53 GHz with 48 GB of DDR3 RAM. To quantify the time required for the calculation, the user time was measured. The user time is the amount of CPU time spent in user-mode code within the process. This is only the actual CPU time used in executing the process. Other processes and the time the process spends blocked do not count towards this figure. According to the results, the SVM classification for the PCA-reduced model and for the full data set with the 20 DEM-layers took 32 and 65 minutes of user time, respectively. The computational time of the PCA itself is negligible, because the computation in ArcGIS took 4 s.
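As an illustration of the user-time definition given above, the following sketch measures the user-mode CPU time of a function call with the Python standard library. It is not how the ENVI runs were timed in this study; `measure_user_time` and `busy_sum` are hypothetical names, and the workload is a placeholder.

```python
import os

def measure_user_time(func, *args):
    """Run func and return (result, user-mode CPU seconds used by this process)."""
    start = os.times().user   # CPU time spent in user mode so far
    result = func(*args)
    elapsed = os.times().user - start
    return result, elapsed

def busy_sum(n):
    """Placeholder workload standing in for a classification run."""
    return sum(i * i for i in range(n))

total, user_seconds = measure_user_time(busy_sum, 200_000)
print(f"user time: {user_seconds:.3f} s")
```

Time spent sleeping or waiting for input and output is not counted by this measure, which matches the definition above. With the reported 32 and 65 minutes, the full data set needed roughly twice the user time of the PCA-reduced model (65 / 32 ≈ 2.0).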

SUMMARY AND CONCLUSIONS
The objective of this research was to explore in depth the topographic information provided by the ALS data and to overcome the computational time limitation. The PCA was used to reduce redundant information in an extended set of topographic features. The PCA was also used to decrease the computational time in semi-automatic and computer-aided landslide mapping. The proposed novel approach was tested on a susceptible area affected by more than 50 landslides located on Rożnów Lake in the Carpathian Mountains, Poland.
Based on the accuracy parameters presented in table 2 and table 3, seven initial PCA components with 90% of the total variability in the original topographic attributes were used for the SVM classification.
Comparing the results with the landslide inventory map, the average user's accuracy (UA), the producer's accuracy (PA) and the overall accuracy (OA) were calculated for two models. For the PCA-feature-reduced model, the UA, the PA, and the OA were found to be 72%, 76%, and 72%, respectively. Similarly, the UA, the PA and the OA for the non-reduced original topographic model were 74%, 77% and 74%, respectively.
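The accuracy measures named above can be derived from a confusion matrix as in the following sketch. The counts are hypothetical and are not the values of table 2 or table 3; the orientation (rows for classified classes, columns for reference classes) and the function name `accuracy_measures` are assumptions made for illustration.

```python
def accuracy_measures(matrix):
    """Return (average UA, average PA, OA) for a square confusion matrix.

    Assumed orientation: rows = classified classes, columns = reference classes.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    # User's accuracy: share of pixels assigned to a class that truly belong to it.
    ua = [diagonal[i] / row_totals[i] for i in range(n)]
    # Producer's accuracy: share of reference pixels of a class that were found.
    pa = [diagonal[j] / col_totals[j] for j in range(n)]
    # Overall accuracy: share of all pixels that were classified correctly.
    oa = sum(diagonal) / total
    return sum(ua) / n, sum(pa) / n, oa

# Hypothetical counts for the classes (landslide, non-landslide).
example = [
    [80, 20],  # classified as landslide
    [30, 70],  # classified as non-landslide
]
avg_ua, avg_pa, oa = accuracy_measures(example)
print(f"UA={avg_ua:.3f} PA={avg_pa:.3f} OA={oa:.3f}")
```

Averaging UA and PA over the classes gives the "average" values quoted in the text, whereas OA is computed once over all pixels.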
Using the initial seven PCA components instead of all 20 original topographic attributes does not significantly change the identification accuracy but reduces the computational time. The tests were performed on a relatively small study area (2.8 km²); the classification took 32 minutes for the seven PCA components and 65 minutes for the non-reduced model.
The authors will continue this research in further studies, taking into account additional test sites. Nevertheless, based on the results achieved in this study, the proposed procedure, which combines the DEM-derivatives and the SVM algorithm, can effectively identify landslide areas in the region of the Carpathian Mountains.
Moreover, applying PCA to the DEM-derivatives effectively decreases the computational time of the semi-automatic landslide mapping presented in this study.
the height component accuracy of the ISOK data does not exceed 23 cm for forested areas. Based on the DEM, 19 DEM-derivative layers were prepared. Because of the large number of DEM-derivatives, only the multiple shaded relief derivative is presented in figure 1. The main information, calculation patterns and references for the other derivatives are collected in table 1.

Figure 1: Multiple shaded relief and the existing landslides (pink areas)

Derivative: Information and references
DEM: [ArcGIS™] Van Westen et al. (2008)
slope: [Spatial Analyst in ArcGIS™] Van Westen et al. (2008)
standard deviation of shaded relief: moving standard deviation filter using a 3 x 3 pixel kernel [Spatial Analyst in ArcGIS™]
openness: difference between the original DEM and DEM_ki, where DEM_ki is the DEM interpolated with a 9 x 9 moving average kernel, Van Den Eeckhaut et al. (2012) [Raster Calculator in ArcGIS™]
topographic roughness: GIS Geomorphometry & Gradient Metrics toolbox by Evans et al. (2015)
contour density: density of 20 cm contours per circle with a radius of 3 m [Python in ArcGIS™]
area solar radiation (ASR): ASR represents the solar energy for a given pixel and a specific date [Spatial Analyst toolbox in ArcGIS™]
morphological gradient: represents the difference between the dilation and the erosion of the DEM image [Python in ArcGIS™]
topographic position index (TPI): Land Facet Corridor Designer by Jenness et al. (2013)
skewness: represents the asymmetry of the probability distribution, moving skewness index filter with a 3 x 3 pixel kernel [Python in ArcGIS™]
curvature: [Spatial Analyst toolbox in ArcGIS™] (Van Den Eeckhaut et al.
[...] kernel [ArcGIS™]
multiple shaded relief: [Spatial Analyst toolbox in ArcGIS™] (Eeckhaut et al., 2007)
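Three of the moving-window derivatives listed above (moving standard deviation, openness as the DEM minus its 9 x 9 moving average, and the morphological gradient) can be sketched with NumPy as follows. This is only an illustration and not the ArcGIS tooling used in this study: the function names are hypothetical, the borders are handled by edge replication (an assumption), and the input is a synthetic tilted plane rather than ALS data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windows(raster, size):
    """Return every size x size neighbourhood, padding borders by edge replication."""
    pad = size // 2
    padded = np.pad(raster, pad, mode="edge")
    return sliding_window_view(padded, (size, size))

def moving_std(raster, size=3):
    """Moving standard deviation filter (e.g. applied to a shaded relief)."""
    return windows(raster, size).std(axis=(-2, -1))

def openness(dem, size=9):
    """Difference between the DEM and its moving-average smoothed version."""
    return dem - windows(dem, size).mean(axis=(-2, -1))

def morphological_gradient(dem, size=3):
    """Grey-scale dilation (window maximum) minus erosion (window minimum)."""
    w = windows(dem, size)
    return w.max(axis=(-2, -1)) - w.min(axis=(-2, -1))

# Synthetic DEM: a tilted plane, 2 units per row and 1 unit per column.
dem = np.add.outer(2.0 * np.arange(20), 1.0 * np.arange(20))

std_layer = moving_std(dem)
open_layer = openness(dem)
grad_layer = morphological_gradient(dem)
print(std_layer.shape, open_layer.shape, grad_layer.shape)
```

On a perfectly planar surface the openness vanishes away from the borders, so non-zero values on real data point to local convexities or concavities such as scarps and hummocky landslide bodies.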

Figure 2: Location of the study area with the existing landslides and orthoimage

Figure 4: RGB composition of the initial three PCA components with borders (black line) of the existing landslides

Figure 6
Figure 6 and figure 7 present classification results for the PCA-reduced model and the non-reduced model with the landslide inventory map, respectively. Comparing results with

Figure 7: SVM classification results (green areas) for all DEM-derivatives and the existing landslides (hatched polygons)

Table 2: Confusion matrix of SVM classification using the seven PCA components

Table 3: Confusion matrix of SVM classification using the full data set