Pixel-Based Landslide Identification Using Landsat 8 and GEE

Landslides are among the most common natural disasters, triggered mainly by heavy rainfall, cloudbursts, earthquakes, unplanned construction, and deforestation. In India, field surveying is the standard method for identifying potential landslide regions and updating landslide inventories, but it is costly and inefficient. Remote sensing offers an alternative: it allows rapid and easy data acquisition and can improve on traditional landslide detection. For example, machine learning algorithms such as the Support Vector Machine (SVM) challenge conventional techniques by predicting disasters with reasonable accuracy. In this work, we use open-source datasets (Landsat 8 images and the JAXA ALOS DSM) and Google Earth Engine (GEE) to identify landslide regions in Rudraprayag using machine learning techniques. Labeled landslide locations are obtained from the landslide inventory of the Geological Survey of India. Landslide identification was performed using SVM, Classification and Regression Trees (CART), Minimum Distance, Random Forest (RF), and Naïve Bayes; SVM and RF outperformed the other techniques, achieving an 87.5% true positive rate (TPR).


INTRODUCTION
Natural disasters are increasing year by year due to global climate change and rapid human settlement. A landslide is a hazardous geological event: the downslope movement of rock mass and debris. It can have multiple causes, such as heavy rainfall, cloudbursts, earthquakes, improper human settlement, or unplanned construction [4]. A landslide can alter the natural surroundings and change the land cover. Its most adverse effect, however, is the loss of lives and livelihoods in the region, and sometimes the blockage of roads, which delays transportation and emergency medical services.
Field surveying is the standard method for identifying landslides and updating landslide inventories. However, it is costly, time-consuming, and inefficient, which delays inventory updates by half a decade or more. The landslide inventory of our study area (Rudraprayag) was last updated in 2016. With advanced remote sensing techniques, landslides can be identified more efficiently by training machine learning models on satellite imagery (supervised machine learning).
In supervised machine learning, a model is trained on classified (labeled) data. The whole dataset is first divided into two parts: training data and validation data. The training data is used to fit the model so that it can identify the class of each sample. The validation data is then used to check whether the model performs well. If it does not, the model is retrained with different parameters.
There have been a number of attempts to identify landslides using multiple approaches [2] [8]. [5] used convolutional neural networks (CNNs) with multiple models such as VHH-13, ResNet-50, ResNet-101, Inception-v3, and DenseNet. [7] conducted a study to detect landslides using SVM, maximum likelihood (ML), and back-propagation neural network (BPNN), and observed the superiority of the RF method over the other techniques. [9] evaluated several machine learning algorithms (logistic regression, support vector machine, random forest, Discrete AdaBoost, LogitBoost, and Gentle AdaBoost) and deep learning methods (CNN-6 and DCNN-11) on landslide databases (recent, relict, and joint) and assessed their robustness and potential for landslide identification. We also observed that most papers reporting high accuracy in landslide identification used high-spatial-resolution imagery [5] [9].
Research communities are already working on landslide identification in various regions, but the potential of open-source data has not been well explored in the Indian subcontinent. Most landslide identification studies use high-resolution satellite imagery (better than 5 m/pixel). Open-source satellite imagery such as Landsat 8, which is available on the web at a lower resolution of 30 m/pixel, has seen very limited use.
In this paper, we explore and compare multiple supervised machine learning algorithms to evaluate the potential of open-source multi-band satellite imagery for identifying landslides.

STUDY AREA AND DATASET
The study area extends from 30°10'36"N to 30°48'50"N in latitude and from 78°48'46"E to 79°21'45"E in longitude, covering Rudraprayag and its neighboring districts in the state of Uttarakhand, India (Figure 1). As a result of heavier-than-average rainfall during the monsoon season, many landslides are observed, mainly on the banks of the Mandakini and Alaknanda.

Figure 1. Google Earth Image of Study Area
The datasets used in this paper are the JAXA ALOS World 3D - 30m (AW3D30) and Landsat 8 multi-band satellite images of the study area. AW3D30 is a global digital surface model (DSM) dataset with a horizontal resolution of approximately 30 meters (1 arcsec mesh). From the DSM, we calculated the surface slope, as shown in Figure 2. Landsat 8 images from April 2015 to October 2015 were used. This interval was chosen because the inventory data are updated to the same period and because a large portion of the landmass is exposed after the snow melts.
From the Landsat 8 images, we calculated indices such as the normalized difference vegetation index (NDVI) and the normalized difference water index (NDWI) for use as parameters in the machine learning algorithms (Figure 3). NDVI is a ratio used to identify dense vegetation canopy, snowfields, and bare land, and is given in equation (1). NDWI is an index related to water content, computed by the formula given in equation (2). We divided the whole dataset into 75% for training and 25% for validation.
Several precautions were taken in preparing the landslide dataset. First, each landslide in the historical landslide inventory shown in Figure 4 was located on the satellite images. Second, while delineating landslide boundaries, special care was taken not to include exterior land cover. Finally, a few landslides that could not be recognized in the satellite imagery were removed: they can only be identified through field investigation, and including them could confuse the models.
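The NDVI and NDWI indices mentioned above are both normalized differences of two spectral bands, which the following minimal sketch illustrates for single pixels. NDVI is the standard (NIR - Red) / (NIR + Red). For NDWI, the sketch assumes the green/near-infrared form, (Green - NIR) / (Green + NIR); another common form uses near-infrared and shortwave-infrared bands instead, and the text does not say which one was used. On Landsat 8, green, red, and near-infrared correspond to bands 3, 4, and 5. The function names and reflectance values are hypothetical and serve only as illustration.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b); return 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """NDWI, green/NIR form (an assumption): (Green - NIR) / (Green + NIR)."""
    return normalized_difference(green, nir)


# Hypothetical surface reflectance values, not taken from the study.
vegetated_pixel = {"green": 0.08, "red": 0.05, "nir": 0.45}
water_pixel = {"green": 0.10, "red": 0.06, "nir": 0.02}

for name, px in (("vegetated", vegetated_pixel), ("water", water_pixel)):
    print(name,
          "NDVI=%.2f" % ndvi(px["nir"], px["red"]),
          "NDWI=%.2f" % ndwi(px["green"], px["nir"]))
```

With these illustrative values, the vegetated pixel has a high NDVI and a negative NDWI, while the water pixel shows the opposite pattern, which is why the two indices are useful as complementary classifier inputs.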

Google Earth Engine (GEE)
Earth Engine combines a multi-petabyte analysis-ready data catalog with a high-performance, intrinsically parallel computing service. Its catalog of publicly available satellite imagery and spatial datasets includes observations from various satellites in optical and non-optical wavelengths, environmental variables, weather and climate forecasts, land cover, and topographic and socio-economic datasets.

Advanced Machine Learning Algorithms
Different machine learning algorithms take different approaches to classification. SVM finds a hyperplane that forms a boundary between classes. In addition to linear classification, SVMs can efficiently perform nonlinear classification using the kernel trick and kernel functions [3]. Classification and Regression Trees (CART) models, also referred to as "decision trees," are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition [6]. Naïve Bayes takes a probabilistic approach based on Bayes' theorem: it predicts the class of an object from the probability of its belonging to each class. Minimum distance classifies unknown data by assigning it to the class at the smallest distance in multi-feature space. Random forest is a tree-based method that combines many decision trees to yield a single consensus prediction [1].
First, the landslide inventory, Landsat 8 imagery, and ALOS DSM were used to create training and validation data for the machine learning models, split 75% and 25%, respectively. The proposed methodology is shown in Figure 5.
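Of the classifiers described above, minimum distance is the simplest to write out in full. The sketch below is one common formulation, assuming Euclidean distance to per-class mean vectors; it is not the GEE implementation used in the study. The two-feature samples (NDVI, NDWI) and their values are hypothetical.

```python
import math


def class_means(samples):
    """Compute the mean feature vector of each class.

    samples: list of (feature_tuple, label) pairs.
    Returns a dict mapping label -> list of per-feature means.
    """
    sums, counts = {}, {}
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(features, means):
    """Assign the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(features, means[label]))


# Hypothetical (NDVI, NDWI) training pixels, for illustration only.
training = [
    ((0.80, -0.70), "forest"),
    ((0.70, -0.60), "forest"),
    ((-0.20, 0.60), "water"),
    ((-0.10, 0.50), "water"),
    ((0.10, -0.15), "landslide"),
    ((0.15, -0.20), "landslide"),
]

means = class_means(training)
print(classify((0.12, -0.10), means))
print(classify((0.78, -0.66), means))
```

Because the distance is computed directly on the raw feature values, features on very different scales (for example, slope in degrees next to an index in the range -1 to 1) would need rescaling before being combined in this way.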
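The 75/25 division into training and validation data can be sketched as follows. This is a minimal illustration that assumes the split is done at random within each land-cover class (a stratified split); the text does not state how the samples were actually drawn. The sample identifiers, the per-class count of eight, and the seed are hypothetical.

```python
import random


def stratified_split(samples, train_fraction=0.75, seed=0):
    """Split (item, label) pairs into training and validation lists.

    Each class is shuffled and split separately so that both lists keep
    roughly the same class proportions as the full dataset.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)

    train, validation = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        validation.extend(group[cut:])
    return train, validation


# Hypothetical labeled samples: 8 per class, identified by an index.
classes = ["forest", "water", "snow", "bare land", "landslide"]
samples = [(i, label) for label in classes for i in range(8)]

train, validation = stratified_split(samples)
print(len(train), len(validation))
```

Splitting per class keeps rare classes such as landslides represented in both sets, which matters when the validation metric is computed on that class alone.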

RESULTS AND DISCUSSION
For classification, we divided the land cover of the study area into five categories (forest, water, snow, bare land, and landslide) and labeled data for each category for training and validation.
We used multiple algorithms for classification: decision tree, Naïve Bayes, SVM, random forest, and minimum distance. Each algorithm behaves differently depending on the dataset and on the parameters (NDVI, NDWI, DEM, and slope) provided for training (Table 1). Figure 6 shows a high frequency of landslides near the river, which is confirmed by the landslide inventory.
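The true positive rate (TPR) used to compare the classifiers is the fraction of actual landslide samples that a model labels as landslide, TPR = TP / (TP + FN). The sketch below shows the calculation on hypothetical validation labels; the counts are chosen only to illustrate the arithmetic and are not the study's validation data.

```python
def true_positive_rate(actual, predicted, positive="landslide"):
    """Return TP / (TP + FN) for the given positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no samples of the positive class")
    return tp / (tp + fn)


# Hypothetical validation labels: 8 landslide and 4 forest samples.
actual = ["landslide"] * 8 + ["forest"] * 4
# 7 landslides detected, 1 missed, and 1 forest sample mislabeled.
predicted = ["landslide"] * 7 + ["bare land"] + ["forest"] * 3 + ["landslide"]

tpr = true_positive_rate(actual, predicted)
print("TPR = %.1f%%" % (100 * tpr))
```

Note that the forest sample mislabeled as landslide does not lower the TPR: the metric only measures missed landslides, so false alarms need to be assessed separately.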
To visualize the result, two landslides, one from Mansoona (Ukhimath district) and another in Banadhar, are shown in Figure 7 and Figure 8.
We identified three limitations during this work. First, unidentified dumping zones of construction sites are also classified as landslides. Second, landslide pixels that do not look like landslides are rejected. Third, the limited spectral resolution restricts the performance of other potential indices.

CONCLUSIONS
This study demonstrated the significant potential of the BHUVAN landslide inventory, Landsat 8, and GEE for landslide identification. The machine learning models SVM and RF achieved an 87.5% TPR, indicating good detection performance. Better results may be achievable by overcoming the limitations mentioned above. This paper shows the applicability of machine learning methods to classification problems. In future work, deep learning methods could be used to extend this study and improve the performance of the classification model.