INTEGRATING INSAR INFORMATION AND SPATIAL-TEMPORAL FACTORS IN MACHINE LEARNING ANALYSIS FOR LANDSLIDE PREDICTION – A CASE STUDY FOR PROVINCIAL HIGHWAY 18 AREA IN TAIWAN

Taiwan lies in the subtropical monsoon zone and on the Pacific Ring of Fire. Both its rate of crustal uplift and its annual rainfall are among the highest in the world. Earthquakes and heavy rainfall have led to massive landslides and debris flows. Frequent disasters and a high rate of surface erosion have caused drastic changes in river topography and catchment areas and, consequently, have threatened human lives. To mitigate these losses, better simulation and prediction of landslides are critical. Existing landslide prediction studies have employed terrain, geology, rainfall, earthquakes and human activities as landslide factors in the prediction model. In addition to these environmental conditions, this study explores the use of SAR differential interferometry (InSAR) information to observe the characteristics of slope movement, which is also an important factor. Factors are analyzed and quantified on the basis of slope units. To confirm the applicability of the selected factors to landslides, the factors are first analyzed with Spearman correlation, and those with higher correlations are then incorporated into the prediction model. Machine learning techniques are then employed to establish the prediction model. The experimental results demonstrate that InSAR information can improve the accuracy of landslide prediction by more than 5 percentage points.


INTRODUCTION
Slopes in the mountainous areas of Taiwan are mostly geologically sensitive and fragmented, so heavy rainfall or earthquakes easily lead to landslides. The causes of landslides can be divided into two categories: potential conditions and triggering factors. The potential conditions are the natural conditions of slopes, namely topography, lithology, geological structure, and vegetation coverage, while the triggering factors are events such as rainfall, earthquakes, and human activities. Existing studies mostly feed factors from these two categories into mechanical and statistical models to calculate landslide risk. However, because data points are scarce and the sites are remote, it is difficult to obtain large-scale surface displacement information. To solve this problem, this research explores the use of time-series phase values measured by SAR interferometric techniques (InSAR) to obtain large-scale ground deformation information, i.e. the slope movement characteristics. Through correlation analysis and machine learning techniques, we further investigate the impact of spatial-temporal factors and InSAR information, and improve the accuracy and reliability of landslide prediction.

STUDY AREA
The study area lies along Provincial Highway 18 in Chiayi, in the central part of Taiwan (Figure 1). The streams and highways here all run east-west. The study area is located in Fanlu Township and Alishan Township, Chiayi County, and covers about 398.268 square kilometers. The main stream is the Bashang River. The terrain is relatively rugged, and landslides usually occur whenever heavy rainfall events such as typhoons or plum rains strike.

Slope Unit
To preserve complete topographic patterns, the slope unit is selected as the basic unit for analyzing the spatial-temporal factors. The parameters obtained for a slope unit represent the entire unit, and the unit retains the continuity of a slope, which makes it suitable for landslide potential research. Slope units are delineated by the mountain ridge line and the watershed. The delineation follows the method proposed by Xie et al. (2004). The first step is to divide the digital elevation model (DEM) into several catchment areas. The second step is to invert the DEM to obtain the reverse streams and mountain ridge information, and then to divide the reverse DEM into catchment areas again to obtain the reverse catchment areas. The third step is to combine the positive and negative catchment areas into a single file to cut out the slope units (Figure 2). Since the result sometimes contains unreasonable units, manual editing is still needed to obtain slope units with more homogeneous terrain. The study area contains 3312 slope units, each ranging from 15 to 30 hectares.
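The DEM inversion in the second step can be illustrated with a minimal sketch. It covers only the inversion, not the catchment delineation, and the toy grid and the function names `invert_dem` and `lowest_column_per_row` are hypothetical, introduced here for illustration. The point is that ridge cells of the original DEM become the lowest cells of the reverse DEM, so a catchment tool run on the reverse DEM would trace ridges as if they were streams.

```python
def invert_dem(dem):
    """Return the reverse DEM: every elevation is subtracted from the grid maximum."""
    highest = max(max(row) for row in dem)
    return [[highest - z for z in row] for row in dem]


def lowest_column_per_row(grid):
    """Column index of the lowest cell in each row (first one if there are ties)."""
    return [row.index(min(row)) for row in grid]


# Hypothetical toy DEM (metres): a ridge runs north-south along the middle column.
dem = [
    [10, 20, 30, 20, 10],
    [12, 22, 32, 22, 12],
    [11, 21, 31, 21, 11],
]

reverse_dem = invert_dem(dem)

# In the reverse DEM the former ridge is the lowest line of cells.
ridge_columns = lowest_column_per_row(reverse_dem)
print(ridge_columns)
```

In practice the catchment delineation on both the original and the reverse DEM would be done with GIS hydrology tools; the sketch only shows why inverting the DEM exposes the ridge lines.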

Spatial-Temporal Factors
Common spatial-temporal factors can be categorized into three groups according to their characteristics, i.e. geomorphology, location and geology. Geomorphic factors include elevation, slope angle, aspect, topographic roughness, curvature and vegetation coverage (NDVI); location factors represent the influence of the distance to features that disturb the slope unit, including river distance, road distance and fault distance; geological factors represent the influence of regional geological conditions, such as sensitive area index, fold and downslope index. In addition, triggering factors are factors that can trigger landslides, such as rainfall, earthquakes and deforestation. In this study, monthly accumulated rainfall is used.

Spearman Correlation Coefficient
The main purpose of correlation analysis is to understand the statistical dependence between two variables. To assess the applicability of the selected factors, a correlation analysis is applied to understand the correlation between each factor and landslides. Considering that the factors have different distributions and may not satisfy the strict assumptions of parametric statistical methods, such as the normal distribution, this study adopts the Spearman Correlation Coefficient, which is a nonparametric measure of rank correlation. The data only need to be converted into ranks in ascending or descending order, and the coefficient then describes the monotonic relationship between two variables (that is, the linear relationship between their ranks). The definition of the Spearman Correlation Coefficient is as follows:

$$ r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)} $$

where
r_s = Spearman Correlation Coefficient
d_i = the difference between the two ranks of the i-th observation
n = the number of observations
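As an illustration of the rank-difference definition of the Spearman coefficient, the following minimal sketch computes r_s for two short lists. The function names and the sample values are hypothetical and are not taken from the study data. The sketch assumes that there are no tied values; with ties, averaged ranks and the Pearson correlation of the ranks would be needed instead.

```python
def ranks(values):
    """Return 1-based ranks in ascending order (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(x, y):
    """Spearman coefficient: r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical values for five slope units: mean slope angle (degrees)
# and the fraction of the unit covered by landslides.
slope_angle = [12.0, 25.0, 31.0, 18.0, 40.0]
landslide_ratio = [0.01, 0.04, 0.03, 0.02, 0.09]

rs = spearman_rs(slope_angle, landslide_ratio)
print(round(rs, 3))  # 0.9
```

Only the ranks enter the calculation, so the result is unchanged by any monotonic rescaling of either variable, which is why no distributional assumption is required.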

InSAR Information Collection
This study collects InSAR information from ESA's Sentinel-1 for five years, from 2016 to 2020. The deformation information is calculated at a time interval of 4 months. The InSAR technique acquires SAR images of the same area at different times from repeated satellite orbits and derives surface deformation information from the phase values. To assess whether the ground displacement from InSAR is reasonable, the displacement velocity is converted from the satellite line-of-sight direction (VLOS) to the slope direction (Vslope). According to Aslan et al. (2020), the deformation velocity along the line of sight of the SAR satellite can be converted into the direction of the slope. After the conversion, there are 600 Vslope points in the study area, which means that only 18% of the slope units have InSAR information. Figure 3 is the schematic of the conversion between Vslope and VLOS, and the conversion equation is as follows:

$$ V_{slope} = \frac{V_{LOS}}{C} $$

where the coefficient C represents the proportion of the real surface displacement velocity along the slope (Vslope) that is projected onto the LOS direction. C is calculated from the geometric relationship between the line-of-sight direction and the slope unit, using the following quantities:
A = slope direction of the slope unit
S = slope angle of the slope unit
α = incident angle of the line-of-sight direction
γ = azimuth angle of the line-of-sight direction
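The projection between VLOS and Vslope can be sketched as a dot product of two unit vectors. This is a minimal illustration under stated assumptions, not necessarily the exact expression used by Aslan et al. (2020): azimuths are measured clockwise from north, the incidence angle is measured from the vertical, the LOS vector points from the satellite to the ground, and vectors are given as (east, north, up) components. The function names and the cut-off `min_abs_c` are hypothetical; the cut-off reflects the fact that dividing by a coefficient close to zero amplifies noise.

```python
import math


def los_unit_vector(incidence_deg, azimuth_deg):
    """LOS unit vector (east, north, up), satellite to ground (assumed convention)."""
    alpha = math.radians(incidence_deg)
    gamma = math.radians(azimuth_deg)
    return (math.sin(alpha) * math.sin(gamma),
            math.sin(alpha) * math.cos(gamma),
            -math.cos(alpha))


def downslope_unit_vector(aspect_deg, slope_deg):
    """Unit vector (east, north, up) pointing down the slope."""
    a = math.radians(aspect_deg)
    s = math.radians(slope_deg)
    return (math.sin(a) * math.cos(s),
            math.cos(a) * math.cos(s),
            -math.sin(s))


def projection_coefficient(aspect_deg, slope_deg, incidence_deg, azimuth_deg):
    """C: fraction of a unit downslope velocity that is seen along the LOS."""
    los = los_unit_vector(incidence_deg, azimuth_deg)
    down = downslope_unit_vector(aspect_deg, slope_deg)
    return sum(l * d for l, d in zip(los, down))


def v_slope(v_los, c, min_abs_c=0.2):
    """Vslope = VLOS / C; returns None when |C| is below the hypothetical cut-off."""
    if abs(c) < min_abs_c:
        return None
    return v_los / c


# Hypothetical geometry: the LOS happens to be parallel to the downslope direction.
c_parallel = projection_coefficient(aspect_deg=90, slope_deg=30,
                                    incidence_deg=60, azimuth_deg=90)
# Hypothetical geometry: a flat, north-facing unit viewed from the side.
c_perpendicular = projection_coefficient(aspect_deg=0, slope_deg=0,
                                         incidence_deg=40, azimuth_deg=90)
print(round(c_parallel, 6), round(c_perpendicular, 6))
```

When the slope direction is nearly perpendicular to the line of sight, C approaches zero and the converted velocity becomes unreliable, which is one reason why only part of the slope units end up with a usable Vslope value.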

Machine Learning Prediction
Machine learning techniques are commonly applied to establish landslide prediction models. In this study, the Random Forest algorithm is applied for landslide prediction. A random forest is a classifier that contains multiple decision trees and adds random selection of training data to improve the final result. The equation of the Random Forest algorithm is as follows:

$$ g_c = \frac{1}{t} \sum_{i=1}^{t} P_i(c \mid v), \quad c = 1, \dots, l $$

where
P = probability (P_i is the probability given by the i-th decision tree)
c = category
v = node
l = number of categories
t = number of decision trees
g_c = the average probability for category c

For classification, the output of Random Forest is the class selected by most trees. Random Forests correct for decision trees' habit of overfitting to their training set. This study uses 14 spatial-temporal factors as input data. Data from the five years 2016 to 2020 are used as training samples, and the model then predicts whether a landslide will occur in each slope unit in March 2020 and September 2020. Finally, the prediction accuracy for all slope units is calculated and compared with that for the slope units with InSAR information.
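The way a forest combines its trees can be sketched in a few lines. The example below is a minimal illustration with hypothetical per-tree probabilities (category 0 = no landslide, category 1 = landslide) and hypothetical function names; it does not train any trees. It shows both the averaged probability g_c and the majority vote described above, which can disagree when one tree is very confident.

```python
def average_class_probabilities(tree_probabilities):
    """g_c: mean over the t trees of the probability each tree assigns to category c."""
    t = len(tree_probabilities)
    if t == 0:
        raise ValueError("at least one tree is required")
    n_categories = len(tree_probabilities[0])
    return [sum(tree[c] for tree in tree_probabilities) / t
            for c in range(n_categories)]


def predict_by_average(tree_probabilities):
    """Category with the highest averaged probability g_c."""
    g = average_class_probabilities(tree_probabilities)
    return max(range(len(g)), key=lambda c: g[c])


def predict_by_majority_vote(tree_probabilities):
    """Category chosen by most trees, each tree voting for its most probable category."""
    votes = [max(range(len(p)), key=lambda c: p[c]) for p in tree_probabilities]
    return max(set(votes), key=votes.count)


# Hypothetical output of three trees for one slope unit: [P(no landslide), P(landslide)].
trees = [
    [0.90, 0.10],
    [0.40, 0.60],
    [0.45, 0.55],
]

g = average_class_probabilities(trees)
print([round(p, 3) for p in g])            # [0.583, 0.417]
print(predict_by_average(trees))           # 0
print(predict_by_majority_vote(trees))     # 1
```

Library implementations differ in which of the two rules they apply, so the rule in use should be checked when comparing reported accuracies.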

Landslide Ground Truth Map
The Forestry Bureau announces its landslide inventory once a year, and the latest inventory was announced in 2017. A yearly time scale does not meet the needs of this study, so SPOT images were downloaded to make the landslide ground truth map. The SPOT images were processed with supervised classification to preliminarily classify them into categories such as bare soil, vegetation, water body, and cloud cover. Since the spectral characteristics of river channels and settlements are similar to those of landslides, they were classified as bare soil in the preliminary classification, although they are not landslides. Manual editing was therefore carried out to separate the preliminary bare soil category into real landslide areas, river channels, settlements, and vegetation regeneration areas.

RESULTS AND DISCUSSION
The spatial-temporal factors in this study are all quantified and analyzed on the basis of slope units and are shown in Figure 4. According to the significance test (Table 1), the factors significantly related to landslides in the study area are slope, topographic roughness, curvature, distance to river, distance to fault, downslope index, fold and sensitive area index. The correlation analysis shows that the correlations between the factors and landslides are mostly weak to modest, which means that no single factor has an obvious impact on landslides.
There are 3312 slope units in the study area, and 14 spatial-temporal factors are used as the input data of the Random Forest landslide prediction model. The prediction results (Table 2) show an overall accuracy of 88.1% for March 2020 and 88.9% for September 2020. For the 600 slope units with Vslope points (about 18% of the total units), the overall accuracy is 94.2% for March 2020 and 94.8% for September 2020. Both are more than 5 percentage points higher than the accuracy for all slope units. In the study area, 213 new landslide units (about 6% of the total units) appeared in March 2020, and 4 new landslide units (about 0.1% of the total units) appeared in September 2020. For the prediction of these new landslides (Table 2), the overall accuracy reaches 97.2% in March 2020 and 100% in September 2020. The results demonstrate that the machine learning model is capable of predicting new landslides.

Note: The lower the level of significance, the more significant the factor; a level of significance < 0.05 means significant.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
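The shares and accuracy differences quoted in the results can be checked with simple arithmetic. The short sketch below uses only the numbers reported in the text; the variable names are introduced here for illustration.

```python
total_units = 3312          # slope units in the study area
insar_units = 600           # slope units with a Vslope point
new_landslide_units = {"2020-03": 213, "2020-09": 4}

# Overall accuracy (%) reported for all units and for units with InSAR information.
accuracy_all = {"2020-03": 88.1, "2020-09": 88.9}
accuracy_insar = {"2020-03": 94.2, "2020-09": 94.8}

# Share of slope units that carry InSAR information, in percent.
insar_share = 100 * insar_units / total_units

# Share of slope units with new landslides, in percent.
new_share = {month: 100 * count / total_units
             for month, count in new_landslide_units.items()}

# Accuracy difference in percentage points between the two groups.
gain = {month: round(accuracy_insar[month] - accuracy_all[month], 1)
        for month in accuracy_all}

print(round(insar_share, 1))   # 18.1
print(gain)                    # {'2020-03': 6.1, '2020-09': 5.9}
```

The differences of 6.1 and 5.9 percentage points are consistent with the stated improvement of more than 5 percentage points.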

CONCLUSION
In this study, the area along Provincial Highway 18 in Chiayi was chosen as the study area. The factors significantly related to landslides are slope, topographic roughness, curvature, distance to river, distance to fault, downslope index, fold and sensitive area index. The weak to modest correlations demonstrate that no single factor has an obvious impact on landslides. For landslide prediction, the Random Forest algorithm was applied with five years (2016-2020) of spatial-temporal factors to predict landslides in March and September 2020, and the accuracy exceeds 88%. The results demonstrate that applying InSAR information improves the capability of the prediction model: the accuracy for slope units with InSAR points is more than 5 percentage points higher than the accuracy for all slope units. For predicting new landslides, the prediction accuracy can even exceed 95%. This study proposes an analysis and prediction method for slope displacement that integrates common spatial factors (geomorphology, location and geology) with InSAR temporal observation data, and demonstrates its feasibility.