AERIAL PHOTOGRAMMETRY AND MACHINE LEARNING BASED REGIONAL LANDSLIDE SUSCEPTIBILITY ASSESSMENT FOR AN EARTHQUAKE PRONE AREA IN TURKEY

Landslides are a frequently observed natural phenomenon and a geohazard with destructive effects on economies, society and the environment. Producing up-to-date landslide susceptibility (LS) maps is essential for landslide hazard mitigation. Obtaining up-to-date and accurate data for the production of LS maps is equally important, and this can be achieved with aerial photogrammetric techniques, which produce high-resolution geospatial data. The resulting geospatial datasets can be integrated into data-driven methods to obtain accurate LS maps. In the present study, an LS map was produced using a data-driven machine learning (ML) method, namely random forest (RF). An earthquake- and landslide-prone area in the south-eastern part of Turkey was selected as the study area. Topographical derivatives were extracted from digital surface models (DSMs) produced from aerial photogrammetric datasets with 30 cm ground sampling distance. Lithological parameters were employed together with an accurate landslide inventory, which was also delineated using the high-resolution DSMs and orthophotos. The relationships between landslide occurrence and the pre-defined conditioning factors were analyzed using the frequency ratio (FR) method. The results show that the RF method exhibits high prediction performance in the study area, with an area under curve (AUC) value of 0.92.


INTRODUCTION
Geological hazards are natural phenomena that may cause physical, economic and social losses, and threaten the environment and human lives. The number of studies on natural hazards using various geospatial data sources and resolutions has increased significantly in recent years. Landslides are among the most common and destructive natural hazards in many parts of the world. Turkey is also highly affected, with 23,041 landslides recorded between 1950 and 2018 (AFAD, 2021). Landslide susceptibility (LS) maps are extremely important for disaster mitigation and prevention activities, and for spatial planning in hazard-prone areas. The number of LS mapping studies in the literature has also increased in recent years. For this purpose, researchers have proposed various statistical and machine learning (ML) methods, such as analytical hierarchy process (AHP) (Pourghasemi et al., 2012), frequency ratio (FR) (Yi et al., 2019), decision trees (DT) (Wang et al., 2016), random forest (RF) (Karakas et al., 2020), and logistic regression (LR) and artificial neural networks (ANN) (Sevgen et al., 2019). Currently, the main research questions in LS mapping are the generalization capability of supervised ML methods and the availability of accurate and up-to-date data for model training in such approaches.
The main aim of this study was to produce an accurate LS map of an area that lies within the Malatya and Elazig Provinces of Turkey and is prone to multiple geohazards, i.e. earthquakes and landslides. Earthquakes in the area often trigger new landslide events (Gokceoglu et al., 2020; Karakas et al., 2021a). High-resolution aerial photogrammetric images were employed in the study to produce the digital surface models (DSMs) and orthophotos. Since some parts of the study area are relatively difficult to access, a landslide inventory was derived manually by experts from the high-resolution DSMs and orthophotos (Karakas et al., 2021a). Karakas et al. (2021b) carried out comparative analyses of different ML algorithms for LS map production in the area, and of the model generalization capabilities using different training samples, and found that the RF outperformed the others. Therefore, the prediction performance of the RF was assessed here by using spatial data samples from the whole study area for training and testing. In addition, the relationships between the landslide inventory and the conditioning factors were analyzed statistically using the FR method. The reliability and the predictive power of the RF model were evaluated using the area under the receiver operating characteristic (ROC) curve.

MATERIALS AND METHODS
The overall methodological workflow of the study, together with the input datasets, is provided in Figure 1. The input datasets include the aerial photogrammetric flight datasets and the geological features (i.e. lithological units) of the area. Further topographic features were derived from the DSMs to obtain the other possible conditioning factors for landslides. Two main methods, FR and RF, were employed to analyze the input features and produce the LS map. Further details on the area, the datasets and the methods are presented in the following sub-sections.

Study Area and Aerial Photogrammetric Flight Data
An area in the south-eastern part of Turkey, inside the Malatya and Elazig Provinces, was selected for the study. The aerial photogrammetric flights were performed by the General Directorate of Mapping (GDM), Turkey, in two different years, for the Malatya (2017) and Elazig (2018) Provinces. Thus, the DSMs and orthophotos were produced separately for the two parts, namely Malatya and Elazig. The area covers ca. 270 km² and 218 km² for the Malatya and Elazig parts, respectively. Figure 2 shows the location of the study area with the landslide inventory, which includes the landslides observable in the DSMs and orthophotos. The landslide activity types denoted in the figure represent inactive (type 1) and active (type 2) mass movements. The minimum and the maximum landslide areas are 267 m² and 18 × 10⁵ m², respectively. The red rectangles denote the subareas selected for visual assessment. Although a destructive earthquake (Mw 6.8) occurred on 24 January 2020 and triggered several landslides in the region, the landslides triggered by this event were observed in other datasets after the earthquake (Gokceoglu et al., 2020; Karakas et al., 2021a).
A total of 142 aerial stereo images with 30 cm ground resolution were provided by the GDM for the study. The photos were taken with 80% forward and 60% lateral overlaps using UltraCam Eagle large-format digital cameras. The cameras have 20010 × 13080 pixels with a 5 micron detector size and a 100.5 mm focal length. The interior orientation parameters (IOPs, i.e. camera calibration data) and the adjusted exterior orientation parameters (EOPs) were also obtained from the GDM. The DSMs with 5 m grid spacing and the orthophotos with 2 m resolution were produced using the Agisoft Metashape Professional software (Agisoft LLC, St. Petersburg, Russia).
Figure 2. Location of the study area with the landslide inventory; the activity types represent inactive (1) and active (2) mass movements.
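As a rough consistency check on the camera figures quoted above, the nominal flying height and the ground footprint of a single image can be estimated from the pinhole relation GSD = pixel size × height / focal length. The flying height is not stated in the text; the value computed below is only what the quoted numbers imply under the assumption of vertical imagery over flat terrain, and the function names are illustrative.

```python
def flying_height_m(gsd_m, focal_length_m, pixel_size_m):
    """Height above ground implied by GSD = pixel_size * H / f (nadir view, flat terrain)."""
    return gsd_m * focal_length_m / pixel_size_m


def footprint_m(gsd_m, n_cols, n_rows):
    """Ground coverage of one frame as (width, height) in metres."""
    return (gsd_m * n_cols, gsd_m * n_rows)


# Values quoted in the text: 30 cm GSD, 100.5 mm focal length,
# 5 micron detector size, 20010 x 13080 pixel frame.
height = flying_height_m(0.30, 0.1005, 5e-6)
footprint = footprint_m(0.30, 20010, 13080)
print(f"implied flying height: {height:.0f} m")
print(f"single-image footprint: {footprint[0]:.0f} m x {footprint[1]:.0f} m")
```

Under these assumptions the implied height is about 6 km above ground, and each frame covers roughly 6.0 km by 3.9 km.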

Geological and Topographical Characteristics
The study area is located in a region with high seismicity and active tectonism in the East Anatolian Fault Zone (EAFZ). The geological and geomorphological characteristics of the region exhibit a young and steep topography with lithological units having weak shear strength features (Sevgen et al., 2019). The lithology map of the study area is given in Figure 3. The lithological units and their symbols are Alluvium (Qal); Unconsolidated Gravel, Sand, Silt, Clay (Qçk); Neritic Limestone (Eo1-2); Maden Complex (Tem); Magmatic Rocks (m1); Puturge Metamorphites (PzMzp); and Marble (PzMzpmr). A total of seven conditioning factors were considered in the LS evaluations, including slope gradient, slope aspect, altitude, plan and profile curvatures, lithology, topographic wetness index (TWI), and stream power index (SPI). The topographic derivatives were produced from the DSMs, and the lithology data were digitized from the 1:100,000 scale geological maps published by Akbas et al. (2016). These conditioning factors were selected on the basis of their frequent use in the literature (e.g. see Gokceoglu and Ercanoglu, 2001; Brenning, 2005; Nefeslioglu et al., 2012). In Figure 4(a-g), the topographic derivatives are presented for the two sub-regions marked with red rectangles in Figure 2, selected from the Malatya and Elazig parts to increase the visual interpretability of each parameter. The landslide inventory, with a total of 247 landslides (Figure 2), was used for the model training.
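The text does not state how TWI and SPI were computed. A minimal sketch using the commonly cited definitions, TWI = ln(a / tan β) and SPI = a · tan β, is given below, where a is the specific catchment area and β is the slope angle. The flow accumulation raster, the approximation of a from the cell count, and the function name are assumptions made for illustration and are not taken from the study.

```python
import numpy as np


def twi_spi(flow_acc_cells, slope_deg, cell_size_m, eps=1e-6):
    """Return (TWI, SPI) rasters from flow accumulation and slope.

    flow_acc_cells: number of upslope cells draining into each cell (assumed input)
    slope_deg:      slope gradient in degrees
    cell_size_m:    DSM grid spacing in metres
    """
    # Approximate specific catchment area: upslope area (including the cell
    # itself) divided by a contour width of one cell size.
    a = (np.asarray(flow_acc_cells, dtype=float) + 1.0) * cell_size_m
    tan_beta = np.tan(np.radians(np.asarray(slope_deg, dtype=float)))
    # Clip flat cells so that the logarithm stays finite.
    tan_beta = np.maximum(tan_beta, eps)
    twi = np.log(a / tan_beta)
    spi = a * tan_beta
    return twi, spi


# Toy example on a 5 m grid: a steep ridge cell and a gentle valley cell.
twi, spi = twi_spi([[0, 100]], [[45.0, 5.0]], cell_size_m=5.0)
print(twi, spi)
```

In this toy case the gentle cell with a large contributing area gets the higher TWI, which is the expected behaviour of a wetness index.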

Frequency Ratio Method for Feature Analysis
The FR is one of the statistical analysis methods frequently used in the production of LS maps (Yi et al., 2019; Nefeslioglu et al., 2012; Silalahi et al., 2019). It is also used for the quantitative evaluation of the LS levels in an area based on the observed spatial relationship between the landslide locations and individual conditioning parameters. The method aims to determine the density of the input features that are effective in landslide occurrence. The densities are computed by overlaying the landslide inventory on the feature maps. The number of landslide pixels in each parameter range is counted based on the inventory, and the FR value for a given range is calculated by dividing the share of landslide pixels that fall in that range by the areal share of the range in the whole site. Equation 1 denotes the FR calculation formula.
$$FR_i = \frac{NL_i / NL_t}{NC_i / NC_t} \qquad (1)$$
where
NL_i: number of pixels with landslide in feature i
NL_t: total number of pixels in landslide inventory
NC_i: total number of pixels in feature i
NC_t: total number of pixels in the study area
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2021 XXIV ISPRS Congress (2021 edition)
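The FR calculation described above can be sketched with two co-registered rasters: a classified conditioning factor and a binary landslide mask. This is a minimal illustration and not the authors' implementation; the function name and the toy arrays are hypothetical, and nodata handling is omitted.

```python
import numpy as np


def frequency_ratio(class_map, landslide_mask):
    """FR per class: (NL_i / NL_t) / (NC_i / NC_t)."""
    class_map = np.asarray(class_map)
    landslide_mask = np.asarray(landslide_mask, dtype=bool)
    n_lt = landslide_mask.sum()  # NL_t: landslide pixels in the inventory
    n_ct = class_map.size        # NC_t: pixels in the study area
    if n_lt == 0:
        raise ValueError("landslide mask contains no landslide pixels")
    fr = {}
    for c in np.unique(class_map):
        in_class = class_map == c
        n_li = np.logical_and(in_class, landslide_mask).sum()  # NL_i
        n_ci = in_class.sum()                                  # NC_i
        fr[int(c)] = float((n_li / n_lt) / (n_ci / n_ct))
    return fr


# Toy 4 x 4 raster with three classes and four landslide pixels.
classes = np.array([[1, 1, 2, 2],
                    [1, 1, 2, 2],
                    [3, 3, 3, 3],
                    [3, 3, 3, 3]])
slides = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 0],
                   [0, 0, 0, 1],
                   [0, 0, 0, 0]], dtype=bool)
fr = frequency_ratio(classes, slides)
print(fr)
```

An FR above 1 means a class holds a larger share of the landslide pixels than of the study area, and a value below 1 means the opposite. In the toy raster, class 2 covers a quarter of the area but contains three of the four landslide pixels, so its FR is 3.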

Machine Learning Method for LS Mapping
In this study, the RF method was used to produce the LS maps owing to its high prediction performance (Chen et al., 2017; Kim et al., 2018; Adnan et al., 2020; Sevgen et al., 2019; Karakas et al., 2020). The RF is an ensemble learning method that aims to increase classification accuracy by generating more than one DT during the classification process. The individually created DTs are aggregated to form a decision forest. Each DT is trained on a randomly selected subset of the data set. The algorithm was developed by Breiman (2001). There are two important parameters in the RF, which are the number of trees and the maximum depth of the trees. Here, these values were chosen as 128 and 16, respectively.
The Python scikit-learn library (Scikit-learn, 2021) implementation was used to run the algorithm. All landslide polygons shown in Figure 2 were used to train the model for landslide samples, and the non-landslide samples were selected randomly from the areas outside the inventory.
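A minimal sketch of such a setup with scikit-learn is shown below, using the two parameter values given in the text (128 trees, maximum depth 16). The feature matrix, the labels and the 70/30 train/test split are synthetic stand-ins chosen for illustration; the actual sampling scheme and split used in the study are not described here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for pixel samples: rows are samples, columns are
# conditioning factors (e.g. slope, aspect, altitude, ...).
rng = np.random.default_rng(0)
n_samples = 2000
X = rng.normal(size=(n_samples, 7))
# Hypothetical labels: 1 = landslide, 0 = non-landslide.
signal = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (signal + rng.normal(scale=0.5, size=n_samples) > 0).astype(int)

# Assumed 70/30 split, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Parameter values quoted in the text.
model = RandomForestClassifier(n_estimators=128, max_depth=16, random_state=0)
model.fit(X_train, y_train)

# The predicted probability of the landslide class serves as the
# susceptibility value; AUC summarises the ROC curve on held-out samples.
susceptibility = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, susceptibility)
print(f"AUC on held-out samples: {auc:.2f}")
```

For map production, the same `predict_proba` call would be applied to the feature vectors of every pixel in the study area rather than to a held-out sample.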

FR Results
The FR analysis results of the conditioning factors (features) are provided in Tables A1, A2, (Tables A2 and A4). Figure 5 shows the LS maps produced for the Malatya and Elazig parts. The ROC curves for both parts are presented in Figure 6.

RF Results
The AUC values are 0.90 and 0.92 for the Malatya and Elazig parts (Figure 6a and 6b), respectively. The results were grouped into five classes (very low, low, moderate, high and very high) obtained from the Jenks classification, and their statistical summary is given in Table 1. As can be seen in Table 1, the areas with high and very high susceptibility values within the Malatya part were 32.76 km² and 44.51 km², respectively. In the Elazig part, these values were 28.43 km² and 21.30 km².
Figure 5. LSMs of the study area obtained with the RF method.
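The Jenks (natural breaks) grouping mentioned above can be illustrated with a small dynamic-programming sketch that minimises the within-class sum of squared deviations. This is not the tool used in the study, which is not named in the text. The function names and sample values are hypothetical, and the quadratic running time makes this version suitable only for a modest sample of susceptibility values rather than a full raster.

```python
from bisect import bisect_left


def jenks_breaks(values, n_classes):
    """Upper bounds of n_classes classes minimising within-class squared deviation."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")
    # Prefix sums give the squared deviation of any slice in constant time.
    s1, s2 = [0.0], [0.0]
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], split[k][j] = c, i
    # Walk back through the stored split points to collect class upper bounds.
    breaks, j = [], n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = split[k][j]
    return breaks[::-1]


def classify(value, breaks, labels):
    """Label of the first class whose upper bound is not below the value."""
    return labels[min(bisect_left(breaks, value), len(labels) - 1)]


labels = ["very low", "low", "moderate", "high", "very high"]
sample = [0.01, 0.02, 0.03, 0.20, 0.21, 0.22, 0.50, 0.51, 0.70, 0.71, 0.95, 0.96]
breaks = jenks_breaks(sample, 5)
print(breaks, classify(0.60, breaks, labels))
```

With the class bounds in hand, the area per class follows from counting the pixels assigned to each label and multiplying by the cell area.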

CONCLUSIONS AND FUTURE WORK
In the present study, the LS map produced by the RF algorithm, using high-resolution DSMs and orthophotos generated from aerial images, was evaluated for a region prone to multiple hazards, i.e. earthquakes and landslides. The map was evaluated in five classes (very low, low, moderate, high and very high) obtained from the Jenks classification of the predicted probability values. In addition, the FR method was used to assess the relationships between the input features (i.e. the conditioning factors) and the landslide inventory. The results showed that the RF yielded high performance, with AUC values of 0.90 and 0.92 for the two sub-parts of the region. The FR values provided in the Appendix indicate the influence of the different class ranges on the landslide inventory with respect to their share of the whole study area. The very high-resolution DSMs and the orthophotos produced for the area allowed a detailed FR analysis and a detailed susceptibility distribution in the area. In addition, a comprehensive landslide inventory, which includes even very small masses, could be derived from the high-resolution datasets.

REFERENCES
Silalahi, F.E.S., Pamela, Arifianti, Y., Hidayat, F., 2019. Landslide susceptibility assessment using frequency ratio model in Bogor, West Java, Indonesia. Geoscience Letters, 6(1), 1-17. doi.org/10.1186/s40562-019-0140-4.