SNOW AVALANCHE SUSCEPTIBILITY MAPPING FOR DAVOS, SWITZERLAND

Snow avalanches are among the destructive hazards occurring in mountainous regions, and the spatial distribution (susceptibility) of their occurrence needs to be considered in spatial planning and disaster risk mitigation efforts. Susceptibility assessment is the first step in avalanche disaster management and can be carried out using high-resolution geospatial data and machine learning (ML) algorithms. In this study, we assessed the snow avalanche susceptibility in Davos, Switzerland, using an inventory delineated on satellite imagery in a previous study. The conditioning factors used for the assessment include elevation, slope, plan curvature, profile curvature, aspect, topographic position index, topographic ruggedness index, topographic wetness index, land use and land cover, lithology, distance to road, and distance to river. Two ML algorithms, logistic regression (LR) and random forest (RF), were comparatively assessed using a validation subset held out from the labeled data (30% validation, 70% training). The prediction performance of both models was assessed based on the area under the receiver operating characteristic curve (ROC-AUC). Although the AUC value obtained from the LR method was relatively low (0.74), the value obtained from the RF (0.96) demonstrated the high performance and usability of this approach. The results indicate that the RF method can successfully produce an avalanche susceptibility map for the region, although improvements may be possible by investigating further input features and ML algorithms, as well as by classifying the starting and runout zones of the avalanche data separately. Furthermore, the accuracy is expected to increase with a larger training dataset.


INTRODUCTION
Snow avalanches are among the widely observed natural hazards affecting human life, the economy, infrastructure, vegetation, and geomorphology in mountainous and cold regions. A snow avalanche is defined as a rapidly moving mass of snow on steep slopes (Schweizer et al., 2003). Snow avalanche susceptibility is the spatial probability of avalanche occurrence. Avalanche susceptibility assessment is the first and essential stage of hazard and risk assessment for disaster management and mitigation.
Scientific analysis of snow avalanches has become crucial to mitigate risks through modeling, mapping, visualizing, and monitoring of susceptible regions with the help of Geographic Information Systems (GIS) and remote sensing (RS) (Yilmaz, 2010; Kumar et al., 2016). Compared to GIS and RS-based approaches, field-based studies are limited by high risk exposure and can be time-consuming due to snow mass instability and adverse weather conditions (Eckerstorfer et al., 2016). GIS and RS, by contrast, are significant and cost-effective tools for avalanche assessments (Bühler et al., 2018).
The occurrence of avalanche hazards depends on the conditioning and the triggering factors (Nefeslioglu et al., 2013). The snowpack characteristics (e.g., thickness, stability, density, water content, grain size, etc.), the atmospheric conditions (e.g., air temperature, precipitation, wind speed, wind direction, etc.), and topographical factors (elevation, slope, curvature, aspect, ground cover, etc.) have frequently been considered as conditioning factors for avalanches in the literature. Rapid temperature changes, heavy rainfalls, earthquakes, and anthropogenic activities are the triggering factors that initiate the snow mass movement (Hao et al., 2018; Kumar et al., 2017; Nefeslioglu et al., 2013).
Researchers have applied various expert-based techniques for avalanche susceptibility mapping (ASM), such as the fuzzy-frequency ratio (FR) (Kumar et al., 2016) and the analytical hierarchical process (AHP) (Nefeslioglu et al., 2013; Selçuk, 2013). Data-driven machine learning (ML) applications have made great strides in natural hazard assessment in recent years owing to their ability to learn, predict, and improve based on historical hazard events without human intervention, and their capability to identify trends and patterns and to deal with multidimensional and multi-source data such as conditioning and triggering factors. Although various ML applications exist for floods and landslides, the mechanism for ASM has not been clearly understood due to the difficulties in inventory preparation. Mosavi et al. (2020) implemented an ensemble ML model, random subspace functional tree (RSFT), and compared the model outcome with other ML methods such as logistic regression (LR), logistic model tree (LMT), alternating decision tree (ADT), and functional trees (FT) for the Karaj Watershed, Iran. Tiwari et al. (2021) applied a Support Vector Machine (SVM) with four different kernels to predict avalanche susceptibility. Akay (2021) indicated that the Random Forest (RF) is appropriate for ASM. Rahmati et al. (2019a) comparatively evaluated various ML methods for avalanche susceptibility map production in two different sites and found the RF method very successful. Choubin et al. (2020) employed a generalized additive model (GAM), multivariate adaptive regression spline (MARS), boosted regression trees (BRT), and SVM for comparing ensemble ML methods. The results of these studies have shown that ML methods can provide a useful estimate of avalanche susceptibility.
In this study, the LR and the RF, which are commonly used ML methods, were implemented to produce the ASM of Davos (Switzerland) using twelve conditioning factors: elevation, slope, plan curvature, profile curvature, aspect, topographic position index (TPI), topographic ruggedness index (TRI), topographic wetness index (TWI), land use/land cover, lithology, distance to road, and distance to river. The avalanche inventory was prepared in a previous study by Hafner et al. (2021a) in the form of vector data (polygons) and provided for the present study to be employed as the training data for the supervised ML methods mentioned above. The other input features (conditioning factors) were derived from the geospatial datasets obtained from the Swiss Federal Office of Topography (Swisstopo). In the following Sections, the datasets, methods, and the ASM results are presented in detail and discussed accordingly.

MATERIALS AND METHODS
In this Section, the study area characteristics, the input datasets and the pre-processing methods as well as the ML methods and the validation approaches are explained in detail. The location of the study area is shown in Figure 1. The overall methodological workflow employed in the study is presented in Figure 2.

Study Area
The study area, Davos, is located in the Eastern Region of Switzerland. Snow avalanche hazards occur frequently in the region due to climatic and topographic characteristics. The study area covers approximately 336 km² and has an altitude range from 1,158 m to 3,144 m. According to the Institute for Snow and Avalanche Research (SLF), Switzerland, 17 people lost their lives in the Davos region due to snow avalanches during 2002 (SLF, 2021). The geology of the region is characterized by the Lower Penninic-Upper Austroalpine plate boundary. The study area comprises the southeast of the Prättigau half-window and lies below the Silvretta nappe (Nagel, 2006). The western part of the study area covers mostly Upper Austroalpine sediments and volcanites. In the eastern part, crystalline formations of the Silvretta nappe consist mainly of metamorphic rocks (gneiss, mica slate, amphibolite).

Input Datasets and Features
A reliable and complete inventory is essential to determine the effects and the characteristics of avalanches (Tiwari et al., 2021). The avalanche inventory was manually produced by Hafner et al. (2021b) for two avalanche periods in 2018 and 2019 from satellite images and provided for the present study. The inventory (Hafner et al., 2021b) includes the location information in the form of polygons. Figure 1 shows the avalanche inventory and 3D views of some parts. Two avalanche polygons with runout zones in the valley were not employed in the model training stage since they may increase the uncertainty in the models. The avalanche inventory map was rasterized prior to model training and the avalanche and non-avalanche pixels were labeled as True (1) and False (0), respectively.
The slope is an important component of avalanche susceptibility studies, since avalanches generally occur on snow-covered slopes between 30° and 45° (Schweizer and Jamieson, 2003). Aspect can be defined as the compass direction that a terrain surface faces. Although snow avalanches occur on all aspects, cases reported in the literature have shown that north-facing slopes are more prone to avalanches (Winkler et al., 2021). The term curvature describes the morphology of a slope, which is an important factor for snow cover stability (Akay, 2021). The plan and profile curvatures affect snow-mass movement in the horizontal and vertical directions, respectively. The TPI refers to the difference between the elevation of a central point and the mean elevation of a predefined set of neighboring points (Wilson and Gallant, 2000). A study conducted by Choubin et al. (2020) has shown that the TPI was among the most significant factors for four different avalanche susceptibility prediction models. The TRI is a parameter that measures the surface roughness, which may affect snowpack destabilization (Kumar et al., 2017). The TWI provides information about the hydrological condition of the topography (Rahmati et al., 2019b).
The digital elevation model (DEM) employed in the study (swissALTI3D) was freely provided by Swisstopo (2021) and was downsampled here to 10 m spatial resolution for computational reasons. The slope, plan curvature, profile curvature, aspect, TPI, TRI, and TWI were calculated from the 10 m DEM using the SAGA GIS software (Conrad et al., 2015). According to Parshad et al. (2017), it may be difficult to indicate the direct effect of elevation on snow avalanche hazards. Nevertheless, elevation must be taken into account because it determines how the terrain is exposed to precipitation, temperature, and wind. In addition, the distance to river and distance to road factors were calculated with the proximity grid module in SAGA GIS and stored in raster format.
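As an illustration of how such terrain derivatives are obtained from a gridded DEM, the following is a minimal NumPy sketch of slope and TPI. It is not the SAGA GIS implementation used in the study; the function names, the finite-difference slope, and the square TPI neighborhood are assumptions made for illustration only.

```python
import numpy as np

def slope_degrees(dem, cell_size=10.0):
    """Slope in degrees from finite-difference elevation gradients.

    cell_size is the grid spacing in metres (10 m matches the resampled DEM).
    """
    dz_dy, dz_dx = np.gradient(dem.astype(float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

def tpi(dem, radius=1):
    """Elevation of each cell minus the mean of its square neighbourhood.

    The neighbourhood is (2 * radius + 1) cells wide and excludes the centre
    cell; edges are handled by repeating the border values (an assumption).
    """
    dem = dem.astype(float)
    rows, cols = dem.shape
    size = 2 * radius + 1
    padded = np.pad(dem, radius, mode="edge")
    total = np.zeros_like(dem)
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + rows, dx:dx + cols]
    neighbour_mean = (total - dem) / (size * size - 1)
    return dem - neighbour_mean

# Hypothetical test surfaces: a plane rising 10 m per 10 m cell, and a single peak.
plane = np.tile(np.arange(5) * 10.0, (4, 1))
peak = np.zeros((5, 5))
peak[2, 2] = 8.0
plane_slope = slope_degrees(plane)
peak_tpi = tpi(peak)
```

Positive TPI values mark cells higher than their surroundings (ridges, peaks) and negative values mark depressions; the neighborhood radius controls the scale of the landforms being detected.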
The lithological units in a region can affect heat absorption and transfer, which may lead to snow mass movement and avalanches. Regarding the relationship between land use and land cover (LULC) and avalanches, several researchers have emphasized that avalanches frequently occur on grassland and bare land (Bergua et al., 2018; Maggioni et al., 2016; Suk and Klimánek, 2011). Even though some avalanches were observed in forests, some studies concluded that forests are also effective in reducing the avalanche risk (Varol, 2022).
In this study, the conditioning factors were used as model input and divided into two categories, numerical and categorical. The lithology and LULC data were considered as categorical features. Figure 3 shows the conditioning factors as maps (1821 x 1807 pixels each at 10 m size). The pixels employed as the non-avalanche class in the training dataset were randomly selected in equal number to the avalanche class pixels (a ratio of 1:1 for avalanche:non-avalanche). In the second step, the dataset was randomly split into training (70%) and validation (30%) datasets for avalanche prediction based on LR and RF. The open-source scikit-learn library (Pedregosa et al., 2011) was used to run the methods in a Python environment. Table 2 presents the statistical summary of avalanche and non-avalanche areas for the numerical features. The statistical metrics include the mean, standard deviation, minimum, 25%, 50%, and 75% quantiles, and maximum.
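The sampling and splitting steps described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the array names, the random seed, and the use of a stratified split are assumptions (the text states only that the split was random).

```python
import numpy as np
from sklearn.model_selection import train_test_split

def balanced_sample(features, labels, seed=0):
    """Keep all avalanche pixels (1) and an equal number of random non-avalanche pixels (0)."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    neg = rng.choice(neg, size=pos.size, replace=False)
    idx = np.concatenate([pos, neg])
    return features[idx], labels[idx]

# Hypothetical pixel table: 1000 pixels, 12 feature columns, 200 avalanche pixels.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 12))
y = np.zeros(1000, dtype=int)
y[:200] = 1

X_bal, y_bal = balanced_sample(X, y)

# 70% training / 30% validation; stratify keeps the 1:1 ratio in both subsets.
X_train, X_val, y_train, y_val = train_test_split(
    X_bal, y_bal, test_size=0.3, stratify=y_bal, random_state=0
)
```

Undersampling the non-avalanche class keeps the classifier from being dominated by the far more numerous non-avalanche pixels, at the cost of discarding most of the background area from training.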

Snow Avalanche Susceptibility Mapping
In this study, the LR and the RF methods were evaluated for their prediction performances. The LR is an important and frequently used statistical tool for binary classification problems. The method models the relationship between the variables and the outcome with a logistic curve, in a manner similar to linear regression (Yariyan et al., 2020). In this study, LR was used to estimate the avalanche and non-avalanche probability of the study area. The sklearn.linear_model.LogisticRegression class was applied with the parameters "C = 100, solver = 'lbfgs', class_weight = 'balanced'" as a result of hyperparameter tuning.
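A minimal sketch of fitting an LR model with the reported parameters is given below. The synthetic data, the feature standardization step, and max_iter are assumptions added for illustration; the text does not state how the features were preprocessed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 400 pixels with 3 numerical conditioning factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

# C, solver and class_weight are the values reported in the text;
# scaling and max_iter are assumptions to help lbfgs converge.
lr = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=100, solver="lbfgs", class_weight="balanced", max_iter=1000),
)
lr.fit(X, y)

# Column 1 of predict_proba holds the probability of the avalanche class (label 1).
lr_proba = lr.predict_proba(X)[:, 1]
```

Applied to every pixel of the feature stack, these class probabilities are the values mapped as susceptibility.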
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
The RF is a tree-based ensemble learning algorithm that was first proposed by Breiman (2001) and is widely used for classification and regression problems. The method builds trees from data randomly selected with the bootstrap technique. The sklearn.ensemble.RandomForestClassifier class was implemented with the parameters "n_estimators = 250, criterion = 'entropy', max_depth = 16, max_features = 16, class_weight = 'balanced_subsample', oob_score = True, bootstrap = True". The parameters yielding the highest accuracy in the hyperparameter optimization were selected.
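The RF configuration reported above can be sketched as follows. The synthetic data and random_state are assumptions; the example uses 18 feature columns so that max_features = 16 is valid, on the assumption that encoding the categorical factors yields more columns than the twelve original factors.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: 400 pixels, 18 feature columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 18))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

rf = RandomForestClassifier(
    n_estimators=250,
    criterion="entropy",
    max_depth=16,
    max_features=16,
    class_weight="balanced_subsample",
    oob_score=True,    # score each sample with the trees that did not see it
    bootstrap=True,    # each tree is trained on a bootstrap resample
    random_state=0,    # assumption, for reproducibility
)
rf.fit(X, y)

rf_proba = rf.predict_proba(X)[:, 1]  # probability of the avalanche class
oob_accuracy = rf.oob_score_          # out-of-bag accuracy estimate
```

The out-of-bag score gives an internal accuracy estimate without a separate hold-out set, which is a useful cross-check against the validation results.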
To improve the model performances, the HalvingGridSearchCV method of scikit-learn was used for the hyperparameter optimization (HalvingGridSearchCV, 2022).
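A minimal sketch of such a search with HalvingGridSearchCV is shown below. The parameter grid, the scoring metric, the number of folds, and the synthetic data are all hypothetical; the text does not report the search space that was actually explored.

```python
import numpy as np
from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (must precede the next import)
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: 400 pixels, 18 feature columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 18))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

# Hypothetical search space (not the one used in the study).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [4, 16],
}

search = HalvingGridSearchCV(
    RandomForestClassifier(criterion="entropy", random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=3,
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
```

Successive halving evaluates all candidates on a small subset of the samples first and only promotes the best-scoring ones to larger subsets, which is cheaper than an exhaustive grid search on the full dataset.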
The optimal values obtained from the method were applied in the prediction. In the third step, the overall accuracy (OA), the receiver operating characteristic (ROC) curve, and the area under the curve (AUC) were computed to evaluate the model performances.
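The evaluation metrics named above can be computed from validation labels and predicted probabilities as in the following sketch. The function name, the toy values, and the 0.5 threshold used to turn probabilities into class labels for the OA are assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

def evaluate(y_true, proba, threshold=0.5):
    """Return overall accuracy, ROC-AUC and the ROC curve points."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba)
    predicted = (proba >= threshold).astype(int)  # assumed decision threshold
    fpr, tpr, _ = roc_curve(y_true, proba)
    return {
        "oa": accuracy_score(y_true, predicted),
        "auc": roc_auc_score(y_true, proba),
        "fpr": fpr,
        "tpr": tpr,
    }

# Toy example with four validation pixels.
metrics = evaluate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The OA depends on the chosen threshold, whereas the AUC summarizes the ranking quality of the probabilities over all thresholds, which is one reason the two metrics can differ noticeably for the same model.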

RESULTS
The OA of the LR model was found to be 0.66. The prediction was verified using the ROC curve (Figure 4), and an AUC value of 0.74 was obtained. Figure 5 illustrates the color gradient distribution of the probabilities acquired from the LR model. The OA of the RF model was found to be 0.88. Figure 6 shows the performance of the RF model using the ROC curve, from which an AUC value of 0.96 was obtained. Figure 7 shows the gradient map of the probabilities obtained from the RF model.
The ASMs were classified as very low (0-0.2), low (0.2-0.4), moderate (0.4-0.6), high (0.6-0.8), and very high (0.8-1.0) susceptibility using equal intervals. The ASM obtained from the LR shows that 17.95% of the study area falls in the very low, 30.82% in the low, 29.43% in the moderate, 18.62% in the high, and 3.17% in the very high susceptibility class. According to the RF, 64.24% of the study area falls in the very low, 15.49% in the low, 10.62% in the moderate, 7.31% in the high, and 2.34% in the very high susceptibility class. Figure 8 shows the probability distribution histograms obtained from both methods.
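The equal-interval classification and the per-class area shares can be reproduced from a probability raster as in the sketch below. The function names are hypothetical, and assigning a boundary value (e.g. exactly 0.2) to the higher class is an assumption, since the text does not state how boundaries were handled.

```python
import numpy as np

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]
CLASS_EDGES = np.array([0.2, 0.4, 0.6, 0.8])  # inner edges of the five equal intervals

def classify(proba):
    """Map probabilities in [0, 1] to class indices 0 (very low) to 4 (very high)."""
    return np.digitize(proba, CLASS_EDGES)

def class_shares(proba):
    """Percentage of pixels in each susceptibility class."""
    classes = classify(np.asarray(proba)).ravel()
    counts = np.bincount(classes, minlength=len(CLASS_NAMES))
    return {name: 100.0 * n / classes.size for name, n in zip(CLASS_NAMES, counts)}

# Toy probability values standing in for a susceptibility raster.
example = np.array([0.0, 0.1, 0.2, 0.5, 0.79, 0.8, 1.0, 0.3])
example_classes = classify(example)
example_shares = class_shares(example)
```

Because every pixel has the same area on a regular grid, the pixel percentages correspond directly to the shares of the study area reported for each class.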
While the LR probabilities are approximately normally distributed, the RF probabilities follow a geometric-like distribution. Based on the maps and the statistical summaries given above, it can be concluded that the LR method possibly overestimates the areas susceptible to avalanches. This also explains the lower OA value obtained from this model. On the other hand, when the sub-areas given in Figure 9 are analyzed in detail, it can be seen that the RF assigned higher susceptibility levels to the avalanche zones. However, for the two avalanche polygons that were not used in model training due to the inclusion of runout zones, the LR exhibited higher susceptibility levels. This finding indicates that the selection of training zones and the feature importance obtained from both methods must be investigated to understand the results and to obtain higher accuracy.

DISCUSSIONS AND CONCLUSIONS
A snow avalanche is a frequently observed natural hazard threatening lives and property in mountainous and cold regions. The ASMs can be used as a basemap or initial data by researchers, designers, and decision-makers for regional land use planning, site selection, and avalanche prevention and mitigation purposes. In the present study, the LR and RF models were employed for snow ASM with 12 conditioning factors in Davos, Switzerland. The training data (Hafner et al., 2021b) were produced in a previous study for two avalanche periods and provided by the SLF.
The results show that the AUC value of the RF (0.96) was better than that of the LR (0.74), which indicates that the RF exhibited higher prediction performance for the study area. However, further attention needs to be paid to the training data selection to prevent model overfitting, and to the input features so that the most suitable ones are used in the modeling stage. In addition, testing and validation datasets must be selected properly to increase the accuracy and the reliability of the models. Furthermore, the avalanche inventory dataset (Hafner et al., 2021b) was limited to two avalanche periods only, and a larger inventory can contribute to a better understanding of avalanche susceptibility and to higher accuracy.
Figure 9. The LR (a) and RF (b) results in a part of the study area (purple rectangle in Figure 9). The two avalanche polygons shown on the right column were not employed in model training.
Rahmati et al. (2019a) conducted a comparative study with RF, SVM, Naïve Bayes and GAM models for the Darvan and the Zarrinehroud watersheds and found the RF successful in both Darvan (AUC = 0.964) and Zarrinehroud (AUC = 0.956). When the results presented in this study are compared with those mentioned above, it can be seen that the performance of the RF was also very high here and that it is a suitable method for ASM production.
As future work, the model inputs will be analyzed in terms of feature importance and further ML methods will be assessed. Snowpack knowledge, additional terrain features, and meteorological conditions could be useful for developing the ASMs. Furthermore, the inventory data used here will be analyzed to separate the starting and runout zones.