PERFORMANCE OF MACHINE LEARNING ALGORITHMS FOR MAPPING AND FORECASTING OF FLASH FLOOD SUSCEPTIBILITY IN TETOUAN, MOROCCO

Since the industrial revolution, the world has been experiencing a major change in its climate, which causes many imbalances such as flash floods (FF). The aim of this study is to propose a new approach for the detection and forecasting of flash flood susceptibility in the city of Tetouan, Morocco. To this end, support vector machine (SVM), logistic regression (LR), random forest (RF), Naïve Bayes (NB) and artificial neural network (ANN) models are used, based on 1101 points (680 flood points and 421 non-flood points) and 9 flash flood predictors (Elevation, Slope, Aspect, LU/LC, Stream Power Index, Plan Curvature, Profile Curvature, Topographic Position Index and Topographic Wetness Index) that were extracted from the DEM (10 m resolution) and satellite imagery (Sentinel-2B) of the study area. The models were trained on 70% and tested on 30% of this dataset, and they were evaluated using several metrics such as the Receiver Operating Characteristic (ROC) curve, precision, recall, score and kappa index. The results demonstrated that RF (AUC = 0.99, Accuracy = 96%, Kappa statistic = 0.92) has the highest performance, followed by ANN (AUC = 0.98, Accuracy = 95%, Kappa statistic = 0.89) and SVM (AUC = 0.96, Accuracy = 92%, Kappa statistic = 0.80). The proposed approach is an effective tool for forecasting and predicting FF that can help reduce the severity of this disaster.


INTRODUCTION
Scientists believe that there is a strong link between industrialization and global warming (McGregor et al., 2016), which translates into many imbalances in the current ecosystem. Global warming means a rise in air and water temperature (Lykhovyd, 2018) that leads to numerous disasters such as storms, heat waves (Woolf and Wolf, 2013), forest fires (Molfetta et al., 2007), droughts (Leng et al., 2015) and flash floods (Trenberth, 2008). Their economic, human and environmental consequences can be fatal. In terms of mortality, floods and flash floods are considered the most severe disasters in the world, and they have affected more than 2 billion people worldwide (Organization, 2014). Morocco is no exception: it has faced innumerable flash flood events, the most important being the Oued Ourika floods in 1987, 1989 and 1995 (Atlas et al., 2014), those of Mohammedia in 1996 and 2002 (Chaabane et al., 2017), the Guelmim region floods in 2002, 2010 and 2014 (Talha et al., 2019), and most recently the event in Tetouan on March 1st, 2021. Nowadays, vulnerability to flash floods is very high, so it is necessary to think systematically about risk management (Panahi et al., 2021), starting with understanding the elements that aggravate this disaster and then identifying flood-prone areas. Many researchers strive to achieve these goals using several methods, from older methods based on simple statistics such as frequency ratio and weight of evidence (Shafapour Tehrany et al., 2017; Talha et al., 2019; Radwan et al., 2019; Swain et al., 2020) to artificial intelligence and machine learning approaches (Ma et al., 2019; Arabameri et al., 2020; Costache et al., 2020; Elmahdy et al., 2020; Costache et al., 2021).
The machine learning approach is very effective in terms of accurate identification and modelling. It attempts to solve flash flood problems by finding the relationship between flood risk and its factors rather than by directly determining weights (Ma et al., 2019), using algorithms such as Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB) and Artificial Neural Network (ANN). In this context, many studies have appeared in recent years. For example, Elmahdy et al. (2020) used land use/land cover (LULC), lithology, slope, altitude, plan curvature, relief, stream networks, stream density and distance from streams as factors, which were then fed into boosted regression tree (BRT), classification and regression tree (CART) and naive Bayes tree (NBT) algorithms to determine FF susceptibility in the United Arab Emirates (NUAE). Another study in Egypt used boosted regression tree (BRT), functional data analysis (FDA), general linear model (GLM) and multivariate discriminant analysis (MDA) based on nine factors, including slope, altitude, distance from main river, LU/LC, lithological units, curvature, aspect and topographic wetness index (El-Haddad et al., 2021). In addition, three state-of-the-art models, Artificial Neural Network (ANN), Random Forest (RF) and Support Vector Machine (SVM), coupled with Random Subspace (RS), were trained on elevation, curvature, aspect, slope, topographic roughness index (TRI), topographic wetness index (TWI), stream power index (SPI), sediment transport index (STI), land use/land cover (LULC), distance to the river, soil type and rainfall to map flood-prone areas in Bangladesh.
Despite all these studies around the world, such studies are lacking in Morocco, especially for the city of Tetouan. The main objective of this study is therefore to map and predict the susceptibility to flash floods in this area using a machine learning approach.

Study Area
Tetouan city is located in the region of Tangier-Tetouan-Al Hoceima (35°34′N, 5°22′W) in the north of Morocco (Figure 1), about 10 km from the Mediterranean Sea, 60 km east of the city of Tangier and 41 km south of the Strait of Gibraltar. It is bounded by two mountains, Dersa in the north and Ghorghiz in the south, and is crossed by the Oued Martil river. The population is 578,283, compared to 157,684 in the countryside, with an estimated area of 2541 km² (Monograph-HCP, 2020). Tetouan city is part of the outer Rif chain, which contains allochthonous units (flysch sheets) and a pelitico-sandstone superstructure on an autochthonous to para-autochthonous unit with pelitic dominance (Maftahi et al., 2020). The highest elevation is 380 m (1247 ft) and the lowest elevation is -2 m (-7 ft). The old town rises on a hillside from the river, giving it the impression of towering above those coming from the south and west. This effect, combined with the city's white walls, earned it the nickname "The White Dove" (Britannica, 2014). The city was designated a UNESCO World Heritage site in 1997 (UNESCO-WHC, 2021).

Method
This study is based on four steps, as shown in Figure 2: data collection; derivation of the dataset by generating thematic maps; pre-processing, which prepares the dataset for the machine learning step; and finally the production of the flash flood susceptibility results.

Data Used
To achieve the objective of this study and to ensure high accuracy, three types of data were used: satellite imagery of the area (Sentinel-2B), the digital elevation model (DEM), and the positions of the most recent flooded and non-flooded areas.

Satellite imagery
Satellite imagery is considered very useful for land use and land cover classification (Zhao, 2015). For this purpose, Sentinel-2B satellite imagery of the Tetouan territory, with a resolution of 10 m to 60 m, was downloaded from EarthExplorer (a platform of the U.S. Geological Survey). The image was taken on 2021-02-02, only one month before the targeted flash flood on 2021-03-01.

Digital Elevation model (DEM)
The DEM is an important data source for earth science (Mukherjee et al., 2012) because it gives a quantitative representation of the terrain. A DEM can be used in two dimensions (2D) or three dimensions (3D), and it can be generated from stereo data (San and Süzen, 2005), airborne laser scanning (ALS) (Favey et al., 1999), aerial stereo photographs or topographic surveys (Walstra et al., 2015). The DEM used in this study was extracted from the topographic map of the study area (Tetouan map, scale 1/50000) with a resolution of 10 m.

Flood Information
Because dense cloud cover made it impossible to extract flood information from satellite imagery, a field survey of the study area was required the day after the flood (01-03-2021). Using a handheld GPS, 1558 points were located (Figure 4-j), of which 846 were flooded and the rest were not flooded.

Flash-Flood Predictor
Flash flood events are influenced by many geographical factors (Costache et al., 2020a), as confirmed by the many researchers who use them in their studies (Radwan et al., 2019; Wu et al., 2019; Ma et al., 2019; Arabameri et al., 2020; Elmahdy et al., 2020; Tien Bui et al., 2020; Song et al., 2020; Panahi et al., 2021; Costache et al., 2021). Based on the previous literature, choosing the most influential factors was a challenge, as they differ from study to study, although some factors are common. In this study, nine factors were therefore selected based on the local characteristics of the study area, namely Elevation, Slope, Aspect, LU/LC, Stream Power Index, Plan Curvature, Profile Curvature, Topographic Position Index and Topographic Wetness Index. Precipitation is also an important factor in flooding, but it was not used in this study because the amount of rainfall is the same throughout the region and therefore does not discriminate between locations, as was confirmed by Shahabi in his study. Each factor used in this study is briefly described below.

Elevation
Elevation is the height above the reference geoid or ellipsoid, and it can affect any flash flood event. The occurrence of flash floods increases with decreasing elevation, which means that lower areas are more susceptible to flooding and higher areas are less susceptible (Bisht et al., 2018). The elevation map of the study area, with a resolution of 10 m, shows that the elevation ranges between -2 and 380 m (Figure 3-a).

Slope
This factor was computed from the DEM. According to the literature, slope is regarded as one of the factors with the biggest effect on flooding (Rahmati et al., 2015; Vahid et al., 2018; Talha et al., 2019). In addition, slope controls water flow velocity (Costache et al., 2020b). In this study it was classified into five classes at a resolution of 10 m (Figure 3-b). Classes with values less than 11.07 are the most dominant and are found in the middle of the study area, while values greater than 55.462 are concentrated in the upper and lower parts of the study area.

Aspect
Aspect is a morphometric indicator (Costache and Tien Bui, 2020; Costache et al., 2020a) that was extracted from the DEM. The result is a raster with a resolution of 10 m and ten classes (Figure 3-c). Each pixel shows the direction in which the surface faces at that location, measured clockwise in degrees from 0 (due north) to 360 (again due north); flat areas have no downhill direction (Burrough and McDonell, 1998).

Land Use Land Cover (LU/LC)
Land use is very important in such studies. It is generated by remote sensing techniques through a supervised classification of the satellite imagery of the area, captured by Sentinel-2B at a resolution of 10 m. Six classes were produced (water, forest, vegetation, bare land, roads and buildings) (Figure 3-d), with a dominance of built-up areas, which strongly promote flash flooding, followed by vegetated areas, which also have a high flash flood potential, and a notable lack of forest areas, which strongly contribute to the water balance at the watershed level (Costache et al., 2020a).

Stream Power Index (SPI)
The Stream Power Index (SPI) is a measure of the erosive power of flowing water (Moore et al., 1991). The SPI is calculated from the slope and the catchment area (flow accumulation) using equation 1. The result is a raster with a resolution of 10 m (Figure 3-e).
$$\mathrm{SPI} = \alpha \times \tan(\beta) \qquad (1)$$
where α = flow accumulation and β = the gradient or slope.
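As an illustration of equation 1, the following minimal sketch evaluates the SPI cell by cell. The function name, the small grids and the choice of expressing slope in degrees are assumptions made for this example; they are not taken from the study's GIS workflow.

```python
import math


def stream_power_index(flow_accumulation, slope_deg):
    """SPI = alpha * tan(beta) for one cell; slope in degrees is an assumption."""
    return flow_accumulation * math.tan(math.radians(slope_deg))


# Hypothetical 2 x 2 grids, invented purely for illustration.
flow_acc = [[1.0, 4.0], [10.0, 25.0]]
slope = [[0.0, 45.0], [10.0, 30.0]]

# Apply the formula to every pair of aligned cells.
spi = [
    [stream_power_index(a, b) for a, b in zip(acc_row, slope_row)]
    for acc_row, slope_row in zip(flow_acc, slope)
]
```

Flat cells give an SPI of zero, and the index grows with both the contributing area and the steepness of the cell.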

Plan curvature
Plan curvature is classified as a morphometric factor (Costache and Tien Bui, 2020). It was considered because it is valuable in outlining zones characterised by high runoff and zones with low runoff (Zaharia et al., 2012). The result of this indicator was divided into three classes: concave, flat and convex (Figure 3-f).

Profile curvature
Profile curvature is another very important morphometric factor (Costache et al., 2020b). It indicates surfaces with accelerated surface runoff: negative values indicate faster water flow over the surface, while positive values indicate slower flow. Accelerated runoff surfaces (< -0.17) cover about 68% of the study area (Figure 3-g).

Topographic Position Index
The Topographic Position Index (TPI) is a useful indicator calculated from the DEM. Its values were grouped into five classes using the natural breaks method (Figure 3-h). This morphometric factor measures the difference between the elevation of a specific cell and the average elevation around it within a predetermined radius (equation 2) (De Reu et al., 2013; Vinod, 2017).
$$\mathrm{TPI} = M_0 - \frac{1}{n}\sum M_n \qquad (2)$$
where M_0 = elevation at the center point, M_n = elevation of the surrounding grid points and n = the total number of surrounding points.
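The definition above (elevation of a cell minus the mean elevation of its surroundings) can be sketched as follows. This is a minimal illustration, assuming a square window of `radius` cells rather than a circular one; the function name and the small DEM are hypothetical, and the DEM must contain more than one cell.

```python
def topographic_position_index(dem, row, col, radius=1):
    """TPI = M0 - mean(Mn) over the cells within `radius` of (row, col)."""
    n_rows, n_cols = len(dem), len(dem[0])
    neighbours = []
    # Collect the surrounding cells, clipping the window at the grid edges.
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            if (r, c) != (row, col):
                neighbours.append(dem[r][c])
    return dem[row][col] - sum(neighbours) / len(neighbours)


# Hypothetical DEM with a single raised cell in the centre.
dem = [
    [10.0, 10.0, 10.0],
    [10.0, 18.0, 10.0],
    [10.0, 10.0, 10.0],
]
tpi_centre = topographic_position_index(dem, 1, 1)
tpi_corner = topographic_position_index(dem, 0, 0)
```

A positive value marks a cell standing above its surroundings (a ridge or hilltop), and a negative value marks a cell lying below them (a valley or depression).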

Topographic Wetness Index
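The Topographic Wetness Index (TWI) is a quantitative index that describes the balance between flow accumulation and slope at the local scale (Figure 3). It is calculated using equation 3.
$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan(\beta)}\right) \qquad (3)$$
where α = flow accumulation and β = the gradient or slope.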
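A minimal sketch of this index, taking it in its commonly used form ln(α / tan β) with α the flow accumulation and β the slope. The function name, the clamping of flat cells to a small minimum slope and the example values are assumptions for illustration only.

```python
import math


def topographic_wetness_index(flow_accumulation, slope_deg, min_slope_deg=0.1):
    """TWI = ln(alpha / tan(beta)); flat cells are clamped to avoid dividing by zero."""
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(flow_accumulation / math.tan(beta))


# Same hypothetical contributing area, three different slopes.
twi_steep = topographic_wetness_index(100.0, 45.0)
twi_gentle = topographic_wetness_index(100.0, 5.0)
twi_flat = topographic_wetness_index(100.0, 0.0)
```

For the same contributing area, gentler slopes give a higher index, which is why low-gradient valley floors tend to appear as the wettest cells.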

Data preprocessing
This step prepares the data for feeding into the machine learning models. It is divided into two substeps: data preparation, then data cleaning.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVI-4/W3-2021 Joint International Conference Geospatial Asia-Europe 2021 and GeoAdvances 2021, 5-6 October 2021, online

Data preparation
Ten raster images (Figure 3 and Figure 4) were converted into numerical values using a GIS environment together with Python libraries (Pandas and NumPy). The result of this step is a dataset in which each pixel is recorded with its X and Y position, its value for each factor and its corresponding class (flooded / non-flooded) (Table 1).
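The conversion from aligned rasters to one record per pixel can be sketched as below. This is not the authors' script: it uses plain Python lists in place of GIS rasters so that it stays self-contained, and the function name, grid values, origin and 10 m cell size are assumptions for illustration.

```python
def rasters_to_rows(factors, flood_mask, x_origin=0.0, y_origin=0.0, cell_size=10.0):
    """Turn aligned grids into one record per labelled pixel."""
    rows = []
    for i, mask_row in enumerate(flood_mask):
        for j, label in enumerate(mask_row):
            if label is None:  # pixel without a field observation
                continue
            record = {
                "x": x_origin + (j + 0.5) * cell_size,  # cell-centre easting
                "y": y_origin - (i + 0.5) * cell_size,  # rows run north to south
                "flood": label,
            }
            for name, grid in factors.items():
                record[name] = grid[i][j]
            rows.append(record)
    return rows


# Hypothetical 2 x 2 factor grids and survey labels (1 = flooded, 0 = not flooded).
factors = {
    "elevation": [[5, 120], [8, 60]],
    "slope": [[2, 30], [4, 15]],
}
flood_mask = [[1, 0], [None, 1]]
rows = rasters_to_rows(factors, flood_mask)
```

The resulting list of dictionaries can be passed directly to `pandas.DataFrame(rows)` to obtain a table with one column per factor.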

Data cleaning
The quality of the data has a significant impact on the results, so the data must be cleaned before use. One of the most appropriate ways to clean and prepare the data is to remove outliers, defined as points lying away from most or all other points (Ghosh and Vogt, 2012); finding and removing them is a challenge (G., 1995). In fact, there are no definitive statistical rules for identifying outliers.
Finding them depends on knowledge of the subject and an understanding of the data collection process. In this study, outliers were searched for visually by plotting the data using scatter plots, histograms and boxplots, and observations lying far from the main cluster of observations were then eliminated. This process was performed for each factor individually. The dataset initially contained 1322 points; after removing the outliers, 1101 points remained.
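The outlier search in this study was visual, so there is no single rule to reproduce. As a rough programmatic stand-in, the sketch below applies the usual boxplot whisker rule (1.5 times the interquartile range) to one factor at a time. The rule, the function names and the sample values are assumptions, not the authors' procedure.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper boxplot whiskers for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, factor, k=1.5):
    """Keep only the rows whose value for `factor` lies within the whiskers."""
    low, high = iqr_bounds([row[factor] for row in rows], k)
    return [row for row in rows if low <= row[factor] <= high]


# Hypothetical slope values; 80 lies far from the rest of the sample.
rows = [{"slope": s} for s in (2, 3, 4, 5, 6, 7, 80)]
low, high = iqr_bounds([row["slope"] for row in rows])
cleaned = drop_outliers(rows, "slope")
```

Running `drop_outliers` once per factor mirrors the factor-by-factor cleaning described above, although a visual inspection may keep or reject points that a fixed threshold would not.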

Machine learning Models
Humans learn from past experience, and machine learning, a subset of artificial intelligence, allows machines to learn from past data in a similar way. Supervised learning, unsupervised learning and reinforcement learning are the three types of machine learning. Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB) and Artificial Neural Network (ANN) are the supervised algorithms used in this study.

Support vector machine (SVM)
SVM is among the most widely used machine learning algorithms (Liu et al., 2021). It is a supervised binary classifier based on the principle of structural risk minimization (Yao et al., 2008). SVM is well suited to representing complex nonlinear relationships between inputs and outputs (Liu et al., 2021), as in the flash flood case.

Logistic regression (LR)
Logistic regression is a probability model that was first proposed by Cox (Cox, 1958). It is a supervised classification and prediction algorithm that attempts to find the relationship between the independent variables and the dependent variable, and it belongs to a family of approaches that ranges from linear and non-linear regression to parameter estimation and non-parametric models (Jaeyeong Lee, 2021).
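To make the idea concrete, the sketch below fits a logistic regression by plain gradient descent on a single standardised predictor. It is a minimal illustration, not the implementation used in the study; the function names, learning rate, number of epochs and toy data are all assumptions.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimated probability of the flood class for one observation."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    """Minimise the mean log-loss with batch gradient descent."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, target in zip(X, y):
            error = predict_proba(weights, bias, features) - target
            for j, value in enumerate(features):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical standardised elevation: low cells flooded (1), high cells not (0).
X = [[-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5]]
y = [1, 1, 1, 0, 0, 0]
weights, bias = fit_logistic(X, y)
p_low = predict_proba(weights, bias, [-1.0])
p_high = predict_proba(weights, bias, [1.0])
```

The fitted weight for elevation is negative in this toy case, so the estimated flood probability falls as elevation rises.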

Random forest (RF)
A random forest is a classifier made up of a set of tree classifiers $\{h(x, \Theta_k),\ k = 1, \ldots\}$, where the $\{\Theta_k\}$ are independent, identically distributed random vectors and each tree casts a unit vote for the most popular class at the input x (Van der Aalst, 2016).
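The voting scheme in this definition can be sketched with very small trees. In the example below, each "tree" is a one-split decision stump trained on a bootstrap sample and a randomly chosen feature, which plays the role of the independent random vector for that tree. Stumps stand in for full decision trees only to keep the sketch short; the function names and toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Best single-threshold split on one feature, judged by training accuracy."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for low_label in (0, 1):
            preds = [low_label if row[feature] <= threshold else 1 - low_label
                     for row in rows]
            hits = sum(p == t for p, t in zip(preds, labels))
            if best is None or hits > best[0]:
                best = (hits, threshold, low_label)
    _, threshold, low_label = best
    return lambda row: low_label if row[feature] <= threshold else 1 - low_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.randrange(len(rows[0]))           # random feature choice
        trees.append(fit_stump([rows[i] for i in idx],
                               [labels[i] for i in idx], feature))
    return trees


def predict_forest(trees, row):
    """Each tree casts one vote; the most popular class wins."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical (elevation, slope) samples: low, flat cells flooded (1).
rows = [[2, 1], [5, 3], [8, 2], [12, 4], [150, 20], [200, 25], [260, 30], [320, 35]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
trees = fit_forest(rows, labels)
low_cell = predict_forest(trees, [1, 0.5])
high_cell = predict_forest(trees, [400, 40])
```

Because the trees see different samples and features, their individual errors tend to differ, and the majority vote is more stable than any single tree.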

Naïve Bayes (NB)
Naïve Bayes is a simple special case of Bayesian networks in which one node is an attribute node, the others are feature nodes, and the feature nodes are assumed to be independent of each other (Tang et al., 2020). NB is a frequently used classification algorithm based on a basic probability theorem known as Bayes' rule, Bayes' theorem or Bayes' formula (Lewis, 1998).
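A short worked example of Bayes' rule under the independence assumption is given below. The class prior uses the flood share of the cleaned dataset (680 of 1101 points); the per-feature likelihoods and the function name are invented for illustration and do not come from the study.

```python
def naive_bayes_posterior(prior_flood, likelihoods_flood, likelihoods_dry):
    """P(flood | features), assuming the features are conditionally independent."""
    score_flood = prior_flood
    score_dry = 1.0 - prior_flood
    # Independence lets the per-feature likelihoods simply multiply.
    for p_flood, p_dry in zip(likelihoods_flood, likelihoods_dry):
        score_flood *= p_flood
        score_dry *= p_dry
    return score_flood / (score_flood + score_dry)


prior = 680 / 1101  # share of flood points in the cleaned dataset
# Hypothetical likelihoods of observing two feature values in each class.
posterior = naive_bayes_posterior(prior, [0.8, 0.6], [0.2, 0.3])
```

With these made-up likelihoods the posterior probability of flooding is about 0.93, noticeably higher than the prior of about 0.62, because both features are more typical of flooded cells.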

Artificial neural network (ANN)
An artificial neural network (ANN) is a mathematical model (Kia et al., 2012) that can be defined as a predictive model inspired by the architecture of the brain. It is made up of a series of hidden layers, neurons and connections.

Evaluation of the Models Performance
Evaluation is an essential step in the probabilistic modelling process; without it, the model cannot be considered reliable (Panahi et al., 2021). In this study, the dataset after removing outliers contains 1101 points, with 680 flood points and 421 non-flood points. The models were trained on 70% of the dataset (the training dataset), and the remaining 30% (the test dataset) was used to evaluate their performance using several metrics such as precision (equation 4). In the metric definitions, P = total number of flood pixels and N = total number of non-flood pixels.
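The sketch below shows a 70/30 split and the standard confusion-matrix definitions of precision, recall, F1 score, accuracy and Cohen's kappa. These standard formulas are assumed to correspond to the metrics used here; the shuffling seed, the rounding of the split point, the function names and the toy predictions are hypothetical.

```python
import random


def split_70_30(rows, seed=42):
    """Shuffle a copy of the rows and cut it at 70% (unstratified, an assumption)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def evaluate(y_true, y_pred):
    """Standard binary metrics, with 1 = flood and 0 = non-flood."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": accuracy,
        "kappa": (accuracy - expected) / (1 - expected),
    }


train, test = split_70_30(list(range(1101)))  # 1101 points, as in the cleaned dataset

# Hypothetical labels and predictions for ten test points.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
```

Kappa is lower than accuracy in this toy case (about 0.58 against 0.80) because it discounts the agreement that would be expected by chance given the class proportions.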

Flash Flood Risk Map
Based on the SVM, LR, RF, NB and ANN models, flood risk maps of Tetouan city were generated in a GIS environment. Each final map is divided into five classes: Lowest, Low, Moderate, High and Highest susceptibility to flash floods. As shown in Figure 4, in each map the high-risk areas are mainly concentrated in the central-eastern region and along the Oued Martil river. In addition, the very high probability classes contain the largest number of training points extracted from the field survey carried out at the time of the flash flood, followed by the "high", "moderate", "low" and "very low" classes respectively. This indicates that the models were trained correctly, which was confirmed by comparing the predicted data with the real-world data.
For the SVM model, 32% of the study area is considered to have a highest risk of flooding, 19% a high risk, 18% a moderate risk, 10% a low risk and 20% a lowest risk. For the LR model, 38% of the study area is considered to have a highest risk, 18% a moderate risk, 18% a low risk and 25% a lowest risk. For the RF model, 36% of the study area has a highest risk of flooding, 19% a high risk, 18% a moderate risk, 12% a low risk and 15% a lowest risk.
Further analysis in 3D mode, generated using the RF model and shown in Figure 5, shows that the entire city is at risk of being inundated, which is consistent with the city's location between two mountains. Another 3D view, shown in Figure 6, shows that most of the buildings are at risk of flooding, especially those in the valley.
Figure 8.
Another method for assessing model reliability is the AUC method. This method relates the true and false positive rates by plotting the sensitivity of the model as a function of 1 − specificity.
As shown in Figure 7, the AUC values for RF, ANN, SVM, LR and NB are 0.98, 0.97, 0.96, 0.83 and 0.84 respectively. The AUC method confirms the information extracted from the other metrics, which indicates that RF, ANN and SVM are the best models. Furthermore, the results are strong, as expected, and agree with previous research (Kia et al., 2012; Hong et al., 2018; Muñoz et al., 2018).
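The AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen flood point receives a higher score than a randomly chosen non-flood point. The sketch below uses this pairwise formulation; the function name and the scores are hypothetical and do not reproduce the values reported in Figure 7.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve as the share of correctly ordered pairs."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # flood point ranked above non-flood point
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted flood probabilities for six test points.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(y_true, scores)
```

Here eight of the nine flood/non-flood pairs are ordered correctly, so the AUC is 8/9; a value of 1.0 means perfect ranking and 0.5 is no better than chance.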

CONCLUSION
Climate change and global warming are inevitable, and natural disasters are everywhere. One of them is the flash flood, which causes a lot of damage. This calls for more effective solutions to reduce the severity of the problem, in particular effective methods to delineate the areas most sensitive to this disaster in order to reduce its losses. Machine learning is one of the methods that have appeared in recent years. The current work investigates and applies five machine learning algorithms, support vector machine (SVM), logistic regression (LR), random forest (RF), Naïve Bayes (NB) and artificial neural network (ANN), which are considered new techniques, to map the susceptibility to flash floods in the city of Tetouan, Morocco. Nine flash flood predictors (Elevation, Slope, Aspect, LU/LC, Stream Power Index, Plan Curvature, Profile Curvature, Topographic Position Index and Topographic Wetness Index) were used, based on 1101 points, and the models were evaluated using metrics such as the ROC curve, precision, recall, score and kappa index. This study concludes that RF is the best model for mapping flash flood susceptibility. The objective of this study was achieved with a satisfactory result. Thus, the generated model can be applied in other regions by collecting the factors related to the region under study. The study can also be used in urban planning, by identifying areas that are highly susceptible to flooding in order to avoid future approval of new buildings at risk.

REFERENCES
Elmahdy, S., Ali, T., Mohamed, M., 2020. Flash flood susceptibility modeling and magnitude index using machine learning and geohydrological models: A modified hybrid approach. Remote Sens. 12. https://doi.org/10.3390/RS12172695
Favey, E., Geiger, A., Hilmar Gudmundsson, G., Wehr, A., 1999. Evaluating the potential of an airborne laser-scanning system for measuring volume changes of glaciers by. Geogr. Ann. Ser. A Phys.