GEOMORPHOLOGICAL MAPPING OF INTERTIDAL AREAS

Spatiotemporal geomorphological mapping of intertidal areas is essential for understanding system dynamics and provides information for ecological conservation and management. Mapping the geomorphology of intertidal areas is challenging, mainly because spectral differences between geomorphological units are often small and transitions between them are often gradual. In addition, intertidal areas are highly dynamic. A considerable challenge is to distinguish between different types of tidal flats, specifically low and high dynamic shoal flats, sandy and silty low dynamic flats, and mega-ripple areas. In this study, we apply machine learning to automated geomorphological mapping and compare methods that use features calculated in classical Object-Based Image Analysis (OBIA) with end-to-end deep convolutional neural networks that derive features directly from imagery. The study is expected to give an in-depth understanding of the features that contribute to tidal area classification and to improve the automation and prediction accuracy. We emphasise model interpretability and knowledge mining. By comparing and combining object-based and deep learning-based models, this study contributes to the development and integration of both methodological domains for semantic segmentation.


Background
Geomorphological mapping of intertidal areas in space and time is essential for understanding system dynamics and provides information for ecological conservation and management (Bouma et al., 2005). The ecological quality of intertidal areas is important under the European Water Framework Directive and because these areas are designated as Natura 2000 areas, which implement the European Birds Directive and the European Habitats Directive. Mapping the geomorphology of intertidal areas is a considerable challenge, mainly because spectral differences are often small while transitions between geomorphological units are often gradual. In addition, intertidal areas are highly dynamic. Surface water, saltmarsh, and tidal flats are relatively simple to distinguish, but considerable challenges remain in distinguishing between different types of tidal flats, specifically low and high dynamic shoal flats, sandy and silty low dynamic flats, and mega-ripple areas. In this study, we apply machine learning methods to automated geomorphological mapping and extend object-based methods with deep neural network-based semantic segmentation methods. We focus on distinguishing between 5 classes: sandy low dynamic flats, silty low dynamic flats, mega-ripples, high dynamic shoal flats, and hard substrates. We evaluate the methods with an extensive data set of visually interpreted photos.

Objective
The objective of this study is to compare methods based on ensemble tree-based modelling on features of segments from classical OBIA with methods based on deep neural networks.

Impact
This study is both application- and method-driven. It provides an in-depth understanding of the features that contribute to tidal area classification and is expected to improve both the automation of the procedure and the prediction accuracy. We emphasise model interpretability and knowledge mining. By comparing and combining object-based and deep learning-based models, this study contributes to the integration of thus far largely separate approaches for semantic segmentation.

METHODOLOGY
The study is mainly developed in the Western Scheldt, the Netherlands (fig. 1), where a detailed manually delineated expert classification map is available. Figure 1(b) shows our classification scheme. Importantly, we distinguish between the sandy and silty areas within the low dynamic shoal flats. We used aerial imagery with red, green, and NIR bands (0.25 m resolution), a DEM from laser altimeters (2 m resolution), and derived indices. The imagery and DEM were acquired within the same year and season. Two methods are developed and compared in terms of the priors integrated, model interpretability, and prediction accuracy and patterns. The first method first applies OBIA (Object-Based Image Analysis) for segmentation and then uses an ensemble tree-based method, XGBoost (Chen and Guestrin, 2016), for classification; this method is referred to as OBIA-XGB in this study. The challenge of this method lies in identifying the optimal spatial unit of objects. The second method applies an encoder-decoder deep neural network architecture, U-net (Ronneberger et al., 2015), for end-to-end classification.

Preprocessing
We focus on distinguishing between 5 classes and, for convenience, assign a code to each of them:
• P1a1: sandy low dynamic flats
• P1a2: silty low dynamic flats
• P2b: mega-ripples
• P2c: high dynamic shoal flats
• H1: hard substrates
For the definition of the classes, please refer to Douma et al. (2019). We also refer to the P1a1, P1a2, P2b, and P2c classes collectively as the "P" class and to the H1 class as the "H" class. All other classes (i.e. classes that belong to neither the P nor the H1 classes) are removed.
We filtered out water by first segmenting the imagery in eCognition (Developer 9.4.0) and then applying the NDWI, defined as NDWI = (green - NIR)/(green + NIR), with a threshold of 0.44. Mega-ripples naturally contain water. In this study, we also filtered out water in the P2b (mega-ripple) class to gain an initial understanding of the behaviour of the machine learning models.
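As an illustration of the water filter, the following minimal sketch computes the NDWI for segment means and drops segments above the 0.44 threshold. The direction of the comparison (NDWI above the threshold counts as water), the function names, and the segment values are assumptions made for this example, not details taken from the eCognition workflow.

```python
def ndwi(green, nir):
    """Normalised Difference Water Index for one pixel or one segment mean."""
    total = green + nir
    if total == 0:
        return 0.0  # assumption: treat an all-zero observation as neutral
    return (green - nir) / total


def is_water(green, nir, threshold=0.44):
    """Assumed rule: a segment is water when its NDWI exceeds the threshold."""
    return ndwi(green, nir) > threshold


# Hypothetical segment means as (green, NIR); the values are illustrative only.
segments = {"seg_a": (0.30, 0.05), "seg_b": (0.20, 0.18)}
kept = [name for name, (g, n) in segments.items() if not is_water(g, n)]
```

Here `seg_a` has a high NDWI and is filtered out, while `seg_b` is kept for classification.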

Sampling scheme
As the unit of method OBIA-XGB is the OBIA segment and the unit of method U-net is the image pixel, we develop a different sampling scheme for each of them. For method OBIA-XGB, the tree classifier requires a sampling scheme that accounts for object size. We evaluated two sampling schemes: 1) random sampling and 2) stratified sampling according to the size of the OBIA segments. For 1), we used random under-sampling of the majority class(es), without replacement, to ensure that the classes have balanced samples. For 2), we first divided the objects into 5 categories, cut at the 20th, 40th, 60th, 80th, and 100th percentiles of the object size of the minority class (i.e. the class with the fewest objects). These percentiles are used so that the number of objects in each category is similar. Within each category, 70% of the objects form the training set and the rest form the test set. Then, within the training set of each category, random under-sampling is applied.
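The stratified scheme can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the 70/30 split is assumed to be applied per class within each size category, objects larger than the largest minority-class object are assumed to fall in the top category, and the tuple layout `(object_id, class_label, size)` is hypothetical.

```python
import random
from collections import defaultdict


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0..100) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def stratified_split(objects, train_frac=0.7, seed=0):
    """objects: list of (object_id, class_label, size). Returns (train, test)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for obj in objects:
        by_class[obj[1]].append(obj)
    # Size cut points come from the class with the fewest objects.
    minority = min(by_class, key=lambda c: len(by_class[c]))
    sizes = sorted(o[2] for o in by_class[minority])
    cuts = [percentile(sizes, q) for q in (20, 40, 60, 80)]

    bins = defaultdict(lambda: defaultdict(list))
    for obj in objects:
        # Number of cut points exceeded: 0 (smallest) to 4 (largest category).
        category = sum(obj[2] > c for c in cuts)
        bins[category][obj[1]].append(obj)

    train, test = [], []
    for category in sorted(bins):
        train_parts = []
        for label in sorted(bins[category]):
            members = bins[category][label][:]
            rng.shuffle(members)
            n_train = round(train_frac * len(members))
            train_parts.append(members[:n_train])
            test.extend(members[n_train:])
        # Random under-sampling without replacement inside the training part.
        n_min = min(len(part) for part in train_parts)
        for part in train_parts:
            train.extend(rng.sample(part, n_min))
    return train, test
```

Training objects removed by the under-sampling are simply discarded in this sketch; the test set keeps its natural class proportions.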
For OBIA-XGB, we selected three tiles that are abundantly covered by the classes of interest (i.e. the P and H1 classes) for identifying the optimal spatial unit and for model training. For the method based on U-net, we selected 80 tiles of 500 × 500 pixels for training (42 tiles) and validation (18 tiles). Three 4000 × 4000 tiles are used for testing both methods and calculating the accuracy metrics. The train-test split is shown in fig. 2.

Accuracy assessment
We focus on precision, (True Positives)/(True Positives + False Positives), and recall, (True Positives)/(True Positives + False Negatives), as indicators of the prediction accuracy. For OBIA-XGB, the final precision and recall are weighted by the area of each object. In addition, we present the precision and recall based on objects (i.e. not weighted by area). This serves to inspect how objects of various sizes are identified.
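The two variants of the metrics can be sketched with a single function, taking precision as TP/(TP + FP) and recall as TP/(TP + FN). Passing object areas as weights gives the area-weighted version, and omitting them gives the object-based version. The function name, labels, and areas below are illustrative assumptions.

```python
def precision_recall(y_true, y_pred, target, weights=None):
    """Precision and recall for `target`, optionally weighted (e.g. by area)."""
    if weights is None:
        weights = [1.0] * len(y_true)
    tp = fp = fn = 0.0
    for truth, pred, w in zip(y_true, y_pred, weights):
        if pred == target and truth == target:
            tp += w  # correctly predicted target object
        elif pred == target:
            fp += w  # predicted as target, but belongs to another class
        elif truth == target:
            fn += w  # target object that the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical objects: reference label, predicted label, and area.
y_true = ["P2b", "P2b", "H1", "P1a1"]
y_pred = ["P2b", "H1", "H1", "P2b"]
areas = [100.0, 10.0, 50.0, 5.0]

by_object = precision_recall(y_true, y_pred, "P2b")
by_area = precision_recall(y_true, y_pred, "P2b", weights=areas)
```

In this toy case the large P2b object is classified correctly and the small ones are not, so the area-weighted scores are higher than the object-based ones.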
Method OBIA-XGB
We first use eCognition for object segmentation at various spatial units. Then, we combine segmentation levels to obtain sub-objects and related features. We export the objects with a total of 48 features describing the spectral and texture properties (the latter using Haralick's grey level co-occurrence matrices) of objects and sub-objects.
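To make the texture features concrete, the sketch below builds a symmetric, normalised grey level co-occurrence matrix for one pixel offset and derives two common Haralick-style measures from it. It is a generic illustration: which texture measures, offsets, and quantisation levels eCognition used for the 48 features is not stated here, so those choices are assumptions.

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalised co-occurrence matrix for the offset (dx, dy).

    image: 2-D list of integer grey levels in range(levels).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count both directions (symmetric GLCM)
    total = sum(sum(row) for row in counts)
    if total == 0:
        return counts  # no valid pixel pairs for this offset
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weights co-occurrences by the squared grey-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(p):
    """Weights co-occurrences by closeness of the two grey levels."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


# Hypothetical 2 x 3 patch quantised to two grey levels.
patch = [[0, 0, 1], [1, 1, 0]]
p = glcm(patch, levels=2)
```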
Then, to determine the optimal spatial unit for each class, we regroup the five classes into a "target class" and an "other class", iterating over all five classes. XGBoost is applied to classify the OBIA-calculated object features, and the best spatial unit for each class is selected based on the prediction accuracy.
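The regrouping and selection step can be sketched as below. The classifier itself is left out: `scores` stands for accuracies that would come from training XGBoost on each spatial unit, and both the spatial unit names and the accuracy values are hypothetical placeholders.

```python
CLASSES = ["P1a1", "P1a2", "P2b", "P2c", "H1"]


def to_binary(labels, target):
    """Regroup multi-class labels into target class (1) and other class (0)."""
    return [1 if label == target else 0 for label in labels]


def best_unit_per_class(scores, classes=CLASSES):
    """scores: {spatial_unit: {class: accuracy}}. Returns {class: best unit}."""
    return {c: max(scores, key=lambda unit: scores[unit][c]) for c in classes}


# Hypothetical binary-classification accuracies for two spatial units.
scores = {
    "unit_a": {"P1a1": 0.70, "P1a2": 0.72, "P2b": 0.90, "P2c": 0.80, "H1": 0.85},
    "unit_b": {"P1a1": 0.68, "P1a2": 0.74, "P2b": 0.88, "P2c": 0.79, "H1": 0.86},
}
best = best_unit_per_class(scores)
```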
We found no considerable differences in the optimal spatial unit between classes (described in section 4.1, fig. 3); therefore, we used a single optimal spatial unit for all classes. The identified optimal spatial unit is used for multi-class classification. The loss functions for the binary and multi-class classifications are the logistic loss and the softmax loss (i.e. a softmax activation followed by a cross-entropy loss), respectively.
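For reference, the two loss functions can be written out for a single sample as in the sketch below. It works on raw (pre-activation) scores and is a plain restatement of the standard definitions, not the XGBoost internals.

```python
import math


def logistic_loss(y, score):
    """Binary log loss for a label y in {0, 1} and a raw (pre-sigmoid) score."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def softmax_loss(true_index, scores):
    """Cross-entropy of the softmax over raw class scores."""
    m = max(scores)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[true_index]
```

With two classes and scores `[s, 0]`, the softmax loss for the first class equals the logistic loss for a positive label with score `s`, which is why the binary case is a special case of the multi-class one.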
The hyperparameters we tuned for XGBoost are the learning rate, maximum tree depth, number of estimators, and the Lasso regularisation term α, using 5-fold cross-validation. The modelling process using XGBoost and deep neural networks is implemented in Python (version 3.6).
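A grid search over these four hyperparameters with 5-fold cross-validation could be organised as in the following sketch. The parameter names follow the XGBoost scikit-learn wrapper (`learning_rate`, `max_depth`, `n_estimators`, `reg_alpha`), but the candidate values are illustrative assumptions and the model fitting itself is omitted.

```python
from itertools import product

# Candidate values are assumptions for illustration, not the grid of the study.
GRID = {
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [3, 6, 9],
    "n_estimators": [100, 300],
    "reg_alpha": [0.0, 1.0],  # L1 (Lasso) regularisation term
}


def candidates(grid):
    """Yield every combination of hyperparameter values as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) for k contiguous folds.

    Shuffle the samples beforehand if their order is not random.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size
```

Each candidate would be trained on the training indices of every fold and scored on the held-out fold, and the combination with the best mean validation score would be kept.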

Method based on U-net
The input for the U-net consists of the aerial imagery, the DEM, and the derived slope (Horn, 1981), NDVI, and Brightness layers. The slope was calculated in QGIS 3.18 using the same formula (Horn, 1981) as in eCognition and is expressed in degrees. The NDVI and Brightness are calculated in Python. The Brightness is defined as Brightness = (NIR + red + green)/3. (Berman et al., 2018). In the end, we chose the IoU loss, as it provides the best cross-validation results on the training set. The hyperparameters we tuned are the learning rate, batch size, and image size. The image sizes tested range from 2^5 (64) to 2^9 (512), with an optimum at 128. The batch sizes tested range from 8 to 32, with an optimum at 16. The learning rate was tested with various schedules, ranging from 0.0001 to 0.01 at various epochs.
[Figure: cross-validation comparison between our Stratified Sampling (SS) scheme and random sampling of the objects (object).]
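The derived index layers and the loss can be illustrated per pixel as below. Brightness follows the definition given in this section and NDVI uses its standard form, (NIR - red)/(NIR + red). The soft IoU (Jaccard) loss shown is one common differentiable formulation and is an assumption here, since the exact formulation used for training is not specified.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def brightness(nir, red, green):
    """Mean of the three available bands."""
    return (nir + red + green) / 3.0


def soft_iou_loss(probs, targets, eps=1e-7):
    """1 - soft IoU for one class.

    probs: predicted probabilities in [0, 1]; targets: 0/1 labels (flattened).
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    union = sum(probs) + sum(targets) - intersection
    return 1.0 - (intersection + eps) / (union + eps)
```

A perfect prediction gives a loss close to 0, and a prediction that covers twice the true area gives a loss close to 0.5.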

Spatial unit optimisation for OBIA-XGB
When using both stratified sampling and area-weighted cross-validation (SSAC), all classes obtained a very high precision with each of the tested spatial units, indicating that larger objects are better identified than smaller objects. The recall remains high for the H class but is lower for the P classes, especially P1a1. In comparison, SS has a relatively high recall for all classes but a much lower and more variable precision across spatial units and classes, which shows that smaller objects are more likely to be confused with other classes, especially for the P1a1, P2c, and H classes. Because SSAC shows homogeneous recall and precision between spatial units, we use SS to identify the optimal spatial unit, so as to optimise the classification of small objects and ensure the highest model generality. The optimal spatial unit identified for all classes is 50 15 (object level 50, sub-object level 15).

Prediction accuracy
Method OBIA-XGB
The accuracy metrics of OBIA-XGB are shown in tables 1 and 2. SSAC is used to indicate the performance and SS to indicate how well the smaller objects are classified. According to SSAC, the mega-ripple class obtained the highest precision, indicating that OBIA-XGB is already promising in distinguishing it from the other P classes and the H1 class. The sandy and silty low dynamic flats are not well separated.
Comparing tables 1 and 2, small objects are classified better in the P2b and H1 classes but less satisfactorily in the other classes. From the spatial prediction maps, we observe that misclassification mostly occurs when P1a1, P1a2, and P2b are classified as H1, especially along some channel outlets or close to water, and when P1a1 is classified as P1a2 or P2b. The probability map also shows a high uncertainty (indicated by low probability) in separating P1a1 from P1a2, as well as H1 from the other classes.

Method based on U-net
The U-net segmentation obtained a considerably lower accuracy for P2b and P2c compared to the method OBIA-XGB (tables 1 and 3) but obtained a much better result for P1a1. P2b again obtained the best result compared to the other classes but is often confused with P1a1. The model failed completely to predict the H1 class.

CONCLUSION
In this study, we compared machine learning methods based on OBIA-derived features with end-to-end representation learning for intertidal area classification, using XGBoost and U-net, respectively. The method that applies XGBoost to OBIA-derived features (OBIA-XGB) outperforms the U-net-based method for P2b and P2c, but the U-net method separates P1a1 and P1a2 better. OBIA-XGB obtained satisfactory results in separating P2b from the other classes, which is sometimes difficult to do manually or with rule-based OBIA methods alone. Therefore, our study indicates that machine learning methods can help to improve the rule-based OBIA method. Future studies will aim to improve on the basic U-net method applied in this study.

ACKNOWLEDGEMENT
This study was performed at Utrecht University with a contribution from the project 'Optimalisatie productie GMK met behulp van OBIA' financed by Rijkswaterstaat (contract number: 31160225).

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2021 XXIV ISPRS Congress (2021 edition)