EFFECT OF DIFFERENT SEGMENTATION METHODS USING OPTICAL SATELLITE IMAGERY TO ESTIMATE FUZZY CLUSTERING PARAMETERS FOR SENTINEL-1A SAR IMAGES

Optical and SAR data are efficient data sources for shoreline monitoring. Processing SAR data, for example for feature extraction, is not an easy task because the images have a completely different structure from optical imagery, and determining a threshold value is particularly challenging. In this study, SENTINEL-2A optical data were used as ancillary data to predict the fuzzy membership parameters for segmentation of SENTINEL-1A SAR data in order to extract the shoreline. The SENTINEL-2A and SENTINEL-1A images were acquired on September 9, 2016 and September 13, 2016, respectively. Three segmentation algorithms, selected from object-based, learning-based and pixel-based methods, were used to obtain the land and water classes that serve as input data for parameter estimation, so that the performance of the different segmentation algorithms could be investigated and analysed. In the first step, the Mean-Shift, Random Forest and Whale Optimization algorithms were employed to obtain water and land classes from the SENTINEL-2A image. The water and land classes derived from each algorithm were then used as input data to calculate the parameters required for fuzzy clustering of the SENTINEL-1A SAR image. Lake Constance, Germany was chosen as the study area. In addition, an interface plugin was developed and integrated into the open source Quantum GIS software platform. The interface allows non-experts to process the data and extract shorelines without setting any parameters; it requires pre-segmented data as input, from which a batch process calculates the required parameters.


INTRODUCTION
Shorelines are dynamic and change under natural or human-induced effects. Remote sensing and image processing techniques are useful, modern tools for shoreline monitoring and change detection (Kutser et al., 2012). Shorelines can be extracted using different algorithms that have been proposed for temporal monitoring of coastal regions (Gens, 2010). Shoreline extraction is a popular topic, and a considerable amount of research addresses it. Some of these studies apply unsupervised methods (Guariglia et al., 2006), some apply water-based indices (Zheng et al., 2011), and some apply morphology (Pardo-Pascual et al., 2012). Active contours have also been applied (Schmitt et al., 2015). Particle Swarm optimisation, Mean-Shift and object-oriented fuzzy classification approaches were also used in different studies (Catal Reis et al., 2016). Compared with optical data, SAR images have the advantage that they can be acquired in all weather conditions. Fuzzy clustering based shoreline extraction from SAR images has been realized in several studies (Demir et al., 2017). The processing and classification of SAR images require several parameters, and results from multispectral images can be used as training data sets to estimate them. In this study, three methods for shoreline extraction from a multispectral image were investigated: Mean-Shift, Random Forest and Whale Optimisation. The results from these methods were used as input data to estimate the fuzzy clustering parameters for SENTINEL-1A image classification. The shorelines obtained were then compared with a shoreline manually digitized from the SENTINEL-2 image.

USED DATA AND METHODOLOGY
Lake Constance, which is located on the Rhine at the northern foot of the Alps, was selected as the study area (Figure 1). It is a transboundary lake bordered by Germany, Switzerland, and Austria (Hammerl, 2006). The properties of the SENTINEL-1A SAR and SENTINEL-2A data are given in Table 1 and Table 2. To obtain the ancillary water and land classes, the blue, red and near infrared bands of the SENTINEL-2A image were used, as shown in Figure 2.

Shoreline segmentation by Mean-Shift Method
The Mean-Shift algorithm is an object-based method that was first proposed by Fukunaga and Hostetler (1975) and modified by Comaniciu and Meer (2002). Mean-Shift is a nonparametric method based on kernel density estimation that finds the maxima within a local neighbourhood. In this method, the feature space is treated as a probability density function (pdf). Regions with dense points in the feature space correspond to local maxima, or modes (Qin, 2015). For each data point, gradient ascent is therefore performed on the locally estimated density until convergence (Zhang et al., 2012). Comaniciu and Meer (2002) suggested an RGB to L*u*v* colour conversion as the first step of Mean-Shift segmentation, whereas Chauhan and Shahabade (2014) proposed an RGB to HSV transformation.
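The mode-seeking step described above can be illustrated with a minimal one-dimensional sketch. It is not the EmguCV implementation used in the study: it assumes a flat (uniform) window, and the function name `mean_shift_mode`, the `bandwidth` value and the sample grey values are hypothetical.

```python
def mean_shift_mode(points, start, bandwidth, max_iter=100, tol=1e-6):
    """Shift `start` towards the nearest density mode of `points` (1-D, flat window)."""
    x = float(start)
    for _ in range(max_iter):
        # Flat kernel: average every sample within `bandwidth` of the current position.
        window = [p for p in points if abs(p - x) <= bandwidth]
        if not window:
            break
        new_x = sum(window) / len(window)
        converged = abs(new_x - x) < tol
        x = new_x
        if converged:
            break
    return x


# Hypothetical grey values forming two clusters (e.g. dark water, brighter land).
samples = [10, 11, 12, 13, 14, 50, 51, 52, 53, 54]

# Start one search from every sample; points that reach the same mode share a segment.
modes = sorted({round(mean_shift_mode(samples, s, bandwidth=5.0), 3) for s in samples})
```

In this toy case every sample converges to one of two modes, so the data split into two segments without the number of segments being given in advance.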
In the present study, the proposal of Comaniciu and Meer (2002) was followed. The EmguCV library was used to implement the Mean-Shift algorithm in the .NET environment. The parameters used are given in Table 3.

Parameter                  Value
Spatial Window Radius      10
Color Window Radius        5
Maximum Iterations         100
Minimum Segment Size       25
Table 3. Used Mean-Shift parameters

Thresholding was then applied to the Mean-Shift segments to obtain binary water and land classes. The threshold values were defined as 0-255, 0-255 and 1-2 for the blue, red and NIR bands, respectively. The segmentation result is given in Figure 3-a, and the binary water and land classes obtained are given in Figure 3-b.
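The band-wise thresholding can be sketched as follows. The ranges are the ones quoted in the text; the rule that a pixel is labelled water when all three bands fall inside their ranges is an assumption, and the names `RANGES`, `is_water` and `binarize` as well as the sample pixel values are hypothetical.

```python
# Inclusive (low, high) ranges per band, as quoted in the text.
RANGES = {"blue": (0, 255), "red": (0, 255), "nir": (1, 2)}


def is_water(pixel, ranges=RANGES):
    """Assumption: a pixel is water when every band lies inside its range."""
    return all(lo <= pixel[band] <= hi for band, (lo, hi) in ranges.items())


def binarize(image, ranges=RANGES):
    """Return a nested list with 1 for water and 0 for land."""
    return [[1 if is_water(px, ranges) else 0 for px in row] for row in image]


# Hypothetical 2 x 2 segmented image (segment values per band).
segments = [
    [{"blue": 40, "red": 20, "nir": 1}, {"blue": 90, "red": 80, "nir": 120}],
    [{"blue": 35, "red": 18, "nir": 2}, {"blue": 70, "red": 95, "nir": 3}],
]
mask = binarize(segments)
```

With these ranges only the NIR band actually discriminates, since the blue and red ranges cover the full 8-bit interval.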

Shoreline segmentation by Random Forest Method
Random Forest is a learning-based classification algorithm built on decision trees. The method assigns the predefined object classes by analysing the given training data sets (Breiman, 2001). The algorithm requires two parameters: the number of trees and the number of random variables used at each node when building the decision trees (Belgiu and Drăguţ, 2016). In the training step, the algorithm creates multiple CART-like trees (Breiman, 2001). A bootstrap technique is used in determining the split for each node (He et al., 2015). Following the CART algorithm, the best split is determined by the Gini index: for each node, the homogeneity of the samples is measured for the randomly selected variables, the variable with the minimum Gini index is selected, and the calculation is repeated for the next node. If the Gini index is zero, the node is considered completely homogeneous and branching ends. The out-of-bag samples (test data) are used to validate each tree. The classification vote is calculated by considering the weight of each decision tree, and the pixel is assigned to the class with the majority of votes (Gislason et al., 2006).
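The Gini-based split selection at a single node can be sketched as below. This is an illustration of the criterion only, not the MATLAB toolbox implementation; the function names and the sample NIR values are hypothetical, and the search assumes at least two distinct feature values.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(values, labels, threshold):
    """Size-weighted Gini impurity of the two children of `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)


def best_threshold(values, labels):
    """Try each distinct value (except the largest) and keep the lowest impurity."""
    candidates = sorted(set(values))[:-1]
    return min(candidates, key=lambda t: split_gini(values, labels, t))


# Hypothetical NIR values for six training pixels.
nir = [2, 3, 5, 60, 75, 90]
cls = ["water", "water", "water", "land", "land", "land"]
t = best_threshold(nir, cls)
```

Here the parent node has an impurity of 0.5, and the selected split yields two pure children, which is the end-of-branching condition described above.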
The SENTINEL-2A image used consists of 4072 x 6951 pixels. In the training step, a total of 1,200,000 pixels were selected: 600,000 for the water class and 600,000 for the land class. The Statistics and Machine Learning Toolbox of the MATLAB environment was used to implement the Random Forest algorithm. The parameters used are given in Table 4. Blob analysis was applied after classification for noise removal. The classification result and the water and land classes obtained after noise removal are given in Figure 4-a and Figure 4-b, respectively.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-1, 2018 ISPRS TC I Mid-term Symposium "Innovative Sensing - From Sensors to Methods and Applications", 10-12 October 2018, Karlsruhe, Germany

Shoreline segmentation by Whale Optimisation Algorithm
In this study, the Whale Optimisation Algorithm is used for multi-level thresholding, which divides an image into multiple regions to perform segmentation. Automatic selection of optimum thresholds is one of the biggest challenges in image segmentation (Muangkote et al., 2016). In this method, multilevel thresholds are generated using either the Otsu or the Kapur entropy function (Bhandari et al., 2014). The algorithm is based on the social behaviour of whales (Mirjalili and Lewis, 2016). The optimum threshold values are determined by optimizing a fitness function (Aziz et al., 2017). Each whale represents a candidate solution consisting of a set of threshold values. First, the threshold values of each solution are initialized randomly between the minimum and maximum values of the image histogram. The whale with the highest fitness value is then selected as the best solution, and the locations of the other whales are updated according to their encircling and bubble-net behaviour. This process is repeated iteratively (Mirjalili and Lewis, 2016). In this study, the Whale Optimization Algorithm was implemented on the MATLAB platform. The parameters used are given in Table 4. The Otsu function was used as the fitness function. Since two threshold values were created for each band, each band is divided into three levels, and the three bands together yield a total of 3 x 3 x 3 = 27 classes. Figure 4 shows the resulting segmented image of the Whale Optimization Algorithm.
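The Otsu fitness function and the class count can be sketched as follows. Note that the search below is a brute-force stand-in, not the Whale Optimization Algorithm: it only shows what the optimiser maximises. The histogram and the function names are hypothetical.

```python
from itertools import combinations


def between_class_variance(hist, thresholds):
    """Otsu objective: sum over classes of w_k * (mu_k - mu_total)^2."""
    total = sum(hist)
    mu_total = sum(i * h for i, h in enumerate(hist)) / total
    # A threshold t closes a class at level t (inclusive).
    edges = [0] + [t + 1 for t in sorted(thresholds)] + [len(hist)]
    score = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        w = sum(hist[lo:hi])
        if w == 0:
            continue  # an empty class contributes nothing
        mu = sum(i * hist[i] for i in range(lo, hi)) / w
        score += (w / total) * (mu - mu_total) ** 2
    return score


def exhaustive_two_thresholds(hist):
    """Brute-force stand-in for the optimiser; practical only for tiny histograms."""
    levels = range(len(hist) - 1)
    return max(combinations(levels, 2),
               key=lambda ts: between_class_variance(hist, ts))


# Hypothetical 12-level histogram of one band with three peaks.
hist = [9, 8, 0, 0, 0, 7, 9, 0, 0, 0, 8, 9]
best = exhaustive_two_thresholds(hist)

classes_per_band = len(best) + 1        # two thresholds give three levels
total_classes = classes_per_band ** 3   # three bands combined
```

A metaheuristic such as the Whale Optimization Algorithm replaces the exhaustive search when the histogram has 256 or more levels and several thresholds, where enumerating all combinations becomes expensive.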
Thresholding was applied to the segmented image. The threshold values were defined as 3-143, 2-145 and 4-14 for the blue, red and NIR bands, respectively. Blob analysis was applied after classification for noise removal. The thresholding result and the water and land classes obtained after noise removal are given in Figure 5-a and Figure 5-b, respectively.
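One common form of blob analysis for noise removal is to delete connected regions below a minimum size. The sketch below assumes 4-connectivity and a hypothetical `min_size`; the source does not state which connectivity or size limit was used.

```python
from collections import deque


def remove_small_blobs(mask, min_size):
    """Set 4-connected regions of 1s smaller than `min_size` pixels to 0."""
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one connected region (blob).
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) < min_size:
                for y, x in blob:
                    out[y][x] = 0
    return out


# Hypothetical water mask: one lake body and one isolated noise pixel.
water = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
cleaned = remove_small_blobs(water, min_size=3)
```

The same function can be run on the inverted mask to fill small land islands inside the water class.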

Shoreline extraction from SENTINEL-1A image
A Lee filter was applied to the SENTINEL-1A image for speckle reduction, as shown in Figure 6. Land and water memberships were defined for clustering of the SENTINEL-1A image. Because the difference between the mean and standard deviation values of the classes is expected to be large, the MS Large membership function was selected. The membership for water is defined in equation (1) as follows:
(1)
The membership for land is given by equation (2) as follows:
$$\mu(x) = \begin{cases} 1 - \dfrac{b s}{x - a m + b s}, & x > a m \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$
After the memberships were defined, the centroid method was used to determine the threshold for defuzzification (Figure 7). The parameters used, namely the mean m, the standard deviation s and the multipliers a and b, are derived from the results of the SENTINEL-2 classification.
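A minimal sketch of an MS Large membership function is given below. It assumes the form commonly used in GIS software, 1 - bs / (x - am + bs) for x > am and 0 otherwise, with m the mean, s the standard deviation and a, b multipliers. The backscatter values, the default multipliers and the function name `ms_large` are hypothetical, and the way samples are drawn from the optical class mask is an assumption.

```python
from statistics import mean, pstdev


def ms_large(x, m, s, a=1.0, b=1.0):
    """MS Large membership: 0 up to a*m, then rising towards 1 for larger x."""
    if x <= a * m:
        return 0.0
    return 1.0 - (b * s) / (x - a * m + b * s)


# Hypothetical filtered SAR values sampled under one class of the optical mask.
values = [4.0, 5.0, 6.0, 5.0, 5.0]
m, s = mean(values), pstdev(values)

# Membership at the mean, one standard deviation above it, and far above it.
at_mean = ms_large(m, m, s)
at_one_sigma = ms_large(m + s, m, s)
far_above = ms_large(m + 100 * s, m, s)
```

With a = b = 1 the membership is zero at the mean and reaches one half exactly one standard deviation above it, which shows how the estimated mean and standard deviation control the shape of the function.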
Quality assessment was performed by comparison with the shoreline manually digitized from the SENTINEL-2 data.

The water and land segmentation results from the Mean-Shift, Random Forest and Whale Optimization methods were used to calculate the defuzzification parameters, as shown in Table 5.
Table 5. Calculated parameters for fuzzy clustering
After the parameters were calculated, the defuzzification thresholds derived from the Mean-Shift, Random Forest and Whale Optimization results were 6.06, 5.92 and 6.0, respectively, as shown in Figure 8-a, b, c. The segmented images and the shoreline extraction results for the Mean-Shift, Random Forest and Whale Optimization methods are given in Figure 9-a, b, c and Figure 10-a, b, c, respectively. The results derived from each method were compared with the manually digitized shoreline. The perpendicular distances between the reference data and the shoreline extraction results were calculated at 10 m intervals, which corresponds to the spatial resolution of the SENTINEL-1A SAR image. Statistics of the perpendicular distances to the reference shoreline were then computed. The calculated values are listed in Table 6.
Table 6. Quality assessment statistics of the shorelines derived from fuzzy clustering with use of estimated parameters.
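The distance-based quality assessment can be sketched as follows. The sketch assumes the extracted shoreline has already been sampled at 10 m intervals and that coordinates are in metres; the coordinates, the function names and the choice of statistics are hypothetical and do not reproduce the values in Table 6.

```python
import math
from statistics import mean, pstdev


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (2-D, same units)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polyline(p, line):
    """Distance from p to the nearest segment of the reference polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line[:-1], line[1:]))


# Hypothetical reference shoreline and extracted points sampled every 10 m.
reference = [(0.0, 0.0), (100.0, 0.0)]
extracted = [(0.0, 3.0), (10.0, 4.0), (20.0, 5.0), (30.0, 4.0), (40.0, 4.0)]

distances = [distance_to_polyline(p, reference) for p in extracted]
stats = {
    "mean": mean(distances),
    "std": pstdev(distances),
    "min": min(distances),
    "max": max(distances),
}
```

For a real comparison the reference shoreline would be a long polyline, and a spatial index would avoid testing every segment for every sample point.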

CONCLUSIONS
In this study, shorelines were extracted from a SENTINEL-1A SAR image using parameters estimated from SENTINEL-2A multispectral image classification results. As shown in the accuracy assessment, the Random Forest algorithm performed best among the tested methods for estimating the parameters used in fuzzy clustering of the SAR data. The outcomes of the study indicate that the accuracy of the segmentation plays a key role in parameter estimation. The Mean-Shift and Whale Optimization methods require thresholds to be defined in an essential post-processing step. The Random Forest approach requires well-collected training data, which directly affect the results. The number of trees is another important parameter for this method.