SUPPORT VECTOR MACHINE AND DECISION TREE BASED CLASSIFICATION OF SIDE-SCAN SONAR MOSAICS USING TEXTURAL FEATURES

The diversity and heterogeneity of coastal, estuarine and stream habitats have made them a prevalent topic for study. Woody ruins are areas of potential riverbed habitat, particularly for fish, so mapping those areas is of interest. However, due to the limited visibility in some river systems, satellite, airborne and other camera-based (passive) systems cannot be used. By contrast, sidescan sonar is a popular underwater acoustic imaging system that is capable of providing high-resolution monochromatic images of the seafloor and riverbeds. Although supervised classification of sidescan sonar imagery has become a prominent research subject, the use of composite texture features in machine learning classification is still limited. This study investigates the use of texture analysis and feature extraction on side-scan sonar imagery with two supervised machine learning classifiers: Support Vector Machine (SVM) and Decision Tree (DT). A combination of first-order and second-order texture is investigated to obtain the most appropriate texture features for the image classification. SVM classifiers using linear and Gaussian kernels, along with a Decision Tree classifier, were examined using the selected texture features. The overall accuracy and kappa coefficient revealed that SVM using a linear kernel gives the most promising result, with 77% overall accuracy and 0.62 kappa, compared with SVM using a Gaussian kernel (60% and 0.39) and Decision Tree (73% and 0.59). Nevertheless, this study demonstrates that SVM with linear and Gaussian kernels, as well as a Decision Tree, can be used in side-scan sonar image classification and riverbed habitat mapping.

Coastal, estuarine and stream areas have diverse and heterogeneous habitats such as seagrass beds, mangroves, coral reefs and fish (Micallef et al., 2012; Mustajap et al., 2015). Thus, the habitat of streams and rivers has become a prevalent topic for study in many fields. Kaeser & Litts (2008) noticed that the woody ruins that are widely distributed in many rivers and streams can be a potential riverbed habitat, especially for fish. Understanding the spatial distribution of benthic habitat by providing maps of the substrates and seabed morphologies is essential in developing and managing river and estuary environments with ecosystem-based management strategies (Hasan et al., 2012; Kaeser et al., 2013; Buscombe, 2017).
Underwater acoustic technologies, such as multibeam echosounding (Kostylev et al., 2001; Parnum, 2007) and sidescan sonar (SSS) (Blondel, 2009; Lurton et al., 2015; Gutperlet et al., 2017), have been successfully used for marine and estuarine habitat mapping, especially in turbid environments where optical-based methods can be ineffective. Of the various acoustic mapping technologies available, SSS was chosen to carry out the benthic mapping in this study, as it can provide wide coverage in shallow water (<10 m), and the imagery is able to capture epi-benthic structure at high resolution (Anderson et al., 2008; Blondel, 2009; Buscombe, 2017). SSS had also previously been shown to be effective at mapping benthic habitats in the lower Swan River (Parnum & Gavrilov, 2009).
SSS operates by transmitting an acoustic signal that is wide in the port and starboard direction, but narrow fore and aft. SSS is typically deployed as a tow-fish to keep a low altitude and decouple it from vessel motions. As an SSS moves through the water, it builds up an acoustic (monochromatic) image of the seafloor made up of the received acoustic backscatter levels (i.e. the amount of sound scattered back towards the tow-fish). The backscatter image recorded by an SSS can be used to identify structures and changes in substrate. Hard and/or rough structures return high backscatter values, whereas soft and smooth surfaces return low backscatter values. In addition, the texture of the image can be useful for interpretation and classification. Although SSS does not directly measure depth across the image, the length of the shadow relative to the tow-fish's position can be used to infer the height of an object above the seafloor. For more details on the theory and application of SSS see Blondel (2009).
Benthic habitat maps based on the segmentation and identification of features in SSS images can be produced manually (Bickers, 2003) or using automated image analysis (Blondel et al., 1998; Blondel, 2009; Parnum & Gavrilov, 2009; Buscombe, 2017). Due to the imperfect nature of SSS images, automating the creation of habitat maps from SSS data is not trivial, and artefacts in the data can decrease the accuracy. Moreover, automated SSS classification has not been comprehensively tested on all types of benthic habitat.
Textural analysis has been used in the wider community of SSS image analysis research (Blondel, 2009). Although both first-order and second-order statistical texture analysis have proven to be promising inputs for image segmentation and classification, most studies tend to use second-order analysis. A popular approach in sidescan sonar image analysis, found in several studies, is the use of the Grey Level Co-occurrence Matrix (GLCM) method to extract second-order texture features (Lianantonakis & Petillot, 2007; Harrison et al., 2011; Hamilton, 2015; Buscombe, 2017; Hamill et al., 2018). Blondel et al. (1998) demonstrated that GLCMs could be successfully used to segment SSS images.
Side-scan sonar image clustering, segmentation and classification have been a recent focus of research. One of the most comprehensive reviews of sonar image processing for the detection and classification of man-made objects is given by Dura (2011). Many researchers have adopted different methods of detection, segmentation and classification in sidescan sonar imaging (Nelson & Krylov, 2014; Buscombe et al., 2016; Vikas, 2017). In addition, the development of computer vision and machine learning technology, including Support Vector Machine (SVM) and Decision Tree (DT) classification, has influenced the applications of underwater acoustic remote sensing technologies (e.g. multibeam echosounder and sidescan sonar).
Although SVM and DT have both been applied in several studies, they are found more frequently in studies of multibeam echosounder classification, such as Ierodiaconou et al. (2007), Hasan et al. (2012), Grilli & Shumchenia (2015) and Prospere et al. (2016). The use of Decision Tree classification in sonar imaging can be found in Doherty et al. (1989), and studies using SVM can be found in Junior & Seixas (2015), Rhinelander (2016) and Yang et al. (2016).
This research gap provides an opportunity for new research into the application of SVM and DT classification using a combination of first-order and second-order texture features in SSS imaging. Thus, this study undertakes texture analysis and feature extraction of SSS imagery, followed by feature-based classification of the imagery using Support Vector Machine and Decision Tree classifiers. The popular convolutional neural networks (CNN) were not used for this research due to the limited training samples available.
Figure 1. Map of study area (image taken from Google Earth).

STUDY AREA AND DATA
The SSS raw data were acquired by Curtin University's Centre for Marine Science and Technology (CMST) in February 2017 (Parnum, 2018) in the upper area of the Swan River, Perth, Western Australia (Figure 1).
The study area covers approximately 16,032 m² and contains many submerged trees and riverbed ripples, which makes it suitable for this image classification study. In addition, this area is highly turbid, so it is not possible to use optical imaging techniques. The shallow nature of the upper Swan River (<10 m) makes SSS more suitable than other acoustic methods, as it provides good coverage. Data were collected using an Edgetech 4125 dual-frequency SSS, operating at 400 and 900 kHz. The across-track beamwidth of both transducers was 50° with a 33° tilt (Edgetech, 2014), which means it does not directly image below the tow-fish. Although both frequencies were recorded, only the 900 kHz data were used in the final mosaic, as they provided the higher image resolution of the two channels. Positioning was provided by a Hemisphere VS330 receiver with an A41 antenna receiving satellite-based differential corrections (MarineStar). Data were recorded in EdgeTech Discover 4125D version 36.0.1.120 and logged as Edgetech data (.jsf) files.
The pre-processing of the SSS data was carried out in a similar way to the methods of Parnum & Gavrilov (2009). Radiometric corrections compensate the backscatter data for transmission loss (absorption and spreading) and insonification area (Blondel, 2009). Geometric corrections relate to the calculation of the X-Y position for each data point as follows:
1. Correcting position for the offset between the GPS antenna and the location of the tow-fish
2. Smoothing GPS data with a Kalman filter
3. Picking the range to the seafloor
4. Calculating the across-track distance for each sample, accounting for the tilt of the SSS beam (assuming a flat bottom)
5. Correcting position for heading, pitch and roll, which are recorded from sensors in the tow-fish
Figure 2 depicts one example of the starboard side of the SSS mosaics used in this study. The image contains several riverbed features such as tree branches and changes in riverbed topography. As the transducers are tilted 20° from the vertical, there is no data collected directly below the tow-fish. Hence, there is a gap in the middle of the image. The image also has acoustic noise outside the riverbank, shown on the starboard side in Figure 2, which needs to be removed by masking it out using a polygon.
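Step 4 of the geometric corrections above can be illustrated with a short sketch. Under the flat-bottom assumption, the across-track (ground) distance follows from the slant range and the tow-fish altitude by Pythagoras. The function names and the simple heading-only placement below are illustrative assumptions, not the processing code used in this study; beam tilt, pitch and roll are ignored here.

```python
import math

def across_track_distance(slant_range_m, altitude_m):
    """Horizontal distance from nadir to a sample, assuming a flat bottom."""
    if slant_range_m < altitude_m:
        # The sample arrives before the first bottom return (water column).
        return None
    return math.sqrt(slant_range_m ** 2 - altitude_m ** 2)

def sample_position(fish_x, fish_y, heading_deg, across_m, starboard=True):
    """Place a sample perpendicular to the track.

    Heading is measured clockwise from north; x is east, y is north.
    Pitch and roll are ignored in this simplified sketch.
    """
    side = 1.0 if starboard else -1.0
    bearing = math.radians(heading_deg + side * 90.0)
    return (fish_x + across_m * math.sin(bearing),
            fish_y + across_m * math.cos(bearing))
```

For instance, a sample at 5 m slant range with the tow-fish 3 m above the riverbed lies 4 m across-track.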

Feature Extraction and Textural Analysis
In SSS images, different types of seabed are represented by textured regions (Lianantonakis & Petillot, 2007; Shang & Brown, 1992). For this study, we used first-order and second-order textures. First-order texture analysis uses a statistical analysis of the grey value, or digital number (DN), in its calculations. The second-order texture analysis method is described by the grey level co-occurrence matrix (GLCM) (Haralick et al., 1973). In this study, the textural analysis was performed by deriving nine textural features using the moving window method. The textural features consisted of five first-order textures (Standard Deviation, Mean, Skewness, Kurtosis and Range) and four second-order GLCM textures (Homogeneity, Contrast, Correlation and Energy).
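As an illustration of the five first-order textures, the sketch below computes them for a single window of grey values and slides that window over an image. It is a minimal pure-Python example under stated assumptions: population (not sample) moments are used, kurtosis is not excess-corrected, flat windows are given zero skewness and kurtosis, and border pixels are skipped. The function names are hypothetical and the study's own implementation may differ.

```python
import math

def first_order_features(window):
    """window: 2D list of grey values (DN). Returns the five first-order features."""
    values = [v for row in window for v in row]
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        skew = kurt = 0.0  # assumption: define both as 0 for a flat window
    else:
        skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    return {"mean": mean, "std": std, "skewness": skew,
            "kurtosis": kurt, "range": max(values) - min(values)}

def texture_image(image, size, feature):
    """Slide a size x size window over the image (interior pixels only)."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(half, rows - half):
        out_row = []
        for c in range(half, cols - half):
            win = [image[i][c - half:c + half + 1]
                   for i in range(r - half, r + half + 1)]
            out_row.append(first_order_features(win)[feature])
        out.append(out_row)
    return out
```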
Figure 2. SSS mosaic and some riverbed substrates used in the textural and histogram analysis.
It was found that setting the direction (angle) for the GLCM calculations to 0 gave sufficient results. This means that, for each pixel, the calculation only considers the horizontally adjacent pixel directly to its right.
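The angle-0 GLCM described above can be sketched as follows: each pixel is paired with its right-hand neighbour, the pair counts are normalised to probabilities, and the four second-order features are derived from them. The formulas are the commonly used definitions of Contrast, Homogeneity, Energy and Correlation; the exact variants, the number of grey levels and the non-symmetric matrix are assumptions, as the text does not specify them.

```python
import math

def glcm_features(window, levels):
    """window: 2D list of ints in [0, levels), at least two columns wide.

    Uses offset (0, 1): each pixel is paired with the pixel to its right.
    """
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for row in window:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            total += 1
    # Normalise counts to joint probabilities p(i, j).
    cells = [(i, j, counts[i][j] / total)
             for i in range(levels) for j in range(levels)]
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, j, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for i, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "energy": sum(v * v for i, j, v in cells),
        # Correlation is undefined when either marginal has zero variance.
        "correlation": cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else float("nan"),
    }
```

In practice the grey values would first be quantised to a small number of levels so that the matrix stays compact within each moving window.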
The window dimension chosen to calculate these textural features plays a crucial role in producing the resulting image (Warner, 2011). The selection of window dimension should consider the scale of the textures or objects to be classified (Zhang, 2001). A small window size will maintain the edges of features but produce a less stable texture and considerable clutter, whereas a large window size will provide sufficient texture features but obscure the edges (Warner, 2011; Hamilton, 2015).
Although a small window size (3 × 3, 5 × 5 or 7 × 7) is generally used in remote sensing (Zhang, 2001; Jensen, 2014), some studies in underwater applications have used larger window sizes (e.g. 19 × 19 pixels) (Hamilton, 2015). For our method, several different window sizes (3, 5, 9 and 15) were used to examine the most appropriate size for the textural analysis.

Classification
This study uses three classes: 'flat', 'ripple' and 'tree', as identified by an expert. In this study, 'flat' is defined as an area of flat riverbed with no epi-benthic structure (e.g. trees) present; 'ripple' is an area of topographic change in the riverbed (e.g. sand waves, scour, a rise up to a riverbank) but with no epi-benthic structure; and 'tree' is the presence of epi-benthic structure (e.g. tree branches). The training data images were obtained by masking and clipping the mosaic image.

3.2.1 Support Vector Machine (SVM)
SVM is a binary classifier that aims to separate data into classes by fitting an optimal hyperplane in the feature space that separates all training data points of the two classes (Foody & Mathur, 2006; Liu et al., 2015). Data points that are located close to the hyperplane are used to define the hyperplane (Foody & Mathur, 2006). While a number of hyperplanes will be able to separate the classes, the optimal hyperplane is the one with the maximum margin (Foody & Mathur, 2006). When the training data are not linearly separable, the feature space is transformed into a higher-dimensional feature space until the two classes become linearly separable. This transformation is not required explicitly, as the so-called 'kernel trick' is applied. The Gaussian kernel can be chosen as an alternative to the linear kernel and is suitable for a non-linear decision surface (Foody & Mathur, 2006). With the Gaussian kernel, additional parameters, namely the magnitude of C and the parameter γ (indicating the smoothness of this kernel), have to be tuned. In the training phase, cross-validation is used to reduce the subjectivity in setting SVM parameters such as the magnitude of C and γ (gamma) and to stop an SVM from overfitting (Foody & Mathur, 2006; Hsu et al., 2008). Tuning of hyperparameters in SVM is a crucial step that can have a significant effect on the classification (prediction) accuracy (Bergstra et al., 2015; Duarte & Wainer, 2017). Hyperparameter optimisation is used to find the best configuration variables for a training algorithm so that the algorithm can obtain an optimal result (Bergstra et al., 2015).
Afterwards, when applying the model to a new set of data that has to be classified (e.g. an image), the SVM only checks on which side of the hyperplane the new data sample is located, based on the predictors (features) that are used.
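The prediction step described above can be sketched for a trained binary SVM. The example shows the linear and Gaussian kernels and the sign test that decides on which side of the hyperplane a sample lies. The support vectors, coefficients and bias in the usage note are made-up illustrative values, not a model trained on the study data, and the extension from two to three classes (e.g. by combining several binary SVMs) is not shown.

```python
import math

def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def gaussian_kernel(x, y, gamma):
    """exp(-gamma * ||x - y||^2); gamma controls the smoothness of the kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def svm_decision(x, support_vectors, coeffs, bias, kernel):
    """Signed score of sample x; coeffs[i] is alpha_i * y_i for support vector i."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coeffs)) + bias

def svm_classify(x, support_vectors, coeffs, bias, kernel):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if svm_decision(x, support_vectors, coeffs, bias, kernel) >= 0 else -1
```

With the hypothetical support vectors (1, 0) and (-1, 0), coefficients 1 and -1 and zero bias, the sample (2, 0) falls on the positive side for the linear kernel. A Gaussian kernel with a fixed γ can be passed in as `lambda a, b: gaussian_kernel(a, b, 0.5)`.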

3.2.2 Decision Tree (DT)
A Decision Tree predicts or classifies the classes based on the predictor or training data. Initially, an adequate number of training data needs to be provided and constructed in a vector in which each entry represents an attribute. Those attributes represent a class (Loh & Shih, 1997; Tehrany et al., 2013; Jensen, 2014). Afterwards, the data are split into subsets based on the attributes, parameters or criteria that were used. The resulting tree consists of leaves (representing classes), nodes (representing the attribute of the classified data) and arcs (representing alternative attribute values). A Decision Tree examines the data from the top node, or root node, and splits data based on two attributes (Loh & Shih, 1997). The growing and defining of the trees can be controlled using several parameters, such as the maximal number of decision splits including branch nodes, the minimal number of leaf nodes in the classifier, the minimal number of branch nodes in the classifier, the approach used to split the data, and the number of predictors to select at random for each split. A Decision Tree also applies hyperparameters to define the internal nodes of the tree, define the criteria for splitting the branches, and optimise the tree during training (Bergstra et al., 2015). The DT algorithm usually also offers manual or automatic (auto or all) hyperparameter optimisation. After all of the criteria have been achieved in splitting the data samples and the full tree has been reached, this information is saved in the DT model. The cross-validation method was used to evaluate the model's performance in making predictions and to avoid overfitting of the model. The classification error score was calculated to examine the quality of the training.
After training, when the model is applied to a new set of data, the DT checks the data against each criterion and assigns the data to a class based on those criteria.
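To make the splitting step concrete, the sketch below searches for the single feature threshold that gives the purest pair of child nodes. The Gini impurity is assumed here as the split criterion purely for illustration; the text does not state which criterion was used, and the function names and sample values are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(samples, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    samples: list of equal-length feature tuples; candidate thresholds are the
    midpoints between consecutive distinct values of each feature.
    Returns None if no feature has two distinct values.
    """
    best = None
    n = len(samples)
    for f in range(len(samples[0])):
        values = sorted(set(s[f] for s in samples))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [l for s, l in zip(samples, labels) if s[f] <= threshold]
            right = [l for s, l in zip(samples, labels) if s[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best
```

A full tree is grown by applying the same search recursively to each child node until a stopping parameter, such as the maximal number of splits, is reached.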

3.2.3 Post-classification Filtering
In pixel-based classification, it is common for the classified image to contain a large amount of noise and speckle, commonly called "salt and pepper noise" (Haack, 2007). In order to remove such noise, a median filter (low-pass filter) was employed, as this filter has been demonstrated to be effective in eliminating speckle and smoothing the image. Although it can produce more blur, this filter will generally maintain the edge pixels of the image and preserve important textural information. Based on the results of previous studies (Herold & Haack, 2006; Haack, 2007; Wesselink et al., 2017), a 5 × 5 window size for the median filter was chosen for the final mosaic classification.
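A 5 × 5 median filter on a classified image can be sketched as below. The sketch assumes the classes are stored as integer codes and that the window is clipped at the image borders; both are illustrative choices that the text does not specify. `median_low` is used so that the output is always one of the class codes already present in the window.

```python
from statistics import median_low

def median_filter(image, size=5):
    """Apply a size x size median filter to a 2D list of integer class codes."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbourhood, clipped at the image borders.
            window = [image[i][j]
                      for i in range(max(0, r - half), min(rows, r + half + 1))
                      for j in range(max(0, c - half), min(cols, c + half + 1))]
            out[r][c] = median_low(window)
    return out
```

An isolated pixel of one class surrounded by another class is replaced by the surrounding class, which is the speckle-removal behaviour described above.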

3.2.4 Accuracy Assessment
In order to perform the accuracy assessment using error matrices, reference or ground-truth points are required so that they can be compared with the classified points (pixels) (Lillesand et al., 2015; Miandad, 2018). Expert knowledge was used to extract the required reference data, as in-field data could not be collected. A point file was created in which a number of well-distributed reference points (pixels) together with their class labels were defined (27 samples for the 'flat', 36 samples for the 'ripple' and 34 samples for the 'tree' class). Afterwards, the pixel values of the classified image at the location of each reference point were extracted and compared with the reference point at this location. Based on this information, the accuracy assessment metrics, including user's accuracy, producer's accuracy, overall accuracy and kappa coefficient, were calculated.
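The four metrics named above follow directly from the error matrix. The sketch below assumes rows hold the classified (map) classes and columns hold the reference classes, which is one common convention; the function name is hypothetical and the numbers in the usage note are made up for illustration, not taken from Table 2.

```python
def accuracy_metrics(matrix):
    """matrix[i][j]: number of reference-class-j points classified as class i."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    overall = correct / n
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    users = [matrix[i][i] / row_totals[i] if row_totals[i] else float("nan")
             for i in range(k)]
    producers = [matrix[j][j] / col_totals[j] if col_totals[j] else float("nan")
                 for j in range(k)]
    return {"overall": overall, "kappa": kappa,
            "users": users, "producers": producers}
```

For the made-up two-class matrix [[20, 5], [10, 15]], the overall accuracy is 0.7, the chance agreement is 0.5 and kappa is therefore 0.4.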

Textural Features
Different moving window sizes were examined (7 × 7, 11 × 11, 19 × 19 and 31 × 31 pixels) to create the texture images, and the results show that a window of 19 × 19 pixels produces the best result for the texture analysis. After the texture images had been created, a histogram analysis was performed to investigate the most appropriate texture variables for distinguishing between the objects in the image (i.e. those with an apparent gap between each object's histogram). The choice of textural variables used in the classification was based on how well the variables could distinguish between the objects' classes.
Results indicate that the Standard Deviation, Mean, Skewness, Range, Homogeneity, Contrast and Correlation variables are able to clearly separate between 'tree' and 'ripple' in their histograms. The 'ripple' feature tends to have a similar histogram to the 'tree' feature. Thus, finding variables that are suitable for distinguishing between 'tree' and 'ripple' features is more challenging. In addition to the texture selection, predictor importance estimation (Bolon-Canedo and Alonso-Betanzos, 2018) was also conducted to obtain the most suitable texture variables for the classification. The predictor importance estimates how much each variable contributes to the classifier. The result of the predictor importance estimation shows that only three of those texture features have a significant impact upon the classification. The Correlation, Standard Deviation and Kurtosis textures have the highest predictor importance values for classifying the three classes, with 0.0727, 0.0424 and 0.0166, respectively.

Support Vector Machine Classification
The graphs in Figure 4 show training data plots of the test data samples (coloured dots) and the support vectors of each class (coloured circles).
The second trial strategy examined the use of a Gaussian kernel for the SVM classification, using the textural features of combination 6 (Figure 6). Visually, the linear kernel performs better than the Gaussian kernel. The optimised hyperparameter function does not seem to have a significant effect upon the classification using the linear kernel (Figure 6(c)). Similarly, hyperparameter optimisation does not provide an optimal result for the classification using the Gaussian kernel and produces some clutter in the image.

Decision Tree Classification
Next, a Decision Tree (DT) classification was performed using the features of combination 6 (Figure 7). The auto-optimised and all-optimised hyperparameter options (Figures 7(c) and 7(d)) produce similar results and have no significant effect on the classification, possibly due to the small number of splits (branches) of the tree used in the classification. In general, the results show that the DT classifier performs better when classifying the flat area. However, this model produces misclassifications when separating the 'ripple' and 'tree' classes.

Comparison of classification results
The final results of classification after post-classification filtering can be seen in Figure 8, which shows the results of the SVM using a linear kernel and of the Decision Tree. It can be seen that the SVM classifiers produce a better visual result than the Decision Tree. In the DT result, a large number of 'ripple' areas were classified as 'tree'. There are some speckles of the 'tree' class in the 'flat' area in the SVM classification results, which can lead to misclassification of the 'flat' area.
In addition to the visual inspection, a quantitative assessment was performed using error matrices. The three classifiers (SVM with a linear kernel, SVM with a Gaussian kernel and Decision Tree), using the three selected texture features of combination 6 (Correlation, Standard Deviation and Kurtosis), were compared (Table 2). In general, all classifiers performed quite well in predicting all three classes. It can be seen from Table 2 that the SVM using a linear kernel has the highest overall accuracy (77%), followed by Decision Tree and SVM using a Gaussian kernel (73% and 60%, respectively). The kappa coefficient shows a similar trend. The highest kappa coefficient was achieved using an SVM classification with a linear kernel (0.62), followed by the Decision Tree and the SVM classifier using a Gaussian kernel (0.59 and 0.39, respectively). The kappa coefficients for all of the classifiers show a moderate agreement rate between the classified classes and the reference classes. However, the SVM using a linear kernel has the best performance among all classifiers.
Two SVM classifiers, one using a linear kernel and one using a Gaussian kernel, along with a DT classifier were applied using selected textural features. Accuracy assessment of all classifiers showed that SVM using a linear kernel achieved the best overall accuracy and kappa coefficient (77% and 0.62), followed by Decision Tree (73% and 0.59) and SVM using a Gaussian kernel (60% and 0.39). Although all classifiers performed well in classifying the 'flat' area, all classifiers performed at a lower accuracy rate in classifying the 'tree' class than the 'ripple' class. In addition, SVM using a linear kernel showed a relatively stable performance when classifying all classes. Conversely, the performance of Decision Tree showed fluctuations in both producer's accuracy and user's accuracy, notably an increasing percentage in detecting 'ripple' for user's accuracy and a decreasing percentage in classifying the 'tree' class. Additionally, SVM using a Gaussian kernel seems not to be suitable for use in this classification and produces the lowest accuracy among all classifiers, particularly in predicting the 'ripple' and 'tree' classes. This study also revealed that SVM has a superior ability to work with small numbers of training data. However, a more credible result could be achieved by applying the same methods at a different study site and by obtaining the training samples and ground reference samples directly at that study site.
Given the challenging environment, future research will focus on using advanced textural features. For instance, we want to investigate whether the first layers of a deep architecture (such as a convolutional neural network) will generate features that improve the classification results when applying an SVM classifier.
Figures 4(a) and 4(b) show plots using the linear kernel combined with the Correlation/Standard Deviation (a) and Correlation/Kurtosis (b) texture features; the same features are used in (c) and (d) but with a Gaussian kernel. In all graphs, only the 'flat' class (red points) is clearly distinct from the other classes.

Figure 4. Feature plots using a linear kernel with Correlation/Standard Deviation (a) and Correlation/Kurtosis (b) features, and (c)/(d) using the same features with a Gaussian kernel.
Table 1: Overview of the six texture feature combinations (C1-C6) used during the classification.
The results of the textural variable combinations in the SVM classification (Figure 5) were examined visually. It can be seen that not all of the texture variables produce an optimal result in the SVM classification. The first experiment (Figure 5(b)) shows that the use of all variables could not give a visually satisfactory classified image, with a large misclassification of the 'flat', 'ripple' and 'tree' classes. Similar results are shown in the second and third experiments (Figures 5(c) and 5(d)), which produced misclassification results, particularly for the 'flat' class. In contrast, the fifth and sixth experiments (Figures 5(e) and 5(f)) show the best results of texture variable selection for the SVM classification. Although the texture variables used in the fifth experiment (combination 5) produced a good visually classified image, the classified image using only three variables (Correlation, Standard Deviation and Kurtosis; combination 6) shows a better and smoother result. For that reason, these three variables were chosen for the SVM classification and the Decision Tree classification.