ASSESSMENT OF MULTIBEAM BACKSCATTER TEXTURE ANALYSIS FOR SEAFLOOR SEDIMENT CLASSIFICATION

The analysis of backscatter data from multibeam echosounder systems (MBES) for seafloor classification has been widely debated in recent years. Two methods are in common use: (1) signal-based classification using Angular Range Analysis (ARA), and (2) image-based texture classification using Grey Level Co-occurrence Matrices (GLCMs). Although the ARA method can predict sediment types, its low spatial resolution limits its use with high spatial resolution datasets. Texture layers from GLCM, on the other hand, do not predict sediment types, but their high spatial resolution is useful for image analysis. The objectives of this study are (1) to investigate the correlation between MBES-derived backscatter mosaic textures and seafloor sediment types derived from the ARA method, and (2) to identify which GLCM texture layers are most similar to the sediment classification map derived from the signal-based classification method. The study area is located at Tawau and covers 4.7 km², situated off the channel in the Celebes Sea between Nunukan Island and Sebatik Island, East Malaysia. First, GLCM layers were derived from the backscatter mosaic, and a sediment map (i.e. sediment types as classes) was constructed using the ARA method. Secondly, Principal Component Analysis (PCA) was used to determine which GLCM layers contribute most to the variance (i.e. the important layers). Finally, the K-Means clustering algorithm was applied to the important GLCM layers and the results were compared with the classes from ARA. PCA identified the GLCM layers Correlation, Entropy, Contrast and Mean as the main contributors to four principal components that together explain 98.77% of the total variance. Among these layers, GLCM Mean showed good agreement with the sediment classes from the ARA sediment map. This study has demonstrated that different texture layers characterise sediment differently and that proper analysis is needed before these layers are used in any classification.


INTRODUCTION
Analysing and determining the physical properties of the seafloor is crucial for important marine activities, including coral reef management, fisheries habitat management and marine geology studies (Hedley et al., 2016; Buhl-Mortensen et al., 2015; Robidoux et al., 2008; Hughes Clarke et al., 1996). Over the last decades, rapid developments in marine acoustic survey methods have revolutionised the production of detailed seafloor maps for seabed habitat mapping (Brown et al., 2011b). High-resolution acoustic techniques, in particular the multibeam echosounder system (MBES), provide full-coverage topography (i.e. bathymetry) and acoustic backscatter (i.e. intensity returns), both of which are vital for predicting sediment and habitat types (De Falco et al., 2010; Medialdea et al., 2008; Sutherland et al., 2007). Backscatter returns from MBES carry important acoustic scattering information about sediment types and offer considerable potential for remote identification of the seafloor and as a proxy for habitat classes.
For sediment classification using MBES backscatter, image analysis based on image texture is probably the most widely used technique (Herkül et al., 2017; Lucieer et al., 2016; Blondel et al., 2015; Zhi et al., 2014; Che Hasan, 2014; Hill et al., 2014; Siwabessy et al., 2013; Lucieer et al., 2013; Fakiris et al., 2012; Micallef et al., 2012; Huang et al., 2012; Lucieer et al., 2011; Díaz, 2000). The technique, known as the Grey Level Co-occurrence Matrix (GLCM) method, originated from the textural analysis of radar images using Haralick textures (Haralick et al., 1973). As many texture layers can be derived from one image (in this case the backscatter image), it is important to assess in detail which texture layers represent sediment classes. This matters because many habitat mapping processes, such as classification, require high spatial resolution data that can be combined with high spatial resolution bathymetry maps.
Because backscatter can also be represented as a function of incidence angle, some studies have used angular backscatter intensity (also known as signal-based backscatter) to extract scattering information (Monteys et al., 2016; Huang et al., 2013; Lamarche et al., 2011; Fonseca et al., 2009; Parnum, 2007). Compared with the backscatter image or mosaic, signal-based backscatter from MBES has a lower spatial resolution and may therefore be difficult to integrate with other high spatial resolution maps such as bathymetry. However, one classification method for signal-based backscatter, known as Angular Range Analysis (ARA), was developed to predict seafloor sediment types automatically using an acoustic inversion process (Fonseca and Mayer, 2007). The objectives of this study are (1) to investigate the correlation between MBES-derived backscatter mosaic texture and seafloor sediment type, and (2) to identify which GLCM texture layers (i.e. from the image-based method) produce a sediment classification map most similar to that of the signal-based classification method. Figure 2 shows the overall methodology flow chart for this study.

Study area
The study site is located in Tawau, Sabah, Malaysia, and covers an area of 4.7 km². It is situated off the channel in the Celebes Sea between Nunukan Island and Sebatik Island, East Malaysia (Figure 1). The site is adjacent to the international maritime border between Malaysia and Indonesia, about 1.5 km northwest of Nunukan Island and 2.0 km southwest of Sebatik Island.

Acoustic data acquisition
Acoustic data from MBES were acquired from the 26th of November 2017 until the 1st of December 2016 using a hull-mounted Kongsberg EM2040c multibeam bathymetric system. With the swath ensonifying the seafloor over four to five times the water depth on each survey line, the line spacing was set to three times the water depth to provide ensonification overlap between adjacent survey lines. Vessel positioning during the survey was achieved using a C-Nav3050 DGPS system (horizontal accuracy: ± 0.45 m + 3 ppm; vertical accuracy: ± 0.90 m + 3 ppm) (Dubilier, 2016), integrated with a Kongsberg Seatex Motion Reference Unit MRU-5 (roll and pitch accuracy: 0.02° RMS at a ±5° amplitude) (Kongsberg, 2016) for roll, pitch and heave corrections. Multibeam data logging, real-time navigation, display and quality control were carried out using Seafloor Information System (SIS) software version 4.2.1 provided by Kongsberg. Sound velocity profiles (SVPs) through the water column were collected daily at the beginning and at the end of the survey using a Valeport Midas SVX2 to obtain the speed of sound propagation in the water column at the survey area.
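The line-spacing rule can be checked with simple arithmetic. The following is a minimal sketch, assuming a flat seafloor and parallel survey lines; the function name is illustrative and does not come from the survey software.

```python
def swath_overlap_fraction(swath_factor, spacing_factor):
    """Fraction of each swath that is also covered by the adjacent line.

    Both factors are multiples of water depth, so depth cancels out.
    Assumes a flat seafloor and parallel survey lines (illustrative only).
    """
    if swath_factor <= 0:
        raise ValueError("swath_factor must be positive")
    overlap = swath_factor - spacing_factor  # overlap width, in water depths
    return max(overlap, 0.0) / swath_factor


# Swath of four to five times water depth, line spacing of three times depth.
for factor in (4.0, 5.0):
    print(factor, round(swath_overlap_fraction(factor, 3.0), 2))
```

Under these assumptions, swaths of four and five times the water depth overlap the adjacent line by 25% and 40% of the swath width respectively.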

Acoustic data processing
Backscatter data can be divided into two formats: (1) signal-based data, or backscatter intensity as a function of incidence angle, and (2) image-based data (i.e. the backscatter mosaic). As a result, different classification methods have been established for each dataset (Brown et al., 2011a). The raw MBES backscatter data were processed in Fledermaus Geocoder Toolbox (FMGT) software version 7.4 to obtain (a) a backscatter mosaic, and (b) a prediction of sediment types using the Angular Range Analysis (ARA) technique (Fonseca and Mayer, 2007). An automated FMGT processing procedure was applied to both types of backscatter data (Quas et al., 2017). Corrections for signal level, transmission loss, beam incidence angle, beam footprint area and Lambertian scattering were applied to each raw backscatter beam time series (QPS, 2016). Next, the backscatter intensity data were filtered based on beam angle, and an anti-aliasing pass was then run on the resulting backscatter swath data (QPS, 2016). For signal-based seafloor classification, the ARA technique was applied to the angular backscatter intensity to predict sediment types. This process produced an estimated bottom sediment map by comparing the angular response/impedance estimates from calibrated backscatter values with empirical sediment models (Fonseca and Mayer, 2007). The resulting seafloor characterisation map, or ARA map, was then exported to raster format for subsequent processing. Note that the spatial resolution of the ARA map is generally low, with a cell size of half the MBES swath width. The default ARA map yielded 30 sediment classes, which were regrouped into four (4) major sediment classes (sand, silt, clay and gravel; Figure 3) because these are the dominant sediment types in the ARA map. A set of random points was then generated from the ARA map to be used as ground truth points. Along with the ARA map, a backscatter mosaic image was produced at 1 m pixel resolution for further analysis. In this study, ground truth sampling was not available, and classes from the ARA map were therefore used as known classes for comparison with the classification map produced from the texture layers of the backscatter mosaic.
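The generation of random reference points from the ARA raster can be sketched as follows. This is an illustration only: the integer class coding, the nodata value, the seed and the toy grid are assumptions, and the study itself generated its points in GIS software.

```python
import random


def sample_reference_points(class_grid, n_points, nodata=0, seed=0):
    """Draw n_points random cells with a valid class from a 2-D class grid.

    class_grid: list of rows holding integer class codes (hypothetical
    coding: 1=sand, 2=silt, 3=clay, 4=gravel; nodata marks empty cells).
    Returns a list of (row, col, class_code) tuples without repeats.
    """
    valid = [
        (r, c, code)
        for r, row in enumerate(class_grid)
        for c, code in enumerate(row)
        if code != nodata
    ]
    if n_points > len(valid):
        raise ValueError("more points requested than valid cells")
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    return rng.sample(valid, n_points)


# Toy 4x5 raster standing in for the regrouped ARA map (not study data).
ara_grid = [
    [1, 1, 1, 2, 0],
    [1, 1, 2, 2, 0],
    [3, 3, 2, 4, 4],
    [0, 3, 3, 4, 4],
]
points = sample_reference_points(ara_grid, n_points=6)
```

Each returned tuple carries the class read from the raster at that cell, which is the value later compared against the cluster label at the same location.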

Derived GLCM and Image Statistics
Image texture is an important characteristic for image classification and is widely used in terrestrial remote sensing image processing and analysis. Eight (8) Haralick texture layers (Haralick et al., 1973) were derived from the backscatter mosaic using ENVI 4.8 software: mean, variance, contrast, correlation, homogeneity, dissimilarity, entropy and angular second moment, following previous studies (Herkül et al., 2017; Diesing et al., 2016; Blondel et al., 2015; Lucieer et al., 2013; Huang et al., 2012; Lucieer et al., 2011). All texture layers were derived using the Grey Level Co-occurrence Matrix (GLCM) method. GLCM calculates statistics that capture distinctive textural properties of an acoustic image from the relationships between a given pixel and a specific neighbor (Díaz, 2000). For this study, Haralick texture layers were derived from GLCMs calculated for a local rectangular window of 3 × 3 pixels.
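To make the GLCM calculation concrete, the sketch below builds the co-occurrence matrix for a single 3 × 3 window and derives four of the texture statistics from it. It is not the ENVI implementation: the one-pixel horizontal offset, the symmetric counting, the four grey levels and the handling of a constant window are all assumptions chosen for illustration.

```python
import math


def glcm(window, levels, offset=(0, 1)):
    """Normalised, symmetric grey-level co-occurrence matrix of one window.

    window: list of rows of integer grey levels in range(levels).
    offset: (row shift, column shift) to the neighbour; one pixel to the
    right here, which is an assumption rather than the study's setting.
    """
    dr, dc = offset
    counts = [[0.0] * levels for _ in range(levels)]
    n_rows, n_cols = len(window), len(window[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                i, j = window[r][c], window[r2][c2]
                counts[i][j] += 1.0  # count the pair in both directions
                counts[j][i] += 1.0
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Mean, contrast, correlation and entropy of a normalised GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mean = sum(i * v for i, _, v in cells)
    variance = sum((i - mean) ** 2 * v for i, _, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    if variance > 0:
        covariance = sum((i - mean) * (j - mean) * v for i, j, v in cells)
        correlation = covariance / variance
    else:
        correlation = 1.0  # constant window: a convention chosen here
    return {"mean": mean, "contrast": contrast,
            "correlation": correlation, "entropy": entropy}


# Two toy 3x3 windows with four grey levels (not data from the study).
flat = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
rough = [[0, 3, 0], [3, 0, 3], [0, 3, 0]]
flat_features = texture_features(glcm(flat, levels=4))
rough_features = texture_features(glcm(rough, levels=4))
```

The uniform window has zero contrast and entropy, while the checkerboard window has high contrast and a correlation of -1. Sliding such a window over every pixel of the mosaic would produce one raster per statistic.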

Principal Component Analysis (PCA)
Principal Component Analysis (PCA) has been widely used in previous studies for data reduction and to avoid multicollinearity among abiotic variables prior to clustering (Ismail, 2016; Che Hasan, 2014; Verfaillie et al., 2009; Robidoux et al., 2008; Díaz, 2000). PCA has also been used to recognise which textural layers contribute most to the clustering map. PCA computes a set of new, linearly independent variables known as principal components (PCs) that account for most of the variance of the original variables.
The PCs are linear combinations of the original variables. PCA was used to determine (a) which texture layers contribute most to the total variance of each rotated PC, and (b) the correlations between the different texture layers and each PC. These results give a broad indication of which layers are important.
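The PCA step can be sketched in a few lines. The version below is a simplified illustration rather than the procedure used in the study: it assumes the layers are standardised (so the correlation matrix is decomposed), uses power iteration with deflation in place of a library eigen-solver, and omits the component rotation. The function names and the toy values are hypothetical.

```python
import math


def correlation_matrix(rows):
    """Correlation matrix of the columns of rows (one row per pixel)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(m)]
    stds = [math.sqrt(sum((r[k] - means[k]) ** 2 for r in rows) / (n - 1))
            for k in range(m)]
    # Standardise each layer; a constant layer (std of 0) is not handled.
    z = [[(r[k] - means[k]) / stds[k] for k in range(m)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(m)]
            for a in range(m)]


def leading_components(matrix, n_components, iterations=500):
    """Eigenpairs of a symmetric matrix by power iteration with deflation.

    Returns a list of (eigenvalue, explained_fraction, loadings), largest
    eigenvalue first. Loadings are eigenvector * sqrt(eigenvalue).
    """
    m = len(matrix)
    trace = sum(matrix[k][k] for k in range(m))
    work = [row[:] for row in matrix]
    results = []
    for _ in range(n_components):
        start = [float(k + 1) for k in range(m)]  # arbitrary non-zero start
        length = math.sqrt(sum(x * x for x in start))
        vec = [x / length for x in start]
        for _step in range(iterations):
            nxt = [sum(work[a][b] * vec[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient of the unit vector gives the eigenvalue.
        eigval = sum(vec[a] * sum(work[a][b] * vec[b] for b in range(m))
                     for a in range(m))
        loadings = [x * math.sqrt(max(eigval, 0.0)) for x in vec]
        results.append((eigval, eigval / trace, loadings))
        # Deflation: remove the component just found before the next pass.
        work = [[work[a][b] - eigval * vec[a] * vec[b] for b in range(m)]
                for a in range(m)]
    return results


# Toy values for three layers at six pixels (not data from the study);
# the first two columns are strongly correlated, the third is not.
layers = [
    [1.0, 2.1, 5.0],
    [2.0, 3.9, 3.0],
    [3.0, 6.2, 6.0],
    [4.0, 8.1, 2.0],
    [5.0, 9.8, 4.0],
    [6.0, 12.1, 5.5],
]
components = leading_components(correlation_matrix(layers), n_components=3)
```

In this toy case the two correlated layers load heavily on the first component, which is the pattern the component matrix is read for: a layer with a large absolute loading on a PC is one of the main contributors to that PC.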

Clustering and comparison
After the important texture layers had been identified, the K-Means clustering algorithm was applied to these layers. K-Means clustering is widely used to partition data in terrestrial remote sensing and in the marine environment. For this clustering process, the number of clusters was set equal to the number of sediment classes in the ARA map (i.e. four classes). A set of 148 points was generated from the ARA map by creating random points inside the study area.
Cross-tabulations of the ground truth and the clustered data for the four variables identified by PCA were carried out to compare the sediment composition within each cluster group.
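The clustering step can be illustrated with a small K-Means routine applied to the values of a single texture layer. This is a sketch, not the software used in the study: the quantile-based starting centres and the toy values are assumptions made so that the example is deterministic.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on scalar values; returns (labels, centres).

    Centres start at evenly spaced quantiles of the sorted values, an
    illustrative choice that makes the result deterministic.
    """
    ordered = sorted(values)
    n = len(ordered)
    centres = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: nearest centre for every value.
        new_labels = [min(range(k), key=lambda c: abs(v - centres[c]))
                      for v in values]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centres


# Toy GLCM Mean values at twelve pixels (not data from the study).
mean_values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.9, 10.2, 10.0, 15.1, 14.8, 15.0]
labels, centres = kmeans_1d(mean_values, k=4)
```

With `k=4` matching the four ARA sediment classes, each pixel receives a cluster label that can then be compared with the ARA class at the same location.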

Results
PCA produced four (4) PCs explaining 98.77% of the total variance. The rotated component matrix (Table 1 and Table 2) shows the correlations between the rotated PCs and the original variables. The GLCM variables that contributed most to the variance of the PCs are Correlation (PCA1 -0.49%), Entropy (PCA1 -0.49%), Contrast (PCA2 0.57%), and Mean (PCA3 0.87%). These four GLCM layers were used for the subsequent cross-tabulation analysis. For the GLCM Correlation and Entropy layers (Figures 4 and 5), the cluster maps showed only two dominant classes. For the Contrast layer, the clustering produced four classes but with poorly defined cluster boundaries (Figure 5). Only the clustering results from the GLCM Mean layer showed a cluster map with four classes and well-delineated class boundaries (Figures 6 and 7). For GLCM Entropy, only cluster 2 and cluster 4 were strongly associated with a specific sediment class: 100% of cluster 2 was related to gravel and 67% of cluster 4 was identified as sand. For the GLCM Contrast layer, three different clusters (clusters 2, 3 and 4) showed high agreement with a single sediment type (i.e. sand), at 67%, 74% and 63% respectively. The GLCM Correlation cluster map identified two clusters with two different sediment types: cluster 1 with silt (67%) and cluster 4 with sand (65%).

For GLCM Mean, however, each cluster was related to a unique sediment type, although each also contained small percentages of other sediment types: cluster 1 with gravel (42%), cluster 2 with silt (83%), cluster 3 with clay (43%) and cluster 4 with sand (74%).
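The per-cluster percentages reported here are row percentages of a cross-tabulation between cluster labels and ARA sediment classes at the reference points. A minimal sketch of that calculation is given below; the function name and the eight example points are made up for illustration and are not data from the study.

```python
from collections import Counter, defaultdict


def cluster_composition(clusters, sediments):
    """Percentage of each sediment class within every cluster.

    clusters and sediments are equal-length sequences holding, for each
    reference point, its K-Means cluster and its ARA sediment class.
    """
    if len(clusters) != len(sediments):
        raise ValueError("inputs must have the same length")
    table = defaultdict(Counter)
    for cluster, sediment in zip(clusters, sediments):
        table[cluster][sediment] += 1
    return {
        cluster: {s: 100.0 * n / sum(counts.values()) for s, n in counts.items()}
        for cluster, counts in sorted(table.items())
    }


# Made-up labels for eight reference points (not data from the study).
clusters = [1, 1, 1, 2, 2, 2, 2, 3]
sediments = ["gravel", "gravel", "sand", "silt", "silt", "silt", "sand", "clay"]
composition = cluster_composition(clusters, sediments)
# Sediment class with the largest share in each cluster.
dominant = {c: max(shares, key=shares.get) for c, shares in composition.items()}
```

A layer separates sediment types well when every cluster has a different dominant class, which is the criterion by which GLCM Mean stands out among the four layers.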

DISCUSSION
The aim of this study was to identify the correlation between MBES-derived backscatter mosaic texture and seafloor sediment type, and to assess the capability of the texture-based method to differentiate seafloor sediment classes. The research used sediment classes from ARA as a substitute for ground truth, and a set of random ground truth points was subsequently generated inside the study area. The results show that only the clustering from the GLCM Mean layer provides clear discrimination compared with the other three GLCM layers. Previous studies on texture-based sediment classification have shown that the 'Mean' index captures most of the textural variability within the data (Huvenne et al., 2007). The mean of backscatter has also been used in several sediment classification studies (Hill et al., 2014; Lucieer et al., 2013; Díaz, 2000).
PCA was able to identify some of the important layers in general. It has been broadly used to recognise which textural layers contribute most to the clustering map (Ismail et al., 2015; Che Hasan, 2014; Verfaillie et al., 2009; Robidoux et al., 2008; Díaz, 2000). GLCM Correlation, Entropy, Contrast and Mean are the main texture layers resulting from PCA, with percentage eigenvalues of more than 1%. However, in this study, the PCA identification of the most significant layer disagrees with the clustering map analysis. For example, PCA identified the GLCM layers Correlation and Entropy as the most influential layers (PCA1), but the clustering map analysis identified a different layer (i.e. GLCM Mean). This is attributed to the small proportion of clay and silt within the study area. According to Che Hasan (2014) and Müller and Eagles (2007), different sediment proportions within a sediment class may also cause backscatter intensity and texture analysis to diverge, and unsupervised classification methods do not allow such factors to be controlled.
The study identified some relationships between the MBES backscatter mosaic and the resulting cluster maps through the backscatter derivative GLCM Mean. Although GLCM Mean ranks fourth among the most contributing GLCM layers, previous research (Che Hasan, 2014) suggested that GLCM Mean is the most significant layer for sediment clustering maps.

CONCLUSION
A total of 4.7 km² of multibeam sonar backscatter data from the Tawau coastal area, Malaysia, was classified using GLCM and K-Means algorithms to find correlations between signal-based and image-based backscatter. Notably, our approach relied only on random ground truth points created in GIS software, owing to limitations during the survey. Had ground truth sampling been carried out and targeted on the K-Means clusters, the agreement observed might have been more convincing. Nevertheless, on the basis of comparisons with the randomly created ground truth data, the cross-tabulation analysis has shown encouraging results. In summary, only the GLCM Mean texture layer shows significant similarity with the signal-based sediment classification map and demonstrates the ability to delineate the major sediment types. Overall, it can be concluded that image-based backscatter classification can assist the interpretation of multibeam backscatter data for the production of sediment maps.

Figure 3: Sediment classes produced using Angular Range Analysis (ARA) and used for ground truthing
Cross tabulation between the GLCM Entropy cluster map and ground truth observations

Figure 8: Per cluster sediment composition percentage for GLCM Correlation texture layer

Table 2: Component matrix showing the correlation between rotated PCs and the original variables. Highest factor loads in each PC are highlighted in bold

Table 5: Cross tabulation between the GLCM Correlation cluster map and ground truth observations

Table 6: Cross tabulation between the GLCM Mean cluster map and ground truth observations