HOW CHALLENGING IS THE DISCRIMINATION OF FLOATING MATERIALS ON THE SEA SURFACE USING HIGH RESOLUTION MULTISPECTRAL SATELLITE DATA?

Developing a remote sensing-based monitoring system for detecting marine plastics requires a thorough investigation of how well they can be discriminated from other floating objects. To this end, this study explores the ability to discriminate marine debris from other floating materials and sea features using high-resolution multispectral satellite data. To perform our analysis, we utilized the open-access Marine Debris Archive (MARIDA), which contains several marine classes on Sentinel-2 (S2) data and benchmarks machine learning frameworks. In particular, we investigated well-established spectral indices, Gray-Level Co-occurrence Matrix (GLCM) and Local Binary Pattern (LBP) texture features, and other spatial features at multiple scales. A Random Forest (RF) classifier was applied for the classification procedure, and the spectral and spatial features that contributed the most were identified. The quantitative and qualitative assessment indicated that spectral information alone is insufficient to distinguish marine plastic from other floating materials that exhibit similar spectral behavior, such as vessels. However, when combined with spatial information, it shows strong potential even for challenging discrimination tasks. By further evaluating our results qualitatively, significant insights are gained, and specific feature combinations are proposed for discriminating challenging floating materials.


INTRODUCTION
Marine debris pollution is one of the fastest-growing issues on a global scale, endangering human health, marine life and maritime safety. Plastics infiltrate the marine environment through diverse land-based and sea-based activities. Additionally, the fate of plastics in the marine environment is not predetermined but is influenced by a variety of factors such as the characteristics of the plastics themselves (size, shape, density), climatic conditions (precipitation, air intensity, temperature, solar radiation), sea currents and waves, and biological interference (Galgani et al., 2015; Van Sebille et al., 2020).
Various studies investigating marine debris behavior have been conducted in recent years. These studies involve the development of models for estimating the volume and the source of plastics in the marine environment (Lebreton et al., 2017) and the tracking and sampling of debris accumulation to gain better knowledge of their behavior (Ruiz-Orejón et al., 2016). In addition to terrestrial methods, airborne and satellite remote sensing methods have also recently been employed to detect marine plastic concentrations, focusing on coastal areas. However, detecting marine debris is challenging due to the inherent properties of plastics, the complexity of the marine processes and the variability of the sea conditions (Maximenko et al., 2019). Monitoring marine debris and discriminating it from other floating materials is considered a complex and demanding problem requiring advanced techniques and improved satellite sensors (Martínez-Vicente et al., 2019).
To address the challenging task of detecting floating materials (e.g., macroalgae species, plastic debris), most remote sensing studies conducted on multispectral data have focused on the analysis of spectral patterns (Hu et al., 2015; Acuña-Ruz et al., 2018; Qi and Hu, 2021). For instance, to discriminate marine debris and Sargassum macroalgae, Kikaki et al. (2020) utilized observations from satellite imagery (Planet, S2, Landsat-8) verified by in situ data, indicating that marine plastics can be spectrally differentiated from dense Sargassum macroalgae. Other studies have investigated the spectral discrimination of marine debris from different floating features by employing spectral indices or pure spectroscopy. Most of these experiments have been performed in a controlled or simulated environment, enhancing efforts in marine debris detection (Biermann et al., 2020; Topouzelis et al., 2020; Themistocleous et al., 2020; Hu, 2021).
Additionally, machine learning methods have recently gained recognition for plastics identification. Generative Adversarial Network (GAN) models, as well as shallow Support Vector Machine (SVM) and Random Forest (RF) models, have been trained using artificial plastic targets for binary classification tasks (debris/non-debris) (Jamali and Mahdianpari, 2021). Basu et al. (2021) investigated the performance of unsupervised and semi-supervised algorithms in identifying the presence or absence of plastics, highlighting the need to validate the performance of the classification models at a global scale. Segmentation architectures such as U-Net and DeepLabV3+ were also successfully used by Solé Gómez et al. (2022) to predict three classes, i.e., debris, water and other, on S2 data in rivers. To discriminate floating objects from non-floating objects on the sea surface, Mifdal et al. (2021) relied on Convolutional Neural Networks (CNNs), suggesting that these models are able to predict the spatial characteristics of the annotated floating features correctly.
In this direction, this study takes a step forward. It investigates the contribution of both spectral and spatial features to the separation of floating marine debris from other materials on multispectral satellite data. The current study concentrates mostly on challenging cases, i.e., distinguishing features with similar spectral signatures and/or spatial patterns. To accomplish that, we examine the ability of spectral indices, GLCM texture features (Haralick et al., 1973), Local Binary Patterns (Ojala et al., 2002), and other characteristics that offer information on spatial patterns (Gaussian, Sobel, Hessian Eigenvalues), extracting significant insights about competing classes. To perform our analysis, we utilize the benchmark Marine Debris Archive (MARIDA) (Kikaki et al., 2022), a recently published open-access dataset that contains observations of marine debris, floating objects and water types on S2 data. A Random Forest classifier was also applied, and a feature selection methodology was performed to understand the underlying properties of the studied spectral and spatial features that contribute to the classification process. Finally, a quantitative and qualitative assessment of our approach is performed, and insights and challenges are discussed in detail.

THE MARIDA DATASET
Recently, the Marine Debris Archive (MARIDA) was introduced, aiming to fill the gap regarding the limited availability of open-access multispectral datasets and benchmarks for marine debris detection. To construct MARIDA, image interpretation experts annotated the S2 images taking into account in situ reports, very high-resolution satellite images and weather data, and considering the spectral and spatial patterns of the studied features. As a result, the dataset provides georeferenced polygons in shapefile format, annotated masks (i.e., patches) ready for machine learning tasks, as well as machine learning baselines for the challenging task of floating materials classification on the sea surface. Further details about the MARIDA dataset can be found at marine-debris.github.io. In this study, we focused on the major competing cases for marine debris detection on the sea surface, as indicated by Kikaki et al. (2022). For this reason, we investigate the discrimination between Marine Debris, Sargassum macroalgae, Natural Organic Material, Ship and Foam.

METHODOLOGY
This section presents our workflow, including the spectral and spatial features that were computed and the Random Forest classifier that was applied for the classification procedure.

Spectral Indices and Spatial Features
Spectral information is necessary to differentiate the various sea surface features on multispectral satellite data. Kikaki et al. (2022) proposed a set of spectral indices used in MARIDA classification baselines. To further investigate the degree of sea surface feature separation, we used a more extensive set of spectral indices. These indices were chosen to enhance the spectral differences of competing classes.
More specifically, we used the following spectral indices: i) the Floating Debris Index (FDI) for marine debris detection (Biermann et al., 2020); ii) the Normalised Difference Vegetation Index (NDVI), Floating Algae Index (FAI) (Hu, 2009), Near Infrared-Red Difference (NRD) (Hu, 2021) and Enhanced Vegetation Index (EVI) for vegetation and Sargassum macroalgae mapping; iii) the Normalised Difference Water Index (NDWI), Normalised Difference Moisture Index (NDMI), Modified Normalised Difference Water Index (MNDWI) and Automated Water Extraction Index (AWEI) for water-features extraction; and iv) the Normalised Difference Snow and Ice Index (NDSII), Bare Soil Index (BSI) and Shadow Index (SI) to highlight bright or other objects.
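As a rough illustration of how several of the listed indices are formed, the sketch below computes four normalised-difference indices for a single pixel. The band pairings (NDVI from NIR and red, NDWI from green and NIR, MNDWI from green and SWIR, NDMI from NIR and SWIR) are commonly used definitions and are assumed here; the text does not state which exact variants or S2 bands were used. The dictionary keys and the reflectance values are hypothetical.

```python
# Sketch only: normalised-difference indices for one pixel.
# Band pairings are assumed common definitions, not confirmed by the text.

def normalised_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0


def spectral_indices(pixel):
    """Compute a few indices for a pixel given as a dict of band reflectances.

    Expected keys (hypothetical naming): 'green', 'red', 'nir', 'swir1'.
    """
    return {
        "NDVI": normalised_difference(pixel["nir"], pixel["red"]),
        "NDWI": normalised_difference(pixel["green"], pixel["nir"]),
        "MNDWI": normalised_difference(pixel["green"], pixel["swir1"]),
        "NDMI": normalised_difference(pixel["nir"], pixel["swir1"]),
    }


# Made-up reflectances for illustration (not MARIDA values).
vegetation_like = {"green": 0.05, "red": 0.04, "nir": 0.12, "swir1": 0.03}
water_like = {"green": 0.06, "red": 0.04, "nir": 0.02, "swir1": 0.01}

veg = spectral_indices(vegetation_like)
wat = spectral_indices(water_like)
```

With these illustrative inputs, the vegetation-like pixel has a positive NDVI and the water-like pixel a negative one, which is the kind of sign contrast the indices are meant to expose.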
Highlighting spectral properties via spectral indices is essential for detecting and classifying materials and features on the sea surface. However, in challenging cases, this information alone seems to be insufficient. For this reason, following the MARIDA analysis, we propose combining spectral information with spatial features that capture texture (i.e., GLCM, LBP), as well as edges, corners, and flat image regions (i.e., Gaussian, Sobel, Hessian Eigenvalues). It should be noted that this combination has not been widely investigated in the literature for this challenging task. Kikaki et al. (2022) proposed a classification baseline in which GLCM features are used to provide texture information. We experimented with the Contrast (CON), Correlation (COR), Homogeneity (HOMO), Dissimilarity (DIS), Angular Second Moment (ASM) and Energy (ENER) features (Hall-Beyer, 2017). These features are computed on a quantized version of a single-band image. In this work, we used grayscale images derived from RGB composites (Robinson et al., 2021) and quantized them into 16 gray levels. To extract a GLCM feature for a specific region (a window around a pixel), the associated GLCM has to be computed first. In our case, a GLCM is a 16 x 16 matrix (its size is defined by the number of gray levels) containing the probability of each pixel value i co-occurring with a pixel value j, for defined distance offsets inside the selected window (we used a sliding 13 x 13 window). Finally, the GLCM entries are multiplied by a weight factor that depends on the selected texture feature.
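The GLCM computation described above can be sketched for a single 13 x 13 window as follows. This is a minimal pure-Python illustration, not the authors' implementation: the single horizontal offset, the symmetric counting, the 8-bit input range and the feature formulas (standard Haralick-style definitions) are assumptions, and Correlation is omitted for brevity.

```python
import math


def quantize(window, levels=16, max_value=255):
    """Map gray values in [0, max_value] to integer bins 0..levels-1."""
    return [[min(int(v * levels / (max_value + 1)), levels - 1) for v in row]
            for row in window]


def glcm(q, levels=16, offset=(0, 1)):
    """Symmetric, normalised co-occurrence matrix for one (row, col) offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    dr, dc = offset
    rows, cols = len(q), len(q[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = q[r][c], q[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count both directions (symmetric GLCM)
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts] if total else counts


def texture_features(p):
    """Weighted sums over the GLCM probabilities p[i][j]."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    asm = sum(v * v for _, _, v in cells)
    return {
        "CON": sum(v * (i - j) ** 2 for i, j, v in cells),
        "DIS": sum(v * abs(i - j) for i, j, v in cells),
        "HOMO": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "ASM": asm,
        "ENER": math.sqrt(asm),
    }


# Toy 13 x 13 windows: a flat patch and a dark/bright checkerboard.
flat = [[100] * 13 for _ in range(13)]
checker = [[0 if (r + c) % 2 == 0 else 255 for c in range(13)]
           for r in range(13)]

flat_feats = texture_features(glcm(quantize(flat)))
checker_feats = texture_features(glcm(quantize(checker)))
```

The flat window yields zero contrast and maximal homogeneity, while the checkerboard concentrates all co-occurrences between the lowest and highest gray levels and therefore has high contrast.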
Although GLCMs provide useful spatial information for distinguishing sea surface features, they are computationally expensive, especially when calculated for multiple scales. To overcome this obstacle and include scale information in the process while keeping computational costs low, we utilized the Gaussian of the grayscale images derived from RGB composites, the Sobel of the Gaussian image, and the Eigenvalues λ of the Hessian Matrix of the Gaussian image at different scale levels (for standard deviation σ = 1, 2, 4, 8, 16).
In order to enhance the included texture information, we also utilized the Local Binary Patterns features (LBP, LBP UNI), which describe the uniformity of local texture. Intuitively, LBP examines the neighbors of a center pixel and determines whether each neighbor pixel value is greater or smaller than the center pixel value (Ojala et al., 2002).
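A small sketch of the multi-scale features named above (Gaussian, Sobel of the Gaussian, Hessian eigenvalues of the Gaussian) is given below for one pixel of a toy image. It is an illustration under assumptions: the kernel is truncated at three standard deviations, borders are handled by edge replication, and derivatives use central finite differences. None of these choices is stated in the text.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at 3 sigma (assumed)."""
    radius = max(1, int(3 * sigma))
    w = [math.exp(-(x * x) / (2 * sigma * sigma))
         for x in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def convolve_rows(img, kernel):
    """1-D convolution along each row with edge replication."""
    radius = len(kernel) // 2
    width = len(img[0])
    return [[sum(k * row[min(max(c + d - radius, 0), width - 1)]
                 for d, k in enumerate(kernel))
             for c in range(width)]
            for row in img]


def transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing: rows first, then columns."""
    k = gaussian_kernel(sigma)
    return transpose(convolve_rows(transpose(convolve_rows(img, k)), k))


def sobel_magnitude(img, r, c):
    """Gradient magnitude from the 3 x 3 Sobel masks at an interior pixel."""
    gx = ((img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1])
          - (img[r - 1][c - 1] + 2 * img[r][c - 1] + img[r + 1][c - 1]))
    gy = ((img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1])
          - (img[r - 1][c - 1] + 2 * img[r - 1][c] + img[r - 1][c + 1]))
    return math.hypot(gx, gy)


def hessian_eigenvalues(img, r, c):
    """Eigenvalues (larger first) of the 2 x 2 Hessian at an interior pixel."""
    hrr = img[r + 1][c] - 2 * img[r][c] + img[r - 1][c]
    hcc = img[r][c + 1] - 2 * img[r][c] + img[r][c - 1]
    hrc = (img[r + 1][c + 1] - img[r + 1][c - 1]
           - img[r - 1][c + 1] + img[r - 1][c - 1]) / 4.0
    mean = (hrr + hcc) / 2.0
    dev = math.sqrt(((hrr - hcc) / 2.0) ** 2 + hrc ** 2)
    return mean + dev, mean - dev


# Toy image: a single bright pixel in the centre of a dark 21 x 21 patch.
size = 21
blob = [[0.0] * size for _ in range(size)]
blob[10][10] = 1.0

features = {}
for sigma in (1, 2, 4, 8, 16):
    smooth = gaussian_blur(blob, sigma)
    features[sigma] = {
        "gaussian": smooth[10][10],
        "sobel": sobel_magnitude(smooth, 10, 10),
        "hessian_eigs": hessian_eigenvalues(smooth, 10, 10),
    }
```

At the centre of the smoothed blob the gradient vanishes and both Hessian eigenvalues are negative (a local maximum), whereas along an elongated structure one eigenvalue would stay near zero; this is why the two eigenvalues can carry different information about shape.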
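The neighbor comparison behind LBP can be sketched as follows for the basic 8-neighbor, radius-1 case. The neighbor ordering, the use of "greater than or equal" as the comparison, and the at-most-two-transitions rule for uniform patterns follow the usual formulation and are assumed here; the text does not specify the radius or number of neighbors actually used.

```python
def lbp_bits(img, r, c):
    """8-neighbour LBP bit pattern of an interior pixel (radius 1)."""
    centre = img[r][c]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    return [1 if img[r + dr][c + dc] >= centre else 0 for dr, dc in offsets]


def is_uniform(bits):
    """Uniform pattern: at most two 0/1 transitions around the circle."""
    n = len(bits)
    transitions = sum(bits[k] != bits[(k + 1) % n] for k in range(n))
    return transitions <= 2


def bits_to_code(bits):
    """Pack the bit pattern into an integer code (first neighbour = bit 0)."""
    return sum(b << k for k, b in enumerate(bits))


# An edge-like patch (dark above, bright below) and a noisy patch.
edge = [[10, 10, 10],
        [10, 50, 90],
        [90, 90, 90]]
noisy = [[90, 10, 90],
         [10, 50, 10],
         [90, 10, 90]]

edge_bits = lbp_bits(edge, 1, 1)
noisy_bits = lbp_bits(noisy, 1, 1)
```

The edge-like patch produces one contiguous run of ones (a uniform pattern), while the alternating patch switches at every neighbor and is non-uniform.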

Feature Selection and Machine Learning
In order to assess the impact of the extended spectral and spatial information, we adopted a Random Forest classifier following the MARIDA quick start guide (github.com/marine-debris/marinedebris.github.io). Subsequently, we compared our results with the baseline outcomes and selected the most important features by performing feature selection.
Finally, to better understand the combined effect of the selected features in the challenging cases, we visualize them and discuss the extracted insights.
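The feature selection step is specified later in the text as permutation feature importance. A minimal sketch of that idea is shown below: each feature column is shuffled in turn and the resulting drop in accuracy is recorded. The `toy_model` function is a hypothetical stand-in for the trained Random Forest, and the data, number of repeats and accuracy metric are assumptions made for illustration.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)
            X_perm = [row[:f] + [v] + row[f + 1:]
                      for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical classifier that only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


# Toy data: feature 0 determines the label, feature 1 is irrelevant.
X = [[i / 19.0, (i * 7 % 20) / 19.0] for i in range(20)]
y = [1 if row[0] > 0.5 else 0 for row in X]

importances = permutation_importance(toy_model, X, y)
```

A feature the model ignores receives an importance of exactly zero, while shuffling the informative feature degrades accuracy; ranking features by this drop gives an ordering of their contribution.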

EXPERIMENTAL RESULTS AND DISCUSSION
This section presents a quantitative and qualitative evaluation of the results obtained after exploiting the 51 spectral and spatial features (S2 band values, GLCM, Local Binary Patterns, Gaussian, Sobel of Gaussian, Hessian Eigenvalues of Gaussian) and applying the Random Forest classifier.

Feature Selection using Machine learning
Firstly, we quantitatively assess the performance of the applied Random Forest classifier (i.e., RF+). To evaluate our results and compare them with the corresponding outcomes from MARIDA, we relied on three metrics, i.e., IoU, Recall and F1. Table 1 reports the scores for all metrics per class obtained by our RF+, as well as by the MARIDA baseline models (RF* and U-Net*). Overall, we observe that the proposed spectral and spatial features improve the classification performance, as indicated by the higher average scores that our model achieves for all metrics.
Regarding scores per class, Sediment-Laden Water still has the highest IoU, Recall and F1 scores (i.e., 0.99-1.00). For Dense Sargassum, Sparse Sargassum, Ship and Clouds, RF+ achieves an improvement of more than 2% for IoU, more than 1% for Recall and 2% for F1 compared with RF* and U-Net*. Interestingly, for the Foam class, RF+ improves IoU by 23%, Recall by 11% and F1 by 16%. Additionally, a significant improvement of 21% for IoU, 18% for Recall and 25% for F1 can also be seen for the Natural Organic Material class. However, both RF models provide the same results for the Marine Debris class. As far as the Shallow Water class is concerned, the U-Net model still achieves the highest scores for all metrics.
To qualitatively evaluate the performance of our Random Forest model (RF+), we visually inspected the produced prediction maps and compared them to the respective classification results extracted by the MARIDA baselines (Figure 2). Although RF+ and RF* provide the same scores for Marine Debris (Table 1), RF+ appears to predict this class better (Figure 2b). In particular, the prediction of Sargassum (Figure 2c,d) by RF+ is significantly improved, which is consistent with the higher scores that our model achieves (Table 1). Finally, the performance of RF+ appears to be better than that of the U-Net* and RF* models over the coastal region (Figure 2a), where Foam, Shallow Water, and Turbid Water co-exist. Overall, it seems that using spatial information at multiple scales leads to better classification results with less noise (e.g., fewer isolated pixels classified as Marine Debris) and improved shapes of predicted features (e.g., Clouds).
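For reference, the three per-class metrics can be computed from true positives (TP), false positives (FP) and false negatives (FN) as IoU = TP / (TP + FP + FN), Recall = TP / (TP + FN) and F1 = 2TP / (2TP + FP + FN). The sketch below applies these standard definitions to a toy set of labels; the labels are made up and the function name is hypothetical.

```python
def per_class_metrics(y_true, y_pred, cls):
    """IoU, Recall and F1 for one class from paired label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"IoU": iou, "Recall": recall, "F1": f1}


# Made-up pixel labels for illustration only.
y_true = ["Debris", "Debris", "Ship", "Foam", "Debris", "Ship"]
y_pred = ["Debris", "Ship", "Ship", "Foam", "Debris", "Debris"]

debris_scores = per_class_metrics(y_true, y_pred, "Debris")
```

Here Marine Debris has two true positives, one false positive and one false negative, so IoU is 0.5, Recall is 2/3 and F1 is 2/3. IoU is never larger than F1 for the same counts, which is worth keeping in mind when comparing the two columns of a results table.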
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
For further experimentation, we calculated the Spearman Correlation to form groups of highly correlated spectral and spatial features and keep only one feature from each group. More specifically, we grouped the features whose Spearman Correlation was higher than a cut-off threshold. The selected features, each of which represents a different group, are shown in Figure 3. By re-training the RF classifier on the non-highly correlated features, we obtained almost the same results. This indicates that the selected features retain nearly the same amount of information as the full feature set. Then, by applying the permutation feature importance as described in Kikaki et al. (2022) to the non-highly correlated features, we identified the most important features, i.e., NDWI, H EIG 2 S16, CON, NDVI, FDI and SOBEL S16. In contrast, individual bands (e.g., green) and the LBP group do not contribute to the classification process. Interestingly, H EIG 2 seems to be more important than H EIG 1.
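The correlation-based pruning can be sketched as below: Spearman correlation is computed as the Pearson correlation of ranks, and a feature is kept only if it is not strongly correlated with any feature already kept. The greedy grouping strategy, the use of the absolute correlation and the 0.9 threshold are assumptions, since the text does not give the cut-off value or the exact grouping procedure; the feature values are made up.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0


def spearman(a, b):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def select_representatives(features, threshold=0.9):
    """Greedily keep features not highly correlated with any kept one."""
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up feature values for five pixels (illustration only).
features = {
    "NDVI": [0.1, 0.4, 0.2, 0.8, 0.6],
    "EVI": [0.2, 0.9, 0.3, 2.0, 1.1],   # monotone in NDVI
    "CON": [5.0, 1.0, 4.0, 2.0, 3.0],
}

selected = select_representatives(features)
```

In this toy case EVI is perfectly rank-correlated with NDVI and is dropped, while CON falls below the assumed threshold and is retained as the representative of its own group.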

Further Qualitative Analysis
Finally, to further examine the ability of spectral and spatial features to discriminate competing classes, we visualize them in pairs and discuss the insights extracted from the scattergram analysis (Figure 4).
Marine Debris vs Sparse Sargassum (Figure 4a and Figure 4b): Regarding the spectral patterns of the considered floating materials, Marine Debris presents a slightly lower peak at NIR and higher values at SWIR. The well-established vegetation index NDVI tends to have positive values for Sparse Sargassum and negative values for Marine Debris (Figure 4a). Nevertheless, there is an overlapping area where NDVI values are close to zero, possibly reflecting cases with low subpixel proportions (i.e., sparse conditions). FDI contributes significantly to the classification process, as it is the fourth most important feature (Figure 3). There is also a strong theoretical justification for this index (Tasseron et al., 2021), yet we confirm that FDI alone does not adequately separate the specific materials based on S2 data (Biermann et al., 2020). This is probably related to the level of marine debris submersion, which is higher at the sea surface than in a controlled environment. On the other hand, the concurrent use of NDVI seems to highlight the differences between the spectral patterns of Sparse Sargassum and Marine Debris. The combination of NDVI and EVI could potentially separate these features as well (Figure 4b).
Marine Debris vs Ship (Figure 4c): Marine Debris and Ship present similar spectral properties due to the same polymer composition. Additionally, they can be distributed in similar spatial patterns. For instance, small vessels and marine debris can both be depicted as individual pixels, leading to a challenging discrimination task. This can also be observed in Figure 4c, as none of the considered models is able to predict the small ship correctly. Due to these challenges, the well-established NDVI and FDI fail to distinguish Marine Debris from Ship. On the contrary, the GLCM CON texture feature appears to be promising in this case. The GLCM CON captures the high contrast of Ship against its background; thus, the variability of its values might be linked to the ship size. Moreover, improved results are achieved by combining CON with the NDMI index. Marine Debris has higher NDMI values than Ship, since the moisture is higher in the case of floating and partially submerged marine debris.
Marine Debris vs Natural Organic Material (Figure 4d): The Natural Organic Material class consists of woody and vegetation debris, which tends to accumulate on the sea surface in patterns very similar to those of Marine Debris. As shown in Figure 4d, NDVI seems to contribute to the discrimination of the considered materials, as it captures the difference in reflectance values at the red and NIR bands. The additional use of MNDWI enhances their separation. This spectral index, by using the green and SWIR bands, extracts water information and removes background noise (i.e., built-up area, vegetation) (Xu, 2006). Marine Debris has higher values in MNDWI and lower (mostly negative) values in NDVI than Natural Organic Material.
Marine Debris vs Foam (Figure 4e): Foam, compared to Marine Debris, has higher reflectance values across the spectrum, presenting a peak at green and a local minimum at 740 nm as well (Kikaki et al., 2022). Furthermore, Foam accumulation patterns in the wave breaking zone differ from those of Marine Debris. The concurrent use of NDWI and Hessian Eigenvalue 2 of the Gaussian with standard deviation σ = 16 seems to enhance the discrimination of these two classes. NDWI has lower values for Marine Debris than for Foam; this is probably due to the fact that Foam is a water-related class. Additionally, Hessian Eigenvalue 2 of the Gaussian with standard deviation σ = 16 is higher for Marine Debris than for Foam.
Sparse Sargassum vs Natural Organic Material (Figure 4f): Sparse Sargassum and Natural Organic Material are floating organic debris which follow similar linear trajectories. Apart from the common spatial patterns, these floating materials can present similar spectral signatures, as natural organic debris may contain vegetation such as leaves or plants. The EVI, as expected, highlights the vegetation, yet there is an overlapping area where vegetation debris and Sparse Sargassum macroalgae cannot be discriminated. In contrast, BSI decreases the overlapping area, further enhancing the separation between these two classes. Specifically, BSI takes higher values for Natural Organic Material, whose properties appear to be closer to those of soil, than for Sparse Sargassum, which mostly exhibits negative values.

CONCLUSIONS
In this paper, a comprehensive analysis of floating materials and sea features discrimination based on S2 multispectral satellite data is presented. Exploiting the recently introduced MARIDA dataset, we investigated the potential to separate materials and features on the sea surface using spectral indices and spatial features at multiple scales. To do so, we used machine learning methods and selected the most promising and distinctive features based on the feature selection method. Our evaluation indicates that our new model (RF+), which exploits spatial information at multiple scales, improves the classification performance compared with the MARIDA baseline results.
In our analysis, we mainly focused on the ability to distinguish Marine Debris from competing classes such as Sparse Sargassum, Ship, Natural Organic Material and Foam, which present similar patterns. Our experimental results indicate that specific spectral and spatial features contribute to the classification process, and that certain combinations can enhance the discrimination of challenging classes. Preliminary results show that NDVI is capable of separating floating Marine Debris from other features that co-exist on the sea surface, such as Sargassum macroalgae. It can also contribute to the discrimination of Marine Debris from Natural Organic Material and Foam. However, annotated data of artificial materials, such as Marine Debris and Ship, cannot be separated by well-established indices (e.g., FDI, NDVI). In this case, the use of spatial information along with different spectral indices can be effective for their discrimination.
Interestingly, although FDI contributes to the classification process, it cannot alone sufficiently separate Marine Debris from other materials. Overall, we observe that, in some cases, individual features are not capable of differentiating floating materials. However, the joint use of at least two of them can enable the separation of floating objects and water classes, as indicated by the scattergram analysis.
Future analysis on MARIDA can provide the community with further insights about floating materials detection and water quality monitoring. Apart from the spectral and spatial features examined in this study, temporal features can also be investigated to assess their contribution to the detection of specific marine classes (e.g., Turbid Water, Shallow Water).