RANDOM FOREST-BASED RIVER PLASTIC DETECTION WITH A HANDHELD MULTISPECTRAL CAMERA

Plastic pollution has become one of the main global environmental emergencies. A considerable part of used plastic materials is dispersed or accumulated in the environment, with a significant damaging impact on many terrestrial and aquatic ecosystems. In recent years, Artificial Intelligence has proven to be a fundamental approach for the detection of plastic waste in aquatic habitats: several groups have recently tackled this problem by applying machine learning-based methods to multispectral or RGB imagery. This study compares the results obtained by two machine learning classifiers, namely Random Forest and Support Vector Machine, in detecting macroplastics in the fluvial habitat through multispectral imagery. The images were acquired with a hand-held multispectral camera called MAIA-WV2. Although the results obtained are quite good in terms of accuracy on a random validation dataset, some issues, mostly related to the presence of white rocks and glare on the water, still have to be properly solved.


INTRODUCTION
Plastic is the world's third most produced industrial material, and its production has grown significantly over the last fifty years. It is a highly versatile and virtually indestructible material, and the disposal of plastic waste is becoming a dramatic problem. At the global level, only 9% of used plastic is recycled, while the rest is burned or accumulated in landfills or in the environment (Geyer et al., 2017). Plastic pollution is therefore one of the major global environmental emergencies, with a remarkable negative impact on many terrestrial and aquatic ecosystems. The oceans contain 150 million tonnes of plastic and, if the current trend continues, there may be more plastic than fish in the sea by 2050 (Jong, 2018). The majority of plastic litter in seas and oceans comes from rivers (Lebreton et al., 2017), which motivates the need for detecting macro-plastics (linear dimensions > 5 mm; Bråte et al., 2017) in the fluvial environment. Once the plastic reaches the sea, the individual pieces tend to fragment. The resulting microplastics, after a period in suspension, sink and settle permanently on the seabed. At the bottom of the sea these particles become food for marine organisms, and plastics thus enter the food chain. Detecting macro-plastics in fluvial habitats would enable their recovery before they reach marine environments. Unfortunately, studies on this kind of detection and recovery system are still at an early stage: only 20% of global studies about rivers concern problems related to macro-plastics (Blettler et al., 2019), whereas most of them address other problems (Aminti et al., 2020). Most research on plastic waste detection exploits only infrared bands (from 900 nm to 1700 nm), in particular NIR (Near InfraRed) and SWIR (Short-Wave InfraRed), because these parts of the electromagnetic spectrum are not influenced by the object colour (Salzer and Heinz, 2014).
A substantial part of the work on this topic is based on satellite imagery for detecting large accumulations of floating plastics in natural seawater (Topouzelis et al., 2019;Themistocleous et al., 2020). However, the spatial resolution of satellite multispectral images is quite limited (from half a metre to some kilometres), and high-resolution data are often not freely available. This motivated the use of a hand-held multispectral camera in this study. Given the fast improvements obtained in the last decade by artificial intelligence approaches on object detection and other tasks, some research groups have recently tried to exploit such approaches to automatically detect plastic waste in either multispectral or RGB imagery. Among machine learning tools, it is commonly accepted that Random Forest (RF) often achieves the best performance in classification problems when compared with other approaches such as Maximum Likelihood Classification (MLC) (Ahmad and Quegan, 2012) or Support Vector Machine (SVM), in particular when dealing with high-dimensional data and multi-class classification. It works well with noisy data and discriminates between classes with similar spectral characteristics. In addition, it is known for its ability to reduce overfitting (Akar and Güngör, 2012;Lowe and Kulkarni, 2015;Martin et al., 2018). Deep Learning on imagery with high geo-spatial resolution has also been considered by Wolf et al. (2020): in that work, convolutional neural networks (CNNs) were effectively employed to determine the type and quantity of waste dispersed in aquatic environments, with an accuracy of 83%. This study compares the classification results obtained by using RF and SVM on very high spatial resolution multispectral images.

DATASET CHARACTERIZATION
In this study, a proximity multispectral sensor (i.e. MAIA-WV2) combined with machine learning classification methods is deployed for detecting macro-plastics in fluvial ecosystems.
The multispectral camera was developed by SAL Engineering (Modena, Italy) and EOPTIS (Trento, Italy). It is equipped with nine different passive sensors (eight monochromatic and one panchromatic, RGB), which acquire all the available wavelengths simultaneously thanks to global shutter technology (Nocerino et al., 2017). MAIA-WV2 has the same spectral bands as the WorldView-2 satellite (DigitalGlobe), from 395 nm to 950 nm. The available bands are violet, blue, green, orange, red, red-edge, NIR1, NIR2 and RGB (see also Table 1).

Table 1. Wavelength intervals of the MAIA-WV2 bands
The MAIA camera makes it possible to collect very high resolution images (pixel size of up to a few centimetres) and to repeat the acquisitions without any additional cost (De Giglio et al., 2020). Each array has a CMOS sensor (3.6 × 4.8 mm) with a pixel size of 3.75 × 3.75 μm and a resolution of 1.2 Mpixel. The sensors have a fixed lens with a nominal focal length of 7.5 mm and a focal aperture equal to 2.8 mm (SAL Engineering and EOPTIS 2018). On the negative side, the area that can be covered in this way is clearly much smaller than with satellite remote sensing methods. The images obtained from the MAIA multispectral camera are in RAW format but can be converted to TIFF format by the proprietary software of the camera. The software MAIA - Multicam Stitcher Pro performs geometric and radiometric corrections (SAL Engineering and EOPTIS 2018).
This study exploits the dataset already considered by De Giglio et al. (2019), which distinguishes four different artificial scenarios of potential interest: high riverbanks, grass and trees, white rock immersed in the water, sandy soil and flowing water. However, this work focuses only on the flowing water case (172 multispectral images), i.e. floating plastics in fluvial water. This imagery is neither overexposed nor underexposed, and it permits the investigation of the sunglint problem (Martínez-Vicente et al., 2019) (see Figure 1).
In addition to the eight monochromatic bands directly provided by the multispectral camera, some studies have investigated the use of spectral indexes to increase the classification accuracy: Page et al. (2020) and Themistocleous et al. (2020) suggested using the Normalized Difference Water Index 2 (NDWI2), the Plastic Index (PI) and the Reversed Normalized Difference Vegetation Index (RNDVI), defined in Equation (1). These indexes, along with the standard camera output channels, are considered in this study as inputs for a Random Forest and an SVM classifier.
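As an illustration of how such index channels can be derived from the camera bands, a minimal Python sketch follows. The formulas used are the forms commonly reported in the literature for NDWI2, PI and RNDVI and are assumed here rather than taken from this study; the band names, the choice of NIR1 as the near-infrared band and the helper names are likewise hypothetical.

```python
from typing import Dict, List

# Order of the eight monochromatic MAIA-WV2 channels named in the text.
BANDS = ["violet", "blue", "green", "orange", "red", "red_edge", "nir1", "nir2"]


def safe_ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den != 0 else 0.0


def spectral_indexes(pixel: Dict[str, float], nir_band: str = "nir1") -> Dict[str, float]:
    """Compute NDWI2, PI and RNDVI for a single pixel.

    Assumed definitions (commonly reported forms, not confirmed by this study):
      NDWI2 = (green - NIR) / (green + NIR)
      PI    = NIR / (NIR + red)
      RNDVI = (red - NIR) / (red + NIR)
    """
    green, red, nir = pixel["green"], pixel["red"], pixel[nir_band]
    return {
        "ndwi2": safe_ratio(green - nir, green + nir),
        "pi": safe_ratio(nir, nir + red),
        "rndvi": safe_ratio(red - nir, red + nir),
    }


def feature_vector(pixel: Dict[str, float], nir_band: str = "nir1") -> List[float]:
    """Stack the 8 band values and the 3 indexes into an 11-element input."""
    idx = spectral_indexes(pixel, nir_band)
    return [pixel[b] for b in BANDS] + [idx["ndwi2"], idx["pi"], idx["rndvi"]]
```

Under these assumptions, each pixel is described either by the 8 band values alone or by 11 values when the three indexes are appended, which corresponds to the two kinds of classifier input mentioned above.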

PROPOSED METHOD
In the procedure implemented in this paper, machine learning tools, namely RF and SVM, are used in post-processing to classify plastic in previously acquired images.
In particular, the classification performance is investigated by varying the combination of the considered inputs.

Random Forest
The Random Forest algorithm is one of the most popular multistage classifiers and belongs to the Decision Tree group of classifiers. It is non-parametric and handles non-normal, non-homogeneous and noisy data well (Ghose, Pradhan and Ghose, 2010). It has been widely used thanks to its high classification accuracy (Lowe and Kulkarni, 2015;Martin et al., 2018). RF classification is based on a set of decision trees: each individual tree delivers a class prediction, and the class receiving the most votes becomes the model's prediction. Randomness plays a key role in the selection of the training subset for each tree (Lowe and Kulkarni, 2015). As an ensemble classifier, Random Forest has high robustness and a low generalization error (Martin et al., 2018). The parameter settings play an important role in the classification performance, in particular the number of trees (N) and the number of variables considered for the split at each node (m). The literature often suggests setting N to 500, because the errors stabilize before this number of classification trees is reached (Belgiu and Drăguţ, 2016).
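The voting mechanism and the role of N and m can be illustrated with a deliberately simplified Python sketch. It is not the implementation used in this study: each tree is reduced to a single-split "stump" (so drawing m variables per tree coincides with drawing them per node), and the function names, the default m = sqrt(number of variables) and the seed are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def train_stump(samples, labels, feature_ids):
    """Fit a one-split tree using only the given candidate features."""
    majority = Counter(labels).most_common(1)[0][0]
    # Baseline: no split, always predict the majority class.
    best = (sum(y != majority for y in labels), None, None, majority, majority)
    for f in feature_ids:
        values = sorted({s[f] for s in samples})
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2  # midpoint threshold between distinct values
            left = [y for s, y in zip(samples, labels) if s[f] <= t]
            right = [y for s, y in zip(samples, labels) if s[f] > t]
            left_cls = Counter(left).most_common(1)[0][0]
            right_cls = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_cls for y in left)
                      + sum(y != right_cls for y in right))
            if errors < best[0]:
                best = (errors, f, t, left_cls, right_cls)
    _, f, t, left_cls, right_cls = best
    if f is None:
        return lambda s: majority
    return lambda s: left_cls if s[f] <= t else right_cls


def train_forest(samples, labels, n_trees=500, m=None, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and m random features."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    m = m or max(1, int(math.sqrt(n_features)))
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(samples)) for _ in samples]  # bootstrap draw
        feats = rng.sample(range(n_features), m)
        forest.append(train_stump([samples[i] for i in rows],
                                  [labels[i] for i in rows], feats))
    return forest


def predict(forest, sample):
    """Return the class that receives the most votes among the trees."""
    votes = Counter(tree(sample) for tree in forest)
    return votes.most_common(1)[0][0]
```

In practice a library implementation with fully grown trees would be used; the sketch only makes explicit how bootstrap sampling, random variable selection and majority voting fit together.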

Support Vector Machine
Support Vector Machine is a binary classification algorithm that identifies a linear discriminant function with maximum margin to separate the classes. When the samples are not linearly separable, a nonlinear transformation can be applied through the so-called kernel trick, which aims at identifying a more appropriate representation of the dataset: a hyperplane in this higher-dimensional representation is used to separate the classes of interest (Akar and Güngör, 2012). Roughly speaking, the support vectors are the points of the dataset that are closest to the hyperplane; they are considered the critical elements of the dataset. SVM aims to identify the hyperplane that best divides the support vectors into the desired classes. The most popular kernels used in the kernel trick are the linear, the polynomial and the Radial Basis Function (RBF) kernels. SVM has been shown to deal efficiently with classification in high-dimensional spaces and to be quite versatile (Fletcher, 2009).
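For reference, the three kernels named above and the resulting decision rule can be written in a few lines of Python. This is a sketch of the standard textbook formulations; the parameter names (gamma, coef0, degree) and their default values are illustrative assumptions, not settings reported in this study.

```python
import math


def linear_kernel(x, y):
    """Plain dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2); equals 1 for identical vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def decision_function(x, support_vectors, sv_labels, alphas, bias, kernel):
    """Signed score of x: sum_i alpha_i * y_i * K(sv_i, x) + bias.

    Labels are expected to be +1 / -1; the sign of the score gives the class.
    """
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, sv_labels, alphas)) + bias
```

Only the support vectors enter the decision function, which is why they are regarded as the critical elements of the dataset.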

Connected region detection
Connected region detection is a process based on the spatial proximity of pixels. It is typically used on binary images to segment regions of mutually connected pixels. In this paper, connected regions are computed in order to determine which areas are classified as plastic. The rationale is that (almost) isolated pixels classified as plastic have most probably been misclassified. Consequently, connected regions originally classified as plastic are discarded if their area is smaller than a certain threshold. Table 2 summarises three different cases, whose performance is compared in the following.
Validation of the RF yielded an accuracy of 98%; however, the number of false positives is currently higher than expected. The following figures show two different images of the dataset with their respective classifications (see Figures 3 to 6 and Figures 7 to 10). It is worth noting that the post-classification results differ visibly between the two images.
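The area-based selection of connected regions described in this section can be sketched in Python as follows. The function name, the use of 8-connectivity by default and the representation of the mask as nested lists of 0/1 values are assumptions made for illustration; the threshold actually used in this study is not specified here.

```python
from collections import deque


def remove_small_regions(mask, min_area, connectivity=8):
    """Keep only connected regions of 1-pixels with at least min_area pixels."""
    rows, cols = len(mask), len(mask[0])
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    visited = [[False] * cols for _ in range(rows)]
    cleaned = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or visited[r][c]:
                continue
            # Breadth-first search collects every pixel of this region.
            region = []
            queue = deque([(r, c)])
            visited[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not visited[ny][nx]):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            # Regions below the area threshold are treated as misclassified.
            if len(region) >= min_area:
                for y, x in region:
                    cleaned[y][x] = 1
    return cleaned
```

A larger threshold removes more isolated false positives but also risks discarding genuinely small plastic items, so the value has to be chosen with the expected object size in mind.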

DISCUSSION
Although the results illustrated in the previous section are quite encouraging, with RF accuracy and quality values that are high in all cases (Tables 3 and 4), some issues are clearly visible in the graphical results (Figures 3 to 10). The best results were obtained with the Random Forest classifier and a dataset with 8 bands (violet, blue, green, orange, red, red-edge, NIR1, NIR2). Every validation index in Table 4 exceeds 80%. Differently from other works in the literature, considering three additional channels (NDWI2, PI, RNDVI) did not lead to any significant improvement in our case study (Case 2 in Tables 3 and 4). This result can be explained for this dataset by inspecting the distributions of these indexes for plastic and non-plastic pixels; Figure 11, for instance, shows the PI distribution. The SVM classifier produced less satisfactory results than RF. In Case 3 the number of False Positives (FP) is about twice that of the other two cases, i.e. this classifier labels many pixels as plastic even though they are not. It is also important to notice that, although the classifiers were trained with the same number of samples for the two classes (plastic and non-plastic), within each image the number of plastic pixels is much lower than that of the other class. Figures 4 and 8 show that the classification problems related to sun glint, sea foam and white rocks are substantial. The introduction of a region selection step based on the area of the detected (plastic) connected regions can partially reduce such classification issues (see Figures 5 and 6), but it cannot solve them completely (see Figures 9 and 10). In particular, the latter case shows that the (quite small) size of certain plastic samples can compromise the effectiveness of the connected region-based selection step.
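The combination of a high accuracy with a visibly large number of false positives follows from the class imbalance within each image, as the short Python sketch below illustrates. The pixel counts are purely hypothetical and the choice of accuracy, precision and recall as indexes is an assumption; they are not the values or necessarily the indexes of Tables 3 and 4.

```python
def validation_indexes(tp, fp, fn, tn):
    """Accuracy, precision and recall from per-pixel confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical image: 10 000 pixels, of which only 100 are plastic.
scores = validation_indexes(tp=90, fp=150, fn=10, tn=9750)
print(scores)
```

With these illustrative counts the accuracy is above 98%, yet fewer than half of the pixels labelled as plastic are actually plastic, which is consistent with the cluttered appearance of the classified images.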
Motivated by these considerations, our future work will focus on the introduction of an additional classification/selection step based on local image spatial statistics (Facco et al., 2013) and/or on the recognition of specific object shapes (Su et al., 2015). Furthermore, the use of deep learning methods will also be considered in order to improve the overall classification performance of the system.

CONCLUSIONS
In this study we presented some initial results of a project dealing with the detection of macroplastics in the fluvial habitat through a hand-held multispectral camera, where the acquired dataset is analysed by means of Artificial Intelligence tools. Although the results obtained with a Random Forest approach are quite encouraging, some issues related to the presence of white rocks, sea foam and sun glint are currently not handled properly by the implemented approach.
Since solving these issues can be crucial for the effectiveness of the proposed approach in a real-world scenario, our future work will focus on the development of new tools to deal with them properly.