UAV-BASED RIVER PLASTIC DETECTION WITH A MULTISPECTRAL CAMERA

Plastic is the world's third most produced material by industry (after concrete and steel), but only 9% of the plastic used is recycled. The rest is either burned or accumulates in landfills and in the environment, the latter having many serious consequences, in particular in a long-term scenario. A significant part of the plastic waste is dispersed in the aquatic environment, with a dramatic impact on aquatic flora and fauna. This has motivated several works aiming at the development of methodologies and automatic or semi-automatic tools for plastic pollution detection, in order to enable and facilitate its recovery. This paper deals with the problem of automatic plastic waste detection in the fluvial and aquatic environment. The goal is to exploit the well-recognized potential of machine learning tools in object detection applications. A machine learning tool, based on random forest classifiers, has been developed to detect plastic objects in multispectral imagery collected by an unmanned aerial vehicle (UAV). In the developed approach, the outcome is determined by the combination of two random forest classifiers and an area-based selection criterion. The approach is tested on 154 images collected by a multispectral proximity sensor, namely the MAIA-S2 camera, in a fluvial environment, on the Arno river (Italy), where an artificial controlled scenario was created by introducing plastic samples anchored to the ground. The obtained results are quite satisfactory in terms of object detection accuracy and recall (both higher than 98%), while precision and quality are remarkably lower. The overall performance also appears to depend on the UAV flight altitude, being worse at higher altitudes, as expected.


INTRODUCTION
Given their high usability and versatility, plastics are commonly used for many purposes. Plastics are lightweight, malleable, inexpensive and easy-to-produce materials. Today, it is still not possible to dispose of most used plastics in a proper and sustainable way, and they are often dispersed in the environment, leading to a significant impact on flora and fauna (Plastics Europe, 2022, United Nations Environment Program, 2018). In particular, every year 20 million tons of plastic waste are dispersed in the aquatic ecosystem and, if the trend is not reversed, it is foreseen that there will be more plastic than fish in the seas and oceans in 2050 (Jong, 2018). Most of the plastic pollution in the oceans comes directly from coastlines or via rivers. The transportation of plastic waste through waterways is still not sufficiently well investigated, but it has been estimated that more than 2 million tons come from rivers every year (Meijer et al., 2021, Tasseron et al., 2021). The study of plastic fluvial transportation and sedimentation is important because in this context plastic waste can be detected and removed before it reaches the marine environment.
During the last twenty years, the remote sensing community has made several efforts towards plastic pollution detection over large areas. More specifically, most of the research works on this topic focus on the detection of floating litter in seas and oceans through satellite images.
To this aim, the freely accessible Copernicus Sentinel-2 mission data are currently a quite convenient option. Sentinel-2 satellite data are multispectral images made up of 13 spectral bands from VNIR to SWIR. Their spatial resolution varies from 10 to 60 m (ESA - Sentinel Online, n.d.). Such data have been exploited, among others, by (Themistocleous et al., 2020) to investigate the detection of large artificial plastic targets. Such detection of plastic waste, and its separation from water or aquatic vegetation, is often based on the use of spectral indices such as the Normalized Difference Water Index 2 (NDWI2), the Plastic Index (PI), the Reversed Normalized Difference Vegetation Index (RNDVI) and the Floating Debris Index (FDI) (Biermann et al., 2020, Page et al., 2020, Themistocleous et al., 2020).
Instead, the detection of smaller plastic objects, usually over smaller areas, requires different instruments, able to represent the region of interest at a higher resolution. Since Unmanned Aerial Vehicles (UAVs) allow the quick and easy acquisition of high-resolution spatial data over relatively large areas, they appear as a nearly ideal option for plastic detection in local areas, such as rivers, beaches, coastlines or lakes, once equipped with proper sensors, such as RGB and multispectral cameras (Martin et al., 2018). Plastic detection in this imagery is typically done by means of machine learning tools, and, recently, with deep learning-based approaches (Jakovljevic et al., 2020, Wolf et al., 2020). Given the important role of waterways in the transportation of plastic waste to the seas, this paper focuses on the detection of plastic objects in river environments. Similarly to the satellite remote sensing approaches, this work aims at plastic litter detection by properly employing multispectral information in a machine learning approach (random forest-based detection) (Belgiu and Drȃguţ, 2016), described in Section 3. The proposed approach has been tested on multispectral imagery, collected by means of a proximity multispectral sensor, namely the MAIA-S2 camera, mounted on a UAV, the DJI Matrice 300, while flying over a portion of the Arno river (Italy). A more detailed description of the case study is provided in Section 2, whereas the obtained results are reported in Section 4, and some conclusions are drawn in Section 5.

Study Area
The study area is located in Prulli (Reggello, Florence, Italy), on the Arno river. A portion of the river, approximately 100 m long, has been selected for its characteristics, such as the width, the depth of the stream bed, and the possibility of reaching the centre of the watercourse via the ruins of an ancient bridge (Ponte di Annibale).
Plastic samples (bottles, flacons and shopping bags) were introduced into the river and tied to the ancient bridge with a transparent fishing line, so as to ease the object recovery at the end of the data collection. The fishing line length was different for each plastic object, and each of them had a limited freedom of movement. Figure 1 shows the region of interest, i.e. where the UAV flew (red rectangle) and the place where the fishing lines were anchored (yellow circle).

Dataset
The imagery has been acquired by means of the multispectral camera MAIA-S2, developed by SAL Engineering (Modena, Italy) and EOPTIS (Trento, Italy). This camera is provided with nine different optical sensors, with the same bands as the Sentinel-2 satellite (Table 1) (SAL Engineering and EOPTIS, 2018). In particular, two are red edge bands (S5, S6), whereas the last three are in the near infrared (S7, S8, S9). The native camera resolution is 1208 × 960 pixels and the nominal focal length is 7.5 mm, resulting in a field of view of 35° × 26°.
The MAIA-S2 camera was mounted on a DJI Matrice 300, which flew at low speed over the area of interest while collecting one image per second. The UAV flew at different altitudes, ranging from 20 m to 80 m above the ground, in order to compare the plastic detection results when varying the flight height. Image locations were acquired by means of an external GNSS receiver mounted on the UAV. This receiver worked in RTK mode, exploiting corrections provided by a GNSS base station located approximately 100 m from the area of interest.
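To give a feeling of how the flight altitude affects the pixel footprint on the ground, the following minimal sketch estimates a nominal Ground Sample Distance (GSD) from the field of view and image width reported above. It assumes a nadir-looking camera over a flat surface and that the 35° angle corresponds to the 1208-pixel image side; the function name is hypothetical, and the resulting values are only rough estimates that may differ from those reported in the paper's tables.

```python
import math


def nominal_gsd_m(altitude_m, fov_deg=35.0, n_pixels=1208):
    """Rough GSD (metres per pixel) for a nadir view over flat ground.

    Assumption: fov_deg is the full field of view along the image side
    that contains n_pixels pixels.
    """
    # Ground footprint along the considered image side.
    footprint_m = 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)
    return footprint_m / n_pixels


for h in (20, 30, 80):
    print(f"altitude {h:2d} m -> nominal GSD {100 * nominal_gsd_m(h):.2f} cm")
```

Under these assumptions the GSD grows linearly with altitude, from roughly 1.6 cm at 30 m to roughly 4.2 cm at 80 m.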
An irradiance light sensor (ILS) was mounted on top of the DJI Matrice 300 (as shown in Figure 2), enabling automatic radiometric correction of the multispectral imagery. Geometric corrections and co-registration of the bands were computed by means of the MAIA image-processing software, provided by SAL Engineering along with the camera. Overall, the dataset is composed of 1268 images acquired at different altitudes (from 20 m to 80 m).

METHOD
The proposed method for plastic detection is based on a multi-step random forest approach, where the final results are obtained by means of the cascade of two random forest classifiers (named classifier A and B hereafter) and of an area-based selection criterion. It is worth noting that the method proposed in this work is an evolution of a previously presented one, in particular improving the previous results by including the above-mentioned two-step procedure.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
Ideally, a single classifier able to properly distinguish "Plastic" from "Other" pixels would be preferable. However, according to the experimental tests performed with the collected data (see Table 2), the performance of such a classifier is unsatisfying in certain cases. Hence, a multi-step classification procedure has been used instead.
The analysis starts with a pixel-based classification, where only two classes are considered in the classification process, "Plastic" and "Other", aiming at distinguishing pixels related to plastic objects from all the others.
Both classifier A and B are pixel-classifiers: they receive as input the co-registered nine bands of a single multispectral image, and they analyse pixel by pixel the nine-band measurements, aiming at distinguishing plastics from the other materials by their peculiar spectral response.
Despite being similar in terms of the general goal (i.e. detecting plastic pixels), classifier A and B differ in their specific classification ability, which has been achieved by training them differently.
To be more precise, classifier A is well suited to avoiding false negatives (i.e. pixels incorrectly classified as "Other") in the pixel plastic detection. Conversely, classifier B ensures a better performance in terms of reducing the number of false positive pixels, i.e. pixels incorrectly classified as "Plastic".
Since plastic objects may not be visible in many images, and since the number of plastic pixels in an image is usually small, the total number of "Other" pixels is huge with respect to the total number of "Plastic" pixels. Consequently, the direct use of classifier A and B on the multispectral images typically leads to a much worse performance of A, due to its much higher number of false positives.
Examples of the results obtained by classifier A and B on one of the images acquired at 30 m height can be seen in Figures 3-6: Figure 3 and Figure 5 show the results obtained on the image by using classifier A and B, respectively, whereas Figures 4 and 6 are zooms of the previously mentioned figures on one of the plastic objects. These figures compare the data fed to the classifier (shown as RGB-like images, obtained from bands S4, S3 and S2, see Table 1) with the obtained results (plastics are shown in white, whereas "Other" pixels are shown in blue).
By comparing the results in Figures 3-6, the better ability of A in avoiding false negatives is quite apparent, whereas B reduces the amount of false positives (usually appearing as salt and pepper-like noise), as expected.
The different classifier behaviors derive from different training:
A: 500 trees have been used in the random forest classifier, trained by randomly extracting from the dataset 100k "Plastic" and 500k "Other" pixels. The use of unbalanced training datasets for the two classes is motivated by: 1) the quite small amount of plastic pixels available in the images, which limits the maximum number of plastic training pixels, and 2) the need for a reasonably large number of "Other" pixels to decently represent the characteristics of the huge amount of pixels in this class. The use of balanced training classes is foreseen in the future developments of this work, in order to obtain a more robust classifier performance.
B: Random forest classifier B has been set and trained similarly to A, but inserting an additional set of 90k pixels in the "Other" training dataset, randomly extracted from certain critical areas (rocks, sunglint, foam, etc.). These additional pixels improve the ability of classifier B to properly recognize the pixels of such critical areas.
Given the peculiar characteristics of the above presented classifiers, B appears to be a reasonable first step for a plastic detection procedure, allowing the identification of most (but not all) of the plastic pixels, while keeping the number of false positives lower than A.
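The two training configurations described above can be sketched as follows. This is only an illustrative example: the use of scikit-learn is an assumption (the paper does not state which implementation was used), all hyperparameters other than the number of trees are left at the library defaults, the function name is hypothetical, and the nine-band pixel samples are synthetic and scaled down (200/1000/180 instead of 100k/500k/90k) so that the example runs quickly.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train_classifiers(plastic, other, critical_other, n_trees=500, seed=0):
    """Train classifier A (plastic vs. other) and classifier B
    (same data plus extra "Other" pixels from critical areas).

    Each input is an (n_samples, n_bands) array; label 1 = "Plastic", 0 = "Other".
    """
    x_a = np.vstack([plastic, other])
    y_a = np.concatenate([np.ones(len(plastic), dtype=int),
                          np.zeros(len(other), dtype=int)])
    clf_a = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf_a.fit(x_a, y_a)

    # B: same training set, augmented with "Other" pixels from critical areas.
    x_b = np.vstack([x_a, critical_other])
    y_b = np.concatenate([y_a, np.zeros(len(critical_other), dtype=int)])
    clf_b = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf_b.fit(x_b, y_b)
    return clf_a, clf_b


# Synthetic nine-band samples (hypothetical reflectance values).
rng = np.random.default_rng(0)
plastic_px = rng.normal(0.6, 0.05, size=(200, 9))
other_px = rng.normal(0.2, 0.05, size=(1000, 9))
critical_px = rng.normal(0.5, 0.05, size=(180, 9))  # e.g. sunglint-like pixels

clf_a, clf_b = train_classifiers(plastic_px, other_px, critical_px, n_trees=20)
print(clf_a.predict(plastic_px[:5]), clf_b.predict(critical_px[:5]))
```

With real data, `n_trees=500` and the sample sizes reported above would be used instead of the reduced values of this demo.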
Once some plastic pixels have been found by B, A is used in order to improve the plastic pixel detection just in the neighboring area of the already found pixels. In practice, the pixels in a spatial buffer around any previously detected plastics are fed to classifier A. Then, the results of A and B are merged: the B outcome is used in the "Other"-only areas, whereas the A outcome is used to better discriminate "Plastic" and "Other" pixels where plastic is present.
For instance, this procedure leads to the substitution of the B results shown in Figure 6 with the A outcome in Figure 4, thus ensuring a more effective identification of the plastic boundaries. The results obtained by using classifier B, A and the cascade B+A can also be compared in Figure 7 and, once zoomed, in Figure 8.
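The B+A cascade described above can be sketched as follows. This is a minimal illustration under several assumptions: the classifiers are any objects exposing a `predict()` method returning 1 for "Plastic" and 0 for "Other", the spatial buffer is obtained by morphological dilation of the B mask using SciPy, and the buffer size `buffer_px` is a hypothetical parameter (its actual value is not given in the text).

```python
import numpy as np
from scipy import ndimage


def cascade_b_then_a(image, clf_a, clf_b, buffer_px=5):
    """Merge classifier B (whole image) with classifier A (buffer only).

    image: (H, W, n_bands) array of co-registered bands.
    Returns a boolean (H, W) mask, True where "Plastic" is detected.
    """
    h, w, n_bands = image.shape

    # Step 1: classifier B on every pixel.
    pred_b = np.asarray(clf_b.predict(image.reshape(-1, n_bands)))
    mask_b = pred_b.reshape(h, w).astype(bool)

    # Step 2: spatial buffer around the plastic pixels found by B.
    buffer = ndimage.binary_dilation(mask_b, iterations=buffer_px)

    # Step 3: inside the buffer, replace the B outcome with the A outcome.
    merged = mask_b.copy()
    if buffer.any():
        merged[buffer] = np.asarray(clf_a.predict(image[buffer])).astype(bool)
    return merged
```

Outside the buffer the B outcome is kept unchanged, so the false positives that A would produce far from any detected plastic never enter the merged result.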
Given the pixel Ground Sample Distance (GSD) (Table 2), the area of a macro-plastic object floating in the fluvial environment is usually much larger than a pixel. Since the classification errors resulting from the cascade of B and A are often salt and pepper-like noise, an area-based selection of the detected plastic areas can be used in order to reduce the number of false positives.
First, the image regions associated with different plastic objects are identified by computing the connected regions by means of a flood-fill algorithm (Fishkin and Barsky, 1985). Then, the plastic objects with areas lower than a certain threshold, to be defined as a design parameter of the procedure, are discarded. The overall procedure is summarized in the block diagram shown in Figure 9. Figure 10 compares the areas of the plastic objects and of the false positives (just after applying the B+A cascade) in the 30 m height case. The quite remarkable area difference in this dataset partially justifies the adoption of the area-based selection, as explained above. This choice can clearly have a negative impact on the detection performance when the plastic objects dispersed in the considered region are smaller than the threshold area.
Table 3 shows the numerical classification results obtained by applying B and A at two different flight altitudes, i.e. 30 m and 80 m from the ground, so that the two cases can be compared.
The spectral signatures in Fig. 11 resemble those presented in previous works, even though they refer to different bands, i.e. different portions of the electromagnetic spectrum.
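The area-based selection described in this section can be sketched as follows. Note that, for brevity, this example uses the connected-component labelling of SciPy (`scipy.ndimage.label`, 4-connectivity by default) in place of the flood-fill algorithm cited in the text; the function name and the threshold `min_area_px` are hypothetical, the latter being the design parameter mentioned above.

```python
import numpy as np
from scipy import ndimage


def remove_small_regions(mask, min_area_px):
    """Keep only connected "Plastic" regions with at least min_area_px pixels.

    mask: boolean (H, W) array, True where plastic has been detected.
    """
    labels, _ = ndimage.label(mask)

    # Area (in pixels) of each label; index 0 is the background.
    areas = np.bincount(labels.ravel())
    keep = areas >= min_area_px
    keep[0] = False  # never keep the background

    # Map the per-label decision back onto the image grid.
    return keep[labels]


# Toy example: a 3 x 3 object and an isolated (noise-like) pixel.
mask = np.zeros((8, 8), dtype=bool)
mask[1:4, 1:4] = True
mask[6, 6] = True
filtered = remove_small_regions(mask, min_area_px=4)
print(int(filtered.sum()))  # the isolated pixel is discarded
```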

RESULTS AND DISCUSSION
While water and transparent plastic signatures are quite different (Fig. 11), the latter is apparently similar to the sunglint one (Fig. 11(a) and (c)). Since sunglint is hardly distinguishable from certain plastics in the considered portion of the electromagnetic spectrum, such similarity can be considered as the main cause of the previously mentioned false positives.
Although sunglint and certain plastic samples have very similar spectral signatures, their size is usually quite different. This consideration motivated the introduction of an area-based object selection step, i.e. selecting only those objects with an area above a certain threshold (which should be properly set in order to ensure a satisfying performance of the approach), leading to a remarkable reduction of the false positives (see Table 4). Although the performance improves at both the considered altitudes, the improvement is clearly much less apparent in the 80 m case. This is because the sunglint effect causes the false detection of areas of similar size in pixels at both altitudes, whereas the area in pixels of the plastic objects decreases significantly as the flight altitude (and hence the GSD) increases, making the two less distinguishable.
Consequently, the effectiveness of the implemented area-based selection step decreases when increasing the flight altitude. As a side effect, plastic objects smaller than the area threshold cannot be detected: hence, such a threshold should be carefully chosen and, in general, the area-based selection cannot be used when the plastic objects to be detected are of a size comparable with the pixel size.
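The altitude effect discussed above follows from simple geometry: for a nadir view the GSD grows linearly with the flight altitude, so the area in pixels of an object of fixed physical size decreases with the square of the altitude. The short sketch below illustrates this with purely hypothetical numbers (an object covering 120 pixels at 30 m and an area threshold of 50 pixels); the function names are also hypothetical.

```python
def pixel_area_at(area_px_ref, altitude_m, ref_altitude_m):
    """Area in pixels of a fixed-size object at altitude_m, given its area
    in pixels at ref_altitude_m (nadir view, GSD proportional to altitude)."""
    return area_px_ref * (ref_altitude_m / altitude_m) ** 2


def passes_area_selection(area_px_ref, altitude_m, ref_altitude_m, min_area_px):
    """True if the object would survive the area-based selection."""
    return pixel_area_at(area_px_ref, altitude_m, ref_altitude_m) >= min_area_px


# Hypothetical object: 120 px at 30 m, threshold of 50 px.
for h in (30, 80):
    area = pixel_area_at(120, h, 30)
    kept = passes_area_selection(120, h, 30, min_area_px=50)
    print(f"altitude {h} m: about {area:.1f} px, kept = {kept}")
```

In this example the same object shrinks from 120 to about 17 pixels when moving from 30 m to 80 m, whereas sunglint-induced false positives, whose size in pixels stays similar, would not shrink accordingly.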
Given the presence of rather indistinguishable plastic objects and sunglint in the acquired multispectral images, overcoming the need for an area-based selection step requires the introduction of some different information. For instance, in analogy with similar works on satellite imagery, the use of different bands in the infrared spectrum is expected to be useful: this will be investigated in our future work.

CONCLUSIONS
This paper presented a multi-step approach for macroplastic detection in river ecosystems, exploiting multispectral imagery acquired by a UAV. The main step of the proposed method is based on the combination of two machine learning classifiers (i.e. the cascade of two random forest classifiers).
The obtained results proved to be quite promising, especially for images acquired at quite low altitudes, whereas a degradation of the performance has been shown when increasing the acquisition altitude.
In accordance with similar works for plastic detection from satellite imagery, the introduction of information from certain additional bands in the infrared spectrum is expected to improve the overall detection performance of the algorithm, for instance by making plastics more distinguishable from sunglint. This aspect will be considered in our future investigations, along with the use of other machine learning tools, such as deep learning approaches, in order to improve the plastic detection results.