DEFORESTATION MAPPING USING SENTINEL-1 AND OBJECT-BASED RANDOM FOREST CLASSIFICATION ON GOOGLE EARTH ENGINE

Deforestation can be defined as the conversion of forest land cover to another type. It is a process that has massively accelerated its rate and extent in the last several decades. Mainly due to human activities related to socio-economic processes as population growth, expansion of agricultural land, wood extraction, etc. In the meantime, there are great efforts by governments and agencies to reduce these deforestation processes by implementing regulations, which cannot always be properly monitored whether are followed or not. In this work is proposed an approach that can provide forest loss estimations for a short period of time, by using Synthetic Aperture Radar imagery for an area in the Brazilian Amazon. SAR are providing data with almost no alteration due to weather conditions, however they may present other limitations. To mitigate the speckle effect, here was applied the dry coefficient, which is the mean of image values under the first quartile while preserving the spatial resolution. While for obtaining land cover maps containing only forest and non-forest areas an object-based machine learning classification on the Google Earth Engine platform was applied. The preliminary tests were carried out in a bitemporal manner between 2015 and 2019, followed by applying the approach monthly for the year of 2020. The outputs yielded very satisfactory and accurate results, allowing to estimate the forest dynamics for the area under consideration for each month.


INTRODUCTION
Due to various processes natural and anthropogenic, deforestation is increasing, and it represents a global issue leading to climate change, loss of biodiversity and larger probability of hazards. As an issue from such a scale, scholars and policymakers are putting great efforts in, firstly, directly mitigating the deforestation processes and secondly in monitoring the applied measures. Nowadays, the abundance of earth observation data eases the latter which, from its side, helps in the improvement of the mitigation strategies and policies. For monitoring and mapping forestation dynamics, the focus is on the use of medium and highresolution multispectral images, due to their advantages when dealing with vegetation. However, the biggest disadvantage in using optical imagery is the cloud cover that reduce the sensing of the Earth surface and that can be regular phenomena especially in tropical forests. Spaceborne Synthetic Aperture Radar (SAR) has on the opposite the advantage that sensors provide data with almost no restrictions from weather conditions. The limitation in this case is that, due to backscatter speckle, SAR imagery can introduce additional difficulty to apply, otherwise, straightforward processing, such as change detection with single thresholding. Scholars applying various approaches to deal with the SARrelated limitation, such as, statistical thresholding of time-series (Canty et al., 2020); adoption of the shadow effect created from the SAR imaging on the border between forested and nonforested areas (Bouvet et al., 2018); usage of different SAR frequency ranges (Rahman and Sumantyo, 2010) or directly combining radar data with optical (Hirschmugl et al., 2020). *

Corresponding author
To tackle some of the previously mentioned restrictions and to obtain rapid monitoring of deforestation dynamics, in this study was applied object-based machine learning imageclassification on Sentinel-1 (SE1) datasets using the advantages, in terms of data catalogue and processing, of the cloud computing service Google Earth Engine (GEE).

CASE STUDY AND TIMEFRAMES
As a case study, it was chosen an area in the southern Pará state (Brazil) covering an area of around 49,000 km 2 ( Figure 1). This state is considered as one of the most affected area in the Brazilian Amazon since 1990s. Another reason for choosing it as the area of interest (AOI) is that some studies using multispectral data have been already carried out (Brovelli et al., 2020), which allows a straightforward validation of the applied methodology. The first study timeframe was chosen to match with the formal study with the aim to calibrate the model deriving deforestation rate for the period 2015 -2019 (from now it will be referred as Timeframe 1). For both years we selected the same intervalsfrom the beginning of May till the end of August. After tunning and evaluating the processing methodology, the optimized framework was applied monthly for the year 2020 (Timeframe 2) for a subset of the total AOI. The small subset that was taken into consideration is in the central-west regions of the AOI, covering an area of 450 km 2 , and it was chosen because it was noticed rapid deforestation in the previous studies. The main reason for the further reducing the area under investigation was the availability of data needed for further data aggregation. The availability of the scenes and their coverage will be discussed more into details in the following section.

Datasets for classification processing
In the presented study, the main aim was to use only observations from SE1 A/B directly available from the GEE data catalogue. Images in the collection SE1 SAR Ground Range Detected (GRD) are already pre-processed in terms of thermal noise, radiometric calibration, and terrain correction. For the AOI, SE1 overpasses are only in descending direction. It should be noted that even though the SE1 B is operational since 2016, the overpasses over the AOI are not regular and images from this satellite were limited even in the recent Timeframe 2. Moreover, the AOI cannot be covered with one scene from a single overpass, actually in the case of Timeframe 1 2015 four scenes to cover it fully were needed. In the cases, of 2019 and 2020 the needed images were five.

Sentinel-1 GRD:
One of the main steps for successful carrying out the analysis was an implementation of data aggregation over time, for which was required to be used regular number of images independent from how much their footprint is covering the AOI. For the predefined study period in 2015, only four scenes in total were available from SE1 A, meaning that one overall image was used in the further processing. In the meantime, for 2019 50 scenes were available but the AOI was covered by five footprints leaving in total 10 images. However, the image availability for Timeframe 2 was unexpectedly more irregular compared to Timeframe 1. For most of the months into consideration, the images from the orbit covering the west part of the AOI were exhibiting almost two times (even more in some cases) more images than the other orbit covering the east parts. It was decided that such significant difference in the datasets would lead to severe misclassification because after image sampling, similar property regions would appear having different properties in the classification model. After verification of the data according to the availability for both orbits and to ensure equality between them, it was concluded that the maximum number of images that could be used for analysing the whole AOI on a monthly time-step is two. Which led to a certain level of uncertainty of the quality for the expect outputs. Considering the unevenness in the data distribution in both orbits, it was decided to carry out the analysis as planned according to Timeframe 2 but for a subset of the AOI covered entirely by the relative orbit providing the highest number of images. In this case the minimum number of images used was three and the maximum -five. The full summary of the scenes used per a timeframe is reported in the Table 1.

Sentinel-2 MSI:
Using SAR imagery has its own advantages and disadvantages. When carrying analyses in a tropical one of the crucial issues that must be dealt with is the almost constant cloud cover during the raining seasons and it is when radar imagery comes useful because can provide useful information with almost zero alteration due to the weather conditions. However, due to topographic effects in SAR can be noted the shadow, layover and foreshortening effects which in the current study can create additional difficulties in properly differentiating the forested areas from non-forested. The adopted approach to mitigate those effects and improve the final output was to implement object-based classification. Moreover, it was decided to combine SAR data with Sentinel-2 (SE2) Multispectral Instrument (MSI). This fusion was carried out, as an addition to main SAR processing and to highlight its contributions. The use of the two types of images was only used for the study period in 2019, due to the unavailability of cloudfree SE2 images in 2015 and on a monthly basis in 2020.

Datasets for external validation
For external validation of the outputs from Timeframe 1, photointerpreted control points (Brovelli et al., 2020) and highresolution images from China-Brazil Earth Resources Satellite program CBERS 4 (5m/pix) were used. The classification outputs from Timeframe 2 were externally validated only for the months of September and October 2020, again by control points obtained via photointerpretation using high resolution images from the Norway's International Climate and Forests Initiative ("NICFI") and distributed by Planet Labs Germany GmbH (Planet Labs, 2021). The distributed datasets represent basemaps covering the tropic regions and have a spatial resolution of 4.77 m/pix. The analysis-ready sets are having four spectral bands (Red, Blue, Green and Near-infrared. From a temporal point of view, the products are on a biannual and monthly basis, where in this study were used only the monthly September and October 2020 products. Firstly, because this dataset is available only after September 2020 and secondly, big percentage of the rest two available images (November and December) were covered with artifacts probably due to the cloudmasking processing and rendered the images unsuitable for validation purposes.

METHODOLOGY
The applied processing workflow can be divided into four main steps -pre-processing, segmentation, classification and validation. The same steps were applied for both timeframes. The general workflow is represented in Figure 2. All the steps were The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII- B3-2021XXIV ISPRS Congress (2021 implemented entirely in GEE. GEE is a cloud platform providing processing capabilities on a planetary scale and access to vast variety of data catalogues (Gorelick et al., 2017). Only for the photointerpretation of NICFI datasets for the control points for Timeframe 2 done QGIS and ACaTaMa plugin were used (Liano, 2019).

Pre-processing
As mentioned before, SE1 are already pre-processed and available in the GEE data catalogue, therefore no multi-looking or speckle filtering was applied.
To reduce the effect of the speckle the dry coefficient on the stacks of scenes, as proposed by Dostálová et al. (2016), was applied. The dry coefficient represents a mean aggregation of all values under the first quartile of each pixel. By its application, the speckle is mostly mitigated while preserving the spatial resolution. Moreover, in the current case, the difference between the forest and non-forest area is further highlighted. Such example can be seen in Figure 3, where a single VV polarization image and the dry coefficient obtained from 10 images are compared. The darker grey areas represent areas without forest cover, while the lighter ones dense forest. The dry coefficient was applied for the VV and VH polarization bands, and to their difference. As mentioned in Section 3.1, all the reported number of images in Table 1 were used for the implementation of the dry coefficient.

Segmentation
The main difference between pixel-based and object-based image classification is in the case of the former classification is done on the properties of a single pixel, while in the latter on a group of pixels (Walter, 2004). Another aspect for adopting clusters was that by their use could be reduced the SAR topographic effect, which in a pixelwise approach will lead to a high percentage of misclassification.
To perform an analysis for a whole object, image segmentation and the computation of per-cluster properties were performed. One of the implemented algorithms in GEE (Gorelick et al., 2017) that can perform segmentation is the Simple Non-Iterative Clustering (Achanta and Susstrunk, 2017). The Simple Non-Iterative Clustering (SNIC) is based on grouping pixels that share similar properties, however SNIC is non-iterative and less computationally demanding in comparison to Simple Linear Iterative Clustering (Achanta et al., 2012). SNIC requires the location spacing of a superpixel seed and allows regulation on the cluster shape. For the current AOI covering an area of almost 49,000 km 2 and the scale of the problem was not justified to implement very dense grid. However, were tested two sets -one with 25 pixels spacing and with 100 pixels. After a trial it was concluded that there is no advantage, in terms of classification output to use further the 100 pixels one. On the contrary, it demanded an increase in the computational time due to the size of the clusters during the property aggregation step. Just in the case of the subset AOI it was noted a slight improvement of the output results when using 15 pixels seed grid, compared to 25. It was noted that during the segmentation of the total AOI it was needed to intermediately save the output clusters before continuing with the processing, otherwise the allocated memory The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2021 XXIV ISPRS Congress (2021 edition) on GEE was exceeded. For the scale of subset AOI no preliminary save was needed. An example of clustering can be seen in Figure 4 which is covering the same area as in Figure 3. The deforested area is highlighted with the red line and it can be noted how the separate clusters are following the area boundaries.
In SNIC it is possible to regulate the cluster shape to be compact square-like or to be totally irregular. Using more sparse seed grid led to unusable compact and regular clusters. The resulted squared clusters lead to the wrong classifications. Thus, it was decided to set the compactness to a bare minimum of 1%.

Cluster property computation
During the SNIC segmentation, the algorithm computes and outputs the mean, per cluster, of each band that was using (i.e. the dry backscatter for each input polarization and their differences). In addition, it was decided to compute and include into the classification the following per cluster properties: standard deviation, variance, area, perimeter, width and height. It was decided to use the geometrical properties of the clusters as an input in the classification, firstly because the clusters tend to be more irregular and to merge in a bigger one, and secondly more regular patterns in deforested areas and more irregular in natural features were observed.

Algorithm:
In this work the supervised machine learning classification Random Forest (Breiman, 2001) was used.
It represents an ensemble of decision trees (Ho, 1998), where each tree is fed by a random sampling from the input data, followed by a classification vote and the one with the most votes is the output for the final decision. The algorithm proved its use for variety of earth observation applications such as land cover classification (Gislason et al., 2006), hazard mapping (Feng et al., 2015;Yordanov and Brovelli, 2020a). Moreover, it is widely applicable on through medium and high-resolution remote sensing images (Hayes et al., 2014;Thanh Noi and Kappas, 2018).

Training/testing datasets:
Since the study periods in Timeframe 1 are matching with the already validated maps from the previous study, the classified outputs from the multispectral pixel-based classifications were sampled and used as training/test inputs for the relevant years. The total amount of sampled points was 4,000 in total for both forest/non-forest classes.
The previous studies were carried out till the end of August 2019, and they could not be considered as suitable as input datasets in the case of Timeframe 2 that starts four months later. In this case the Global Forest Change v1.8 2000-2020 dataset (Hansen et al., 2013) was used. After fine-tuning the dataset that yielded the most satisfying results, it was containing in total 500 training/testing points. In both timeframes 80/20 training/testing ratio was implemented.

Validation
Classified thematic maps need some validation to be evaluated their accuracy and suitability for future use (Bratic et al., 2018b). As in most of the times it is impossible to carry out field survey and to validate the output, it is widely adopted to do a validation using external data source as a reference.

External validation sources:
Control points previously photo-interpreted (Brovelli et al., 2020) through CollectEarth tool (Bey et al., 2016) and CBERS 4 datasets were used to externally assess the accuracy of the outputs from Timeframe 1.
In the case of Timeframe 2, it was mainly relied on the internal validation metrics, due to the unavailability of free highresolution datasets for most of the time-steps. However, whenever external validation was carried out the number of samples (n0) used was determined through the Cochran's sample size formula (Cochran, 1965): where, e = is the margin of error p = is the estimated proportion of the population q = 1-p z = z-score of the confidence level 4.5.2 Validation metrics: GEE offers set of tools for computing evaluation metrics, however the metrics were expanded according to recommendations for accuracy assessment of land cover maps (Bratic et al., 2018a), where user and producer accuracies were also used. Moreover, precision and recall were also computed, since are yielding more realistic representation of the accuracy when dealing disbalanced binary classification (Yordanov and Brovelli, 2020b).

RESULTS
The overall results depicted a very good implementation of the dry coefficient and object-based machine learning classification. As the processing was carried out according to the two timeframes, the results will be presented accordingly in the following sections. Firstly, will be assessed the output classification, followed by the estimation of the actual forest loss for the given time interval.

Classification results:
Each of the produced binary map was validated with external control points. Such validation is considered as more accurate and in the current paper will not be reported and discussed the internal training/testing metrics, even though they depicted higher accuracies than the external one. A summary of the assessment metrics is reported in Table 2, where after the year is denoted there the map is produced only by using SAR images or with the inclusion of optical MSI.
As it was expected, the better results were noted for the second classification map for 2019 due to the higher availability of SAR data (10 images) that were aggregated through the dry coefficient.
On the contrary, in the case of 2015 the coefficient was not implemented as intended since only one image was available. However, the obtained result was considered as satisfactory, and the binary map was further used the estimation of the forest loss. In Figure 5 are exhibited few map snippets, where firstly it is highlighted the contribution of more images in final dry coefficient (2015 is in snippet a, and the 2019 case in b). The related object-based classifications are depicting extremely low level of noised misclassification pixels that is usually very evident in the pixelwise approaches. However, it also comes with its limitation, it can be seen in a misclassification of areas that are relatively small and inside a bigger region from the opposite class. Mainly, this issue is related to the parameter definition in the segmentation step which for the current case study and its scale was found to be a challenging task to find the balance between high precision and acceptable computational demand. In regard to the fusion of SAR and MSI datasets it was noted an overall improvement in the final results; on one hand the validation metrics were increased and, on the other, the final map was more consistent, and it was noted that the misclassifications were usually in areas hard to be interpreted even upon a visual inspection.
Overall, the source of classification errors in both years was due to the topography effects of the radar imaging.

Forest loss estimation:
The deforested areas were determined by a simple change detection analysis through the map difference between two time-steps and compared to the output from the pixel-wise classification from the previous study which resulted a forest loss for the period 2015-2019 of around 2,570 km 2 . The estimated loss from purely the SAR-based classifications is 4,220 km 2 . When difference was computed through the difference between 2015 SAR and 2019 SAR+MSI, the total area dropped to 3,737 km 2 . Both derived values exhibit great overestimation of the forest loss, however it should be considered the lower accuracy of the 2015 classification output. Even though the final results in terms of derived loss were not completely satisfactory, it showed that the approach is applicable for estimating forest dynamics and that its accuracy strongly depends on the amount of data used. Thus, it was decided to carry out another analysis on a shorter time-step but with more regular data availability, as it is the case of Timeframe 2.

Timeframe 2
Except the difference in the time-step used in this case, the other major difference with the previous timeframe is the increased and more consistent number of observations from Sentinel-1 (evident in Table 1). The more regular image stacks were expected to lead to more consistent classification and final outputs. Certain level of disagreement between the monthly classifications was expected due to the change in the seasonal vegetation and the dry/rainy season, typical for the tropical regions. These two factors potentially will alter the radar backscatter and predispose to misleading properties fed to the machine learning model. This effect was not expected to entirely affect the analysis in Timeframe 1 since the same period for both 2015 and 2019 years was defined.

Classification results:
After fine-tunning the parameters for the segmentation and machine learning classification, a very satisfactory classification output for each month under consideration was achieved. Except the use of regular sets of SE1 scenes, the other advantage of carrying out the analysis for the subset AOI was the reduced computational memory required and hence the decreased computational time, which further allowed to parametrise even better the processing chain.
In Figure 6  evident that using more and almost regular number of images to derive the dry coefficient resulted in better final products. Even further confirmation for the particularly satisfactory results are the validation metrics (Table 3) obtained using the external control points from the photointerpretation of the high resolution NICFI datasets. As mentioned before, the external accuracy assessment was done only for the months of September and October due to reduced data availability for the other months. However, the results are in accordance with the internal validation where the slightly better performance for the classification in September is evident in both cases. Few examples from the monthly classifications are represented in Figure 7, where in the left column there are the aggregated dry images and in the right their binary classification maps. The maps were deliberately chosen for the months of January, June and September for couple of reasons. The former is that there is a notable difference between the dried images, and this is due to seasonal variations (dry and rainy periods) and to the natural cycle of some vegetation. In this example during September 2020 it was observed the least amount of low vegetation, which is related to the radar signal backscatter and on other hand led a better distinction between dense forest cover and bare soil/low vegetation. The latter is the clear visible appearing of a new deforested patch after June 2020 (south-east area).

Forest loss estimation:
From analysing the estimated forest loss on a monthly basis and inspecting the produced maps, it can be deducted that, even considering a small area in the Amazon, deforestation process is ongoing with a rapid pace. Figure 8 represents the cumulative forest loss for the full year 2020, where the first month (January) was considered as baseline with zero loss. It is notable that in the first half of 2020 the loss rate is low even if with a steady trend; however, after June the total deforested area is rapidly increasing. This observation could be related to the appearing patch discussed in the previous section which is the responsible for around 45% of the total estimated forest loss. The mentioned is clearly visible in the Figure 9 where it is depicted the total forest loss for the year 2020.

Identification of deforestation clusters:
By having the deforestation maps on a regular basis (as the current output is per month for a year) it is possible to carry out a spatial analysis and to identify if some areas are emerging with more severe deforestation processes. It was carried out such simple hotspot analysis for the subset AOI using the monthly forest loss information. Figure 10 shows three major areas where the deforestation processes were more concentrated. Such analysis can be a significant help in highlighting areas which are more prone to be further exploited, or when is needed to monitor the implementation of mitigation strategies. In fact, upon visual inspection from the available monthly dataset from NICFI, it was noted that the deforested area exhibiting the most severe processes is continuing to expand in the months after the analysis was carried out ( Figure 11); which just highlights the fact that the deforestation progresses in the Amazon, as one constantly ongoing process leading to severe consequences. Judging to the deforested regular patterns it is evident that those processes are not natural, but human-induced. Figure 11. Snippet of ongoing deforestation processes. January 2021 image is not included due to a significant cloud coverage (Imagery-monthly NICFI dataset, distributed by Planet Labs.).

CONCLUSION
Deforestation in the Amazon is a continuous process which is mainly a result from human activities, which has many consequences directly on a local and global scale. Many studies are carried out on the topic for monitoring the forest dynamics using remotely sensed data from satellites and machine learning algorithms. In this study, an approach from combining freely available Sentinel-1 Synthetic Radar Aperture datasets and object-based Random Forest classification on the cloud processing service Google Earth Engine was presented. The study was carried out in two main parts, differing by the spatial and temporal scale. The first one was done on a time-step of four years (2015-2019) with the aim to tune the proposed approach and easily validate the results with previous studies of the same area. The second one was applied on an area relatively smaller. In this case, the aim was to verify the usage of the proposed method and the usability of derived results for a much shorter time-step (one month). If needed and depending on the data availability, this step can be further reduced. Such short timesteps are almost impossible to be implemented when using only optical imagery because of the often and dense cloud coverages in the Amazon region.
One of the main components in using SAR data was the aggregation of the backscatter through the dry coefficient which is the mean of data values below the first quartile from several images. When enough data are available, its use significantly reduces the speckle effect by preserving the spatial resolution. There are two main advantages of using object-based classification: the noise of misclassified pixels (a frequent case in the pixelwise approaches) is reduced to its minimum; the usage of geometrical properties (such as area, width, height, etc.) as additional inputs in the machine learning model is reducing the topographic effects of SAR imaging. The image segmentation was done using Simple Non-Iterative Clustering. The obtained results from the object-based image classification exhibited very satisfactory and accurate results when sufficient data was available. The implementation of a monthly time-step allowed the analysis and the achievement of the complete overview of the deforestation dynamics in the subset AOI for 2020. Few limitations were found in the implementation of this approach. The most important is related to the amount of data that are available: one SAR image could even provide satisfactory results but, as always, the more are the images the better are the results. Moreover, it should be paid great attention, in the parametrization of the approach, to be balance accurate outputs and reasonable demand of computational power. Nevertheless, the applied SAR object-based approach is useful when timely results are needed, also considering that they can be further improved by combining SAR and MSI data.