LAND-COVER CLASSIFICATION USING FREELY AVAILABLE MULTITEMPORAL SAR DATA (WORK IN PROGRESS)

Synthetic Aperture Radar (SAR) images are a valuable tool for wetland monitoring, since they can detect water beneath the vegetation. Furthermore, SAR images can be acquired regardless of weather conditions. Monitoring and studying wetlands has become increasingly important because of the social and ecological benefits they provide and the constant pressures they are subject to. The Sentinel-1 mission of the European Space Agency provides free access to multitemporal SAR data. This study investigates the use of multitemporal Sentinel-1 data for wetland land-cover classification. To perform this assessment, we acquired 76 Sentinel-1 images over a portion of the Lower Delta of the Paraná River and, considering different seasons, texture measurements, and polarizations, created 30 datasets. A Random Forest classifier was trained on each dataset. Our experiments show that datasets including the winter dates achieved kappa index values (κ) higher than 0.8. Including texture measurements improved the classifications: for the Summer datasets, κ increased by more than 14%, whereas for the Winter datasets in the VH and Dual polarizations, the improvements were lower than 4%. Our results suggest that, for the analyzed land-cover classes, winter is the most informative season. Moreover, for the Summer datasets, texture measurements provide complementary information.


INTRODUCTION
Satellite images can provide information about large natural areas that are difficult to access, which makes them an essential tool for mapping wetlands. Furthermore, satellite-based mapping is less expensive than fieldwork-based mapping, and it can provide information at different temporal and spatial scales (Brisco et al., 2011).
Depending on the sensor and target characteristics, the Synthetic Aperture Radar (SAR) signal can penetrate the vegetation and provide information about flooding beneath the vegetation, biomass, and soil characteristics (White et al., 2015). Thus, SAR satellite images are used for mapping and monitoring wetlands (Hess et al., 2003, Arnesen et al., 2013, LaRocque et al., 2020). SAR images also provide information about the geometric and dielectric characteristics of the observed target, and they can be acquired regardless of cloud cover or lighting conditions. The Sentinel-1 mission initiated a new age in SAR systems for Earth observation: for the first time, multitemporal SAR imagery covering the whole world is freely available. Multitemporal SAR data may provide information about the variation of vegetation phenological states and flooding levels in the studied scene. Ozesmi and Bauer (2002) highlight the importance of including multitemporal SAR data in wetland identification.
Although multitemporal SAR data can provide information about the seasonal characteristics of land covers, they do not capture the spatial variation of pixel brightness within an image. Land-cover classes may show similar backscatter statistics but differ in their within-class variability (Oliver and Quegan, 2004). Previous studies (Numbisi et al., 2019, Caballero et al., 2020) show that texture measurements based on the grey level co-occurrence matrix (GLCM) have potential for land-cover classification. These textures describe the relationship between the pixel values of an image and their spatial distribution in the landscape (Haralick et al., 1973, Hall-Beyer, 2017b). The performance of remote sensing image classification depends both on the remote sensing data (selection and manipulation) and on the classification algorithm (Lu and Weng, 2007). A large variety of classifiers have been explored for land-cover classification using remote sensing data, such as Support Vector Machine, Maximum Likelihood, and Decision Tree (Otukei and Blaschke, 2010), among others. In recent years, Random Forest (RF) has become one of the most used supervised algorithms for wetland mapping (Mohammadimanesh et al., 2018, LaRocque et al., 2020, Mahdianpari et al., 2020), because its classifications are highly accurate, it can handle high data dimensionality and multicollinearity, and it is fast and insensitive to overfitting (Belgiu and Drăguţ, 2016).
In this work, we consider a portion of the Lower Delta of the Paraná River, a wide coastal freshwater wetland located in Buenos Aires, Argentina. The high biomass throughout the area makes mapping and monitoring it particularly challenging. The main objectives of this work are: 1. to study the potential of multitemporal Sentinel-1 datasets for land-cover mapping in densely vegetated areas, and 2. to classify the study area and compare the performance of the different multitemporal Sentinel-1 datasets.
The Sentinel-1 images were processed using the Sentinel Application Platform (SNAP) (ESA Sentinel Application Platform, 2019). SNAP is open-source software with a common architecture for Earth observation data manipulation. The Sentinel-1 Toolbox (S1TBX), included in SNAP, is a collection of processing tools to preprocess and analyze data from SAR missions such as Sentinel-1, ERS-1 & 2, ENVISAT, ALOS-PALSAR, TerraSAR-X, COSMO-SkyMed, and RADARSAT-2.

MATERIALS
The present work analyzes the capability of different Sentinel-1 datasets to identify the dominant land-cover classes of the Lower Delta of the Paraná River wetland (Argentina).

Study area
The study area is located in the Lower Delta of the Paraná River in Buenos Aires, Argentina (Figure 1), and covers approximately 117 km² (central coordinates: 34.35° S, 58.55° W). The climate is humid and temperate, and the mean annual precipitation is approximately 1000 mm.
The study area consists of islands. Sediments deposited by the distributary rivers of the Paraná form levees along their banks. Willow (Salix spp.) and Poplar (Populus spp.) forest plantations, fruit orchards, and secondary forests are established over these natural levees. Toward the interior of the islands, Ceibo forest (Erythrina crista-galli) or isolated Ceibos with Cortadera marshes (Scirpus giganteus) in the understory can be found. In the center of the islands, where the soil is permanently saturated, Cortadera (S. giganteus) is the dominant species. Junco (Schoenoplectus californicus) beds form narrow strip marshes at the edge of watercourses (Kandus et al., 2006).

Data
Remote sensing data
Seventy-six Sentinel-1 images acquired between October 2016 and April 2019 were used in this study. These images were freely downloaded from the Copernicus website (https://scihub.copernicus.eu/dhus/#/home). The product type was Level-1 High-Resolution Ground Range Detected (GRD) in Interferometric Wide Swath mode; this mode provides VH and VV polarization imagery. All the images were acquired in descending orbit direction, with incidence angles ranging from 29.5° (near range) to 45.3° (far range).

Reference data
For this study, we considered a total of five information classes: one corresponding to the water class and four corresponding to the dominant vegetation classes in the area (Ceibo forest, Willow plantation, Cortadera marsh, Junco marsh).
Experts in local vegetation labeled a total of 496 Regions of Interest (ROIs) (Figure 1), using two high-resolution images (Planet Team, 2017) and previous studies in the area (Kandus et al., 2003, Kandus and Malvárez, 2004, Kandus et al., 2006). Sixty ROIs were labeled for each of the Water, Ceibo Forest, and Junco Marsh classes, and fifty-eight ROIs for each of the Cortadera Marsh and Willow Plantation classes. Each ROI corresponds to a single labeled pixel in the Sentinel-1 stacks. This analysis was done using QGIS software (QGIS Development Team, 2016).

METHODS
In this section, we describe the preprocessing of the Sentinel-1 images, explain how we calculated the GLCM textures for each image and how we created the different datasets, and then describe the Random Forest classifier and the accuracy measures considered in this work.

Data pre-processing
The imagery was preprocessed using SNAP and the Sentinel-1 Toolbox. The preprocessing steps included thermal noise removal, border noise removal, orbit file application, radiometric calibration, speckle filtering (Refined Lee), terrain flattening, and terrain correction, producing a geocoded SAR backscattering coefficient (γ⁰) image for each scene (Filipponi, 2019). Then, we created two stacks, one for each polarization.

Extraction of GLCM texture measurements
Additional features were computed from each γ⁰ stack. Image texture measurements provide information about the relationship between the value of each pixel and the values of its neighboring pixels. The GLCM is one of the most used methods to calculate texture measurements from satellite images. The GLCM is a matrix that tabulates how often pairs of grey levels occur at a given distance and direction (offset) within a moving window (Haralick et al., 1973).
For each processed image, the four GLCM textures were calculated using SNAP. For each polarization, a stack containing the four textures was created; since each of the 76 dates contributes four texture bands, each stack has 4 × 76 = 304 bands.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVI-4/W2-2021 FOSS4G 2021 - Academic Track, 27 September-2 October 2021, Buenos Aires, Argentina
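As an illustration of what a GLCM texture measures, the following pure-Python sketch builds a symmetric, normalized GLCM for one window and derives three of the textures mentioned in this paper (Contrast, Correlation, and Entropy). It is not SNAP's implementation: the window, the four grey levels, and the single offset (one pixel to the right) are assumptions chosen for readability, since SNAP's window size, quantization, and directions are not given here. The window is the classic 4 × 4 example from Haralick et al. (1973).

```python
import math


def glcm(window, levels, dx=1, dy=0):
    """Symmetric, normalized GLCM of a quantized window for one offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the reverse pair too, making the matrix symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weights co-occurrences by the squared grey-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def entropy(p):
    """Disorder of the co-occurrence distribution (natural logarithm here)."""
    n = len(p)
    return -sum(p[i][j] * math.log(p[i][j]) for i in range(n) for j in range(n) if p[i][j] > 0)


def correlation(p):
    """Linear dependency of grey levels between neighboring pixels."""
    n = len(p)
    # For a symmetric GLCM the row and column marginals coincide,
    # so a single mean and variance suffice.
    mu = sum(i * p[i][j] for i in range(n) for j in range(n))
    var = sum(p[i][j] * (i - mu) ** 2 for i in range(n) for j in range(n))
    if var == 0:
        return 1.0  # constant window: correlation is undefined; 1.0 is an assumed convention
    return sum(p[i][j] * (i - mu) * (j - mu) for i in range(n) for j in range(n)) / var


window = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(window, levels=4)
print(round(contrast(p), 4), round(correlation(p), 4), round(entropy(p), 4))
```

Applied to every pixel of a γ⁰ image with a moving window, functions like these yield one texture image per measure, which is how the per-date texture bands in the stacks above arise.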

Datasets
After preprocessing the 76 Sentinel-1 images, we built thirty multitemporal datasets. In the following, a dataset is called Complete if it is formed by all 76 dates. A dataset is called Winter (Spring/Summer/Autumn) if it is formed by the winter (spring/summer/autumn) dates among the 76 studied images; Table 1 shows the set of dates associated with each season. A dataset is called Dual if it is composed of both the VH and the VV polarizations. Each dataset was created by selecting a subset of the 76 dates (Complete, Winter, Spring, Summer, or Autumn), a polarization type (VV, VH, or Dual), and either the γ⁰ backscatter values alone (Intensity) or the γ⁰ values combined with the GLCM textures (Intensity+GLCM). These choices give 5 × 3 × 2 = 30 datasets.
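The combinatorics of the dataset design can be sketched as the Cartesian product of the three choices. The `bands_per_date` helper is hypothetical and assumes that an Intensity+GLCM dataset holds one γ⁰ band plus the four GLCM texture bands per polarization and date, which is consistent with the 4 × 76 texture stacks described above; the number of dates per season (Table 1) is not reproduced here.

```python
from itertools import product

SEASONS = ["Complete", "Winter", "Spring", "Summer", "Autumn"]
POLARIZATIONS = ["VV", "VH", "Dual"]
FEATURES = ["Intensity", "Intensity+GLCM"]


def bands_per_date(polarization, features):
    """Hypothetical band count per acquisition date for one dataset configuration."""
    n_pol = 2 if polarization == "Dual" else 1
    # Assumption: gamma0 plus four GLCM textures per polarization for Intensity+GLCM.
    n_feat = 1 + (4 if features == "Intensity+GLCM" else 0)
    return n_pol * n_feat


datasets = [
    {"dates": s, "polarization": p, "features": f, "bands_per_date": bands_per_date(p, f)}
    for s, p, f in product(SEASONS, POLARIZATIONS, FEATURES)
]
print(len(datasets))  # 5 subsets x 3 polarizations x 2 feature types = 30
```

Under this assumption, the largest configuration (Complete, Dual, Intensity+GLCM) would have 76 × 10 = 760 features per pixel, which illustrates the high dimensionality that RF is expected to handle.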

Algorithm and accuracy assessments
The Random Forest (RF) classifier is widely used in remote sensing classification tasks (Belgiu and Drăguţ, 2016, Mahdianpari et al., 2017). Its goal is to assign a label to a given input (Breiman, 2001). An RF is an ensemble of decision trees, and the user selects the number of trees. Each tree is trained on a random subset (bootstrap sample) of the training set (i.e., a labeled sample set), and at each node only a random subset of the attributes is considered for the split. These random steps make the trees differ from one another, which decreases the variance of the RF estimator. Once all the trees are trained, the RF classifier predicts the class of an input by first obtaining the prediction of each decision tree and then selecting the most voted class.
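To make the bootstrap, random-attribute, and voting steps concrete, here is a toy pure-Python sketch that uses decision stumps (depth-one trees) instead of the full trees an RF normally grows; for a stump, drawing attributes per tree and per split is the same thing. The names `fit_stump`, `fit_forest`, and `predict` are hypothetical, and the toy data are invented for illustration; this is not the scikit-learn implementation used in the study.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Best single-threshold split among the candidate features (lowest weighted Gini)."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        feats = rng.sample(range(d), max_features)  # random subset of attributes
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Each stump votes; the most voted class wins."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


X = [[0.1, 1.0], [0.2, 1.2], [0.3, 0.9], [0.9, 3.0], [1.0, 3.1], [1.1, 2.8]]
y = ["water", "water", "water", "forest", "forest", "forest"]
forest = fit_forest(X, y)
print(predict(forest, [0.05, 0.5]), predict(forest, [1.5, 4.0]))
```

Note that scikit-learn's `RandomForestClassifier` does not use a hard majority vote: its documentation states that the predicted class is the one with the highest mean probability estimate across the trees, which usually, but not always, agrees with the plain vote shown here.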
All datasets were classified using the RF algorithm. The 496 labeled pixels were randomly divided into a training set of 296 pixels and a test set of 200 pixels (40 per class).
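A split with a fixed number of test pixels per class can be sketched as below. The helper `stratified_split` is hypothetical, and the per-class counts in the usage lines are invented for illustration; the text does not state the exact sampling procedure beyond "randomly divided" with 40 test pixels per class.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_per_class=40, seed=0):
    """Return (train_indices, test_indices) with test_per_class samples of each class in test."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)  # random selection within each class
        test.extend(idx[:test_per_class])
        train.extend(idx[test_per_class:])
    return sorted(train), sorted(test)


# Hypothetical class counts, for illustration only.
labels = ["Water"] * 50 + ["Junco marsh"] * 55
train, test = stratified_split(labels)
print(len(train), len(test))  # 25 80
```

Fixing the test count per class keeps the test set balanced, so the OA is not dominated by any single class.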
After training the RF classifier on the training set, we evaluated its performance on the test set. Classification accuracy was assessed using the Overall Accuracy (OA) and the Kappa index (κ) (Congalton and Green, 2005).
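Both measures can be computed directly from the reference and predicted labels. OA is the observed agreement p_o, and Cohen's κ corrects it for the agreement p_e expected by chance from the marginal class frequencies: κ = (p_o − p_e) / (1 − p_e). The following stdlib sketch implements these definitions; it should agree with scikit-learn's `accuracy_score` and unweighted `cohen_kappa_score`, which the study used, but it is written here only as an illustration.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of test pixels whose predicted class matches the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = overall_accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of reference and predicted marginal proportions, summed over classes.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (p_o - p_e) / (1 - p_e)  # undefined if p_e == 1 (a single class everywhere)


y_true = ["Water", "Water", "Cortadera", "Cortadera"]
y_pred = ["Water", "Cortadera", "Cortadera", "Cortadera"]
print(overall_accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))  # 0.75 0.5
```

In the toy example p_o = 0.75 and p_e = (2·1 + 2·3)/16 = 0.5, so κ = 0.5: κ is lower than OA because part of the agreement would be expected by chance.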
The RF classifier, OA, and κ were computed using the implementation provided by the scikit-learn package (Pedregosa et al., 2011) in Python 3.6.

RESULTS
This section presents the results obtained by applying the RF classifier to each multitemporal dataset. Table 2 shows the κ and OA obtained for each dataset. The overlap among the γ⁰ values of the classes reflects the difficulty of identifying the vegetation in the Lower Delta of the Paraná River with Sentinel-1 data. During the winter, when the vegetation is in a leaf-off state, the γ⁰ mean values show subtle differences. One of the objectives of this study is to understand which Sentinel-1 dataset leads to the best classification in densely vegetated areas: a dataset associated with a specific season, or the Complete dataset?

Temporal analysis
For each polarization, the Summer dataset obtained the lowest performance values, whereas the highest κ values were obtained using the Winter and the Complete datasets (Table 2). For the Intensity+GLCM-Dual data, the κ values of the Winter and Complete datasets differ by less than 2%, whereas for the Intensity-Dual data, the Winter and Complete datasets have the same κ value (0.96). For the Intensity-VH and Intensity+GLCM-VH data, the Winter and Complete datasets differ by less than 4%.

Textural analysis
The Intensity+GLCM-VH data using the Complete set of dates had the highest κ (0.98) and OA (98%) (Figure 4c). In contrast, the dataset formed by the Intensity data, in the VV polarization and using the Summer dates, achieved the lowest κ (0.69) and OA (75%).
Incorporating GLCM textures into each Intensity dataset improved the κ value in all the studied cases. For Intensity datasets with κ values higher than 0.90, the improvement obtained by incorporating the GLCM textures was less than 3.2%. However, for Intensity datasets with κ values lower than 0.8, the improvement was higher than 8.25%.
Figure caption (fragment): Different variables are represented as follows: Intensity elements (red), GLCM Contrast texture (green), and GLCM Correlation texture (purple).
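The text does not state whether these improvements are relative or absolute differences in κ. Assuming they are relative to the Intensity baseline, the improvement for a given configuration would be:

```latex
\Delta\kappa_{\%} = \frac{\kappa_{\text{Intensity+GLCM}} - \kappa_{\text{Intensity}}}{\kappa_{\text{Intensity}}} \times 100\%
```

The distinction matters: for a hypothetical increase from κ = 0.80 to κ = 0.90, the relative improvement is 0.10 / 0.80 = 12.5%, whereas the absolute difference is 10 percentage points.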
The κ values of the Intensity datasets in the Dual polarization range from 0.75 to 0.94. When we incorporate the texture data into these datasets, i.e., for the Intensity+GLCM datasets in the Dual polarization, κ ranges from 0.89 to 0.96. Figure 4 shows the classifications obtained using the Intensity+GLCM datasets from the VH polarization corresponding to the Summer, Winter, and Complete dates, respectively. The Winter and Complete classifications show a similar spatial pattern of the land-cover classes, whereas the Summer classification shows a much noisier pattern.

Variable importance analysis
The RF classifier makes it possible to estimate the importance of the input variables, i.e., to score the dataset variables according to their usefulness in predicting the target classes (Breiman, 2001). The classification with the highest κ and OA was obtained using the Complete Intensity+GLCM dataset in the VH polarization, so this dataset was selected for further analysis. Figure 3 represents
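The text does not specify which importance measure was computed. scikit-learn's `RandomForestClassifier` exposes impurity-based importances through `feature_importances_`, and `sklearn.inspection.permutation_importance` offers a model-agnostic alternative. The sketch below illustrates the second technique, permutation importance, in pure Python: it shuffles one variable at a time and records how much accuracy drops. The helper name and the toy predictor are hypothetical, and this is not necessarily the measure behind Figure 3.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when each column of X is shuffled (higher = more important)."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)  # break the link between variable f and the labels
            shuffled = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy predictor that only uses variable 0 (e.g. one gamma0 date); variable 1 is ignored.
def toy_predict(row):
    return "water" if row[0] < 0.5 else "forest"


X = [[0.1, 7.0], [0.2, 3.0], [0.8, 5.0], [0.9, 1.0]]
y = ["water", "water", "forest", "forest"]
print(permutation_importance(toy_predict, X, y))
```

Impurity-based importances are cheaper because they come for free from training, but they can favor variables with many distinct values; permutation importance avoids that bias at the cost of repeated predictions.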

CONCLUSIONS
This research aimed to assess the potential of Sentinel-1 data for creating a thematic land-cover map of the Lower Delta of the Paraná River. Based on 76 dual-polarized Sentinel-1 images, and considering different seasons, texture measurements, and polarizations, 30 datasets were created. Each dataset was classified using the RF classifier, and the classification performances obtained with the different datasets were compared.
The experiments indicate that datasets including GLCM texture measurements and winter dates achieved higher κ and OA values.
While this study illustrates the potential of including multitemporal GLCM textures in the datasets, it also raises a question about the contribution of the GLCM Entropy texture. To better understand the implications of these preliminary results, future studies could assess the potential of each GLCM texture in the area and replicate the analysis in other study areas.