POLARIMETRIC SAR DATA FROM SENTINEL-1A APPLIED TO EARLY CROP CLASSIFICATION

This paper aims to map crops in two Brazilian municipalities, Luís Eduardo Magalhães (LEM) and Campo Verde, using dual-polarimetric Sentinel-1A images. The specific objectives were: (1) to evaluate the accuracy gain in crop classification using Sentinel-1A multitemporal backscatter coefficients and their ratio (σVH, σVV, and σVH/σVV, denominated the BS group) in comparison with the addition of polarimetric attributes (σVH, σVV, σVH/σVV, H, and α, denominated the BP group); and (2) to assess the accuracy gain in the earliest crop classification by creating new scenarios in which each new SAR image is added to the previous images, for each date and group (BS and BP), during crop development. For the BS and BP groups, 13 and 10 scenarios were analyzed in LEM and Campo Verde, respectively. For the classification process, we used the Random Forest (RF) algorithm. In the LEM site, the best results for the BS and BP groups were equivalent (overall accuracy: ~82%), while for the Campo Verde site, the classification accuracy for the BP group (overall accuracy: ~80%) was 2% higher than for the BS group. The addition of new images during the crop development period increased the overall accuracy of the earliest crop classification, which stabilized from mid-February in LEM and mid-December in Campo Verde, after 10 and 8 images, respectively. After these periods, the gain in classification accuracy from adding new images was small. In general, our results suggest that the backscattering coefficients and polarimetric attributes extracted from Sentinel-1A imagery performed well in discriminating croplands.


INTRODUCTION
Brazil ranks among the world's leading producers of soybeans, corn, coffee, and cotton, among other commodities (FAOSTAT, 2020). Therefore, knowing where and which crops are present in the fields is useful at regional and global scales (Kussul et al., 2016; McNairn et al., 2014). This information is crucial for crop management, food security assurance, and agricultural policy design (Arias et al., 2020; McNairn et al., 2014).
Crop classification is one of the most important agricultural applications of Remote Sensing (RS) (Atzberger, 2013). RS offers a synoptic view and prompt data acquisition, which make it well suited to mapping and monitoring crops throughout their lifecycle (Oldoni et al., 2019a; Prudente et al., 2019a; Veloso et al., 2017). Moreover, early or in-season crop information provides critical support for timely crop yield and production estimates (You and Dong, 2020).
While RS data are essential for agricultural mapping and monitoring, providing accurate crop information is challenging because of the high spatio-temporal dynamics of crops (Denize et al., 2019; Hütt and Waldhoff, 2018). Acquiring cloud-free optical images is a further challenge (Whitcraft et al., 2015b; Wulder et al., 2015). Because of these factors, some regions might have only a few cloud-free optical RS images available during the crop vegetative cycle, especially at the end of flowering and the beginning of plant senescence (Eberhardt et al., 2016; Whitcraft et al., 2015a). Thus, the use of optical RS data is even more constrained in tropical, subtropical, and northern regions (Evans and Costa, 2013), where cloud cover is more frequent.
Therefore, to overcome this cloud limitation, Synthetic Aperture Radar (SAR) can be a useful alternative. SAR data can be collected under almost all weather conditions, independently of lighting conditions, and are little affected by the atmosphere and cloud cover (Arias et al., 2020; Harfenmeister et al., 2019). For this reason, it is possible to obtain a dense time series of useful SAR data, even in areas with high cloud cover, such as tropical regions.
However, SAR data interpretation is complex, with many factors influencing the interaction between the plant and the microwave beam. The returned SAR signal is influenced by canopy properties, such as dielectric constant (or water content), geometry, and orientation, by topography and surface roughness, and by radar system features, such as frequency, polarization, spatial resolution, incidence angle, and orbit (Harfenmeister et al., 2019; Steele-Dunne et al., 2017).
For example, over cropland, each radar polarization is more sensitive to certain characteristics of the vegetation. In a SAR system, vertically (V) polarized microwave beams are more sensitive to vertical structures, while horizontally (H) polarized waves are more sensitive to horizontal structures (Denize et al., 2019; McNairn and Brisco, 2004). The co-polarized HH waves are more sensitive to surface scattering, while the cross-polarizations, VH and HV, are more sensitive to volume scattering (Denize et al., 2019; Veloso et al., 2017).
Moreover, radars operating at longer wavelengths (L and P bands) penetrate into the canopy, while those at shorter wavelengths (C and X bands) interact with the upper layers of the canopy. These factors determine the radar backscatter from vegetation; they are crop-specific and vary as crop phenology changes. Thus, there is significant potential for the use of SAR in agricultural applications, such as classification, crop monitoring, and soil/vegetation moisture monitoring (McNairn and Brisco, 2004; Steele-Dunne et al., 2017).
Besides the backscattering, information can be extracted from SAR data using the Polarimetric SAR (PolSAR) technique. This method uses information about the amplitude and phase of the signals reflected in different polarizations and stored in a complex vector structure (Ulaby, Fawwaz and Elachi, 1990). From the PolSAR technique, it is possible to use the target decomposition theorems (Cloude and Pottier, 1996). The decomposition theorems separate the scattering mechanisms of different natures, making it possible to infer the physical properties of the targets. It also can help improve segmentation, classification, target detection, and pattern recognition.
The H/α decomposition (Cloude and Pottier, 1996) yields the polarimetric attributes entropy (H) and alpha angle (α). Entropy indicates the number of dominant scattering mechanisms, making it possible to assess the degree of randomness of the scattering process (Cloude and Pottier, 1996). H ranges from 0 to 1. Low H values (~0) indicate weak depolarization, suggesting the presence of a single dominant scattering mechanism. On the other hand, high H values (~1) indicate the presence of at least three targets that contribute to depolarizing the electromagnetic waves. The alpha angle identifies the type of scattering: α = 0° corresponds to surface scattering, α = 45° to volumetric scattering, and α = 90° to double-bounce scattering (Cloude and Pottier, 1996).
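As an illustration of how H and α can be derived, the sketch below computes both attributes from the eigen-decomposition of a single 2x2 covariance matrix, assuming the dual-polarization form of the decomposition (two eigenvalues, logarithm in base 2). The function name `h_alpha_dual_pol` and the input layout are hypothetical, and this is not necessarily the exact computation performed by the software used in this study.

```python
import numpy as np

def h_alpha_dual_pol(c2):
    """Entropy H (0..1) and mean alpha angle (degrees) from a 2x2 Hermitian
    covariance matrix. Illustrative sketch for dual-pol data only."""
    c2 = np.asarray(c2, dtype=complex)
    # eigh returns real eigenvalues (ascending) and eigenvectors as columns.
    eigvals, eigvecs = np.linalg.eigh(c2)
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative values
    total = eigvals.sum()
    if total == 0:
        return 0.0, 0.0                     # arbitrary convention for an empty pixel
    p = eigvals / total                     # pseudo-probabilities of each mechanism
    nz = p > 0                              # 0 * log(0) is taken as 0
    # Log base 2 (the matrix dimension) keeps H within [0, 1].
    entropy = float(-np.sum(p[nz] * np.log2(p[nz])))
    # alpha_i is taken from the magnitude of each eigenvector's first component.
    first = np.clip(np.abs(eigvecs[0, :]), 0.0, 1.0)
    alpha_i = np.degrees(np.arccos(first))
    alpha = float(np.sum(p * alpha_i))
    return entropy, alpha

# One dominant mechanism gives H = 0; two equally strong mechanisms give H = 1.
h_single, alpha_single = h_alpha_dual_pol([[1.0, 0.0], [0.0, 0.0]])
h_mixed, alpha_mixed = h_alpha_dual_pol([[0.5, 0.0], [0.0, 0.5]])
```

In practice the covariance matrix is estimated per pixel by spatial averaging, so the function would be applied to each averaged matrix of the image.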
In this sense, the free polarimetric SAR data from the Sentinel-1A and 1B satellites, made available by the European Space Agency (ESA), represent a new opportunity to investigate agricultural monitoring methods based on dense SAR time series (Denize et al., 2019), since Sentinel-1 has a high temporal resolution (12 days for Sentinel-1A alone and 6 days when Sentinel-1A and 1B are combined) and a spatial resolution of around 10 m (ranging from 5 to 40 m, depending on the acquisition mode). In this context, two specific goals were defined: (1) to evaluate the accuracy gain in crop classification by the Random Forest algorithm using Sentinel-1A multitemporal backscatter coefficients (σ⁰) in the VH (σ⁰VH) and VV (σ⁰VV) polarizations and the ratio between them (σ⁰VH/σ⁰VV), in comparison with the addition of polarimetric attributes (H and α), in two test sites in Brazil; and (2) to assess the accuracy gain of the earliest crop classification when additional SAR dates are integrated into the polarimetric attributes and backscatter coefficients.

Study area
Two municipalities in Brazil were selected as test sites for this study (Figure 1). One is Luís Eduardo Magalhães (LEM), located in western Bahia state, and the other is Campo Verde, located in the southeast of Mato Grosso state. Both areas lie within the Cerrado biome (Brazilian Savanna).
The LEM central coordinates are 12°05'31" S and 45°48'18" W (Figure 1a). The municipality comprises an area of 3,940.5 km², with an altitude of 720 m, and has a tropical wet and dry climate (Aw) according to the Köppen-Geiger classification (Peel et al., 2007). The average annual rainfall is 1,511 mm and the average temperature is 24.2 °C. This municipality is part of the MATOPIBA economic region, an important agricultural frontier that comprises parts of Maranhão, Tocantins, Piauí, and Bahia states. The LEM agricultural production is based mainly on soybean (179,540 ha), corn (19,136 ha), sorghum (14,000 ha), cotton (12,100 ha), and beans (7,910 ha); these numbers correspond to the planted area of each crop in 2018 according to the Municipal Agricultural Production survey (PAM, 2018), conducted by the Brazilian Institute of Geography and Statistics (IBGE). Owing to this agricultural importance, LEM has the 4th highest Gross Domestic Product (GDP), BRL 601,242.05, among Bahia municipalities (IBGE, 2017).
[...] (Peel et al., 2007). Campo Verde has an economic profile based on agribusiness, and according to the PAM survey (PAM, 2018), in 2018 it was the Brazilian municipality with the 10th highest value of agricultural production (BRL 1,905,073). The most cultivated crops were soybean (210,000 ha), corn (88,500 ha), cotton (85,000 ha), beans (3,800 ha), and sorghum (2,000 ha).
Besides, there are also other types of crops, livestock, and forest production (eucalyptus).
In LEM and Campo Verde, two crops are normally grown per year. The first crop season in the agricultural calendar takes place during the rainy season (or wet period), from October until March. The second crop season occurs in the dry period, from March until August.

Field Data
Field reference data are essential for remote sensing applications. For the present work, we used two available datasets, the LEM Database (Sanches et al., 2018a) and the Campo Verde Database (Sanches et al., 2018b). These databases provide field references about croplands and other land cover types. For our analysis, it is important to highlight that Soybean and Corn represent the highest value of agricultural production in both municipalities (PAM, 2018). For this reason, each of them was kept as a single class and the other crops were grouped into the OAC class.

Sentinel-1A data and pre-processing
Sentinel-1A images were acquired in IW (Interferometric Wide swath) mode, with a spatial resolution of 5 m x 20 m, dual polarization (VH and VV), and Level-1 Single Look Complex (SLC) processing level. The images were downloaded from the Copernicus Open Access Hub, also known as the Sentinels Scientific Data Hub (https://scihub.copernicus.eu/), using the open-source SentinelSat toolbox (https://github.com/sentinelsat/sentinelsat) in a Python routine (Hu et al., 2018). In total, 13 images were downloaded for LEM (3 November 2017 to 27 March 2018, scene T083), and 19 images for Campo Verde (5 October 2015 to 9 March 2016, 10 images for scene T090 and 9 images for scene T091). These intervals were chosen to cover the entire first crop season. The images from each scene were pre-processed separately. Only after pre-processing were the Campo Verde scenes mosaicked (with the exception of one date that did not require a mosaic).
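The LEM acquisition calendar can be checked against the 12-day Sentinel-1A revisit. The short sketch below assumes that the series starts on 3 November 2017, ends on 27 March 2018, and has no missing acquisitions; the helper name `acquisition_dates` is hypothetical. It does not apply to Campo Verde, where 10 dates over the stated interval imply gaps in the series.

```python
from datetime import date, timedelta

def acquisition_dates(start, n_images, revisit_days=12):
    """List the dates of a gap-free series with a fixed revisit interval."""
    return [start + timedelta(days=revisit_days * i) for i in range(n_images)]

# Assumed LEM series: 13 images, one every 12 days from 3 November 2017.
lem_dates = acquisition_dates(date(2017, 11, 3), 13)
for scenario, acquired in enumerate(lem_dates, start=1):
    print(f"scenario {scenario:2d}: {acquired.isoformat()}")
```

Under these assumptions, scenario N uses the first N dates of the list, which is how the cumulative scenarios are indexed in the rest of the paper.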
To pre-process the Sentinel-1A data, the following steps were defined (Figure 2a): TOPSAR Split, to select only the bursts that covered the study area; Orbit File, to correct the orbit; TOPSAR Deburst, to join the sub-swaths; and Multilook, to resample the SAR images by spatial averaging with a window size of 4x1 pixels (resulting pixel size of 14.04 m by 13.47 m in the azimuth and range directions, respectively). These steps were performed for all polarimetric attributes and backscattering coefficients.
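The multilook step can be pictured as block averaging of the intensity image. The sketch below is a minimal NumPy illustration, not the SNAP operator itself; it assumes that the 4x1 window means 4 looks in range and 1 in azimuth, that rows follow the azimuth direction, and that the input is already an intensity image (squared magnitude of the complex SLC samples). The function name `multilook` is hypothetical.

```python
import numpy as np

def multilook(intensity, azimuth_looks=1, range_looks=4):
    """Average non-overlapping blocks of azimuth_looks x range_looks pixels.
    Rows are assumed to follow azimuth and columns to follow range."""
    rows, cols = intensity.shape
    # Drop trailing pixels that do not fill a complete block.
    rows -= rows % azimuth_looks
    cols -= cols % range_looks
    trimmed = intensity[:rows, :cols]
    blocks = trimmed.reshape(rows // azimuth_looks, azimuth_looks,
                             cols // range_looks, range_looks)
    return blocks.mean(axis=(1, 3))

# Toy 2 x 8 intensity image reduced to 2 x 2 by averaging 4 range pixels.
demo = np.arange(16, dtype=float).reshape(2, 8)
demo_ml = multilook(demo)
```

Averaging four range samples brings the range pixel size close to the azimuth pixel size, which is consistent with the nearly square pixels reported above.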
After the above-mentioned steps (Figure 2a), the backscattering coefficients (σ⁰VH and σ⁰VV) were obtained as follows (Figure 2b). The SAR images were filtered with a Boxcar filter with a window size of 7x7 pixels to reduce the speckle effect. Next, the radiometric calibration of the SAR data was performed. Afterward, terrain correction was applied using the SRTM (Shuttle Radar Topography Mission) digital elevation model, which generated images with a spatial resolution of 14.04 m. Finally, the ratio σ⁰VH/σ⁰VV was calculated.
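The filtering and ratio steps can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the SNAP implementation: the boxcar filter is written as a plain moving average with reflected borders, the ratio is taken in linear units (in decibels it would be a difference), and the function names and the small `floor` value used to avoid division by zero are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def boxcar(image, size=7):
    """Moving-average (boxcar) filter with an odd window size."""
    pad = size // 2
    padded = np.pad(image, pad, mode="reflect")        # keep the output shape
    windows = sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))

def vh_vv_ratio(sigma0_vh, sigma0_vv, floor=1e-10):
    """Ratio of linear backscatter coefficients, guarded against zero VV."""
    return sigma0_vh / np.maximum(sigma0_vv, floor)

def to_db(sigma0_linear, floor=1e-10):
    """Convert linear backscatter to decibels."""
    return 10.0 * np.log10(np.maximum(sigma0_linear, floor))

# Synthetic speckled images: constant backscatter times exponential noise.
rng = np.random.default_rng(0)
vh = boxcar(0.02 * rng.exponential(size=(40, 40)))
vv = boxcar(0.10 * rng.exponential(size=(40, 40)))
ratio = vh_vv_ratio(vh, vv)
```

Averaging over 49 pixels lowers the speckle variance at the cost of spatial detail, which is the usual trade-off of a boxcar filter.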
In turn, to obtain the polarimetric attributes, the H-α polarimetric decomposition was processed (Figure 2c), generating the entropy (H) and alpha angle (α). Next, the SAR images were also filtered with a Boxcar filter with a window size of 7x7 pixels, and terrain correction was applied using the SRTM. All the pre-processing steps (Figure 2) were performed using the Sentinel Application Platform (SNAP) 7.0 and Python routines.
Figure 2. Pre-processing steps for each Sentinel-1 image: (a) pre-processing common to both polarimetric attributes and backscattering coefficients; (b) backscattering coefficients; (c) polarimetric attributes.
For both groups, BS and BP, the accuracy during the period of crop development was simulated. For this, classifications were made with BS and BP for each date, using all images from that date and from the previous dates. Thus, a total of 26 classification scenarios were performed for LEM (13 scenarios for the BS group and another 13 for the BP group), as shown in Figure 3a. For Campo Verde, a total of 20 classification scenarios were performed (Figure 3b). For the first scenario (scenario 1) in LEM and Campo Verde, 3 attributes were used for BS (σ⁰VH, σ⁰VV, σ⁰VH/σ⁰VV) and 5 for BP (σ⁰VH, σ⁰VV, σ⁰VH/σ⁰VV, H, and α). At the other extreme, in LEM, 39 attributes were used for scenario 13 BS (3 attributes per date x 13 dates) and 65 attributes for scenario 13 BP (5 attributes per date x 13 dates).
The Random Forest (RF) algorithm was used because of its excellent results in crop classification and its fast processing (Breiman, 2001; Du et al., 2015; Pal, 2005; Rodriguez-Galiano et al., 2012). In addition, RF does not assume an a priori statistical distribution, which allows the use of radar data (Furtado et al., 2016). The RF was implemented using the machine learning library for Python, Scikit-Learn (Pedregosa et al., 2011). For each scenario of each group, a stack of data was created and used as input to RF for classification.
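The cumulative scenarios can be sketched as a column-wise concatenation of the per-date attributes, followed by a Random Forest fit. The example below uses synthetic data; the helper `scenario_stack`, the array layout (pixels in rows, attributes in columns), and the hyperparameters (`n_estimators=50`, `random_state=0`) are assumptions made for illustration, since the settings used in this study are not stated here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def scenario_stack(features_by_date, scenario):
    """Concatenate the per-date attribute arrays of dates 1..scenario.
    Each array has shape (n_pixels, n_attributes_per_date)."""
    return np.hstack(features_by_date[:scenario])

rng = np.random.default_rng(0)
n_pixels, n_dates = 300, 13
labels = rng.integers(0, 3, size=n_pixels)            # synthetic class labels

# Synthetic BP attributes: 5 per date (sigma VH, sigma VV, ratio, H, alpha).
bp_by_date = [rng.normal(loc=labels[:, None] * 0.5, scale=1.0,
                         size=(n_pixels, 5)) for _ in range(n_dates)]
# The BS group keeps only the first 3 attributes of each date.
bs_by_date = [attrs[:, :3] for attrs in bp_by_date]

x_bs13 = scenario_stack(bs_by_date, 13)               # 3 x 13 = 39 columns
x_bp13 = scenario_stack(bp_by_date, 13)               # 5 x 13 = 65 columns

rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(x_bp13, labels)
predicted = rf.predict(x_bp13)
```

Repeating the fit for every scenario from 1 to 13 and for both groups gives the 26 LEM classifications described above.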
For training and validation, a cross-validation method was used: in each of 3 repetitions, 2/3 of the polygons were randomly assigned to training and 1/3 to validation. Thus, pixels from the same polygon were never used simultaneously for training and validation. To train the RF algorithm, 60,000 pixels of each class were randomly selected from the training polygons. For the accuracy assessment, the producer's (PA), user's (UA), and overall (OA) accuracies were calculated (Congalton, 1991; Olofsson et al., 2014). The time elapsed for each classification was also recorded to assess the viability of the additional polarimetric processing time.
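A polygon-level split and the three accuracy measures can be sketched as follows. The function names are hypothetical, and the confusion matrix is assumed to have reference classes in rows and mapped classes in columns, with no empty row or column; the numbers in the example are invented for illustration only.

```python
import numpy as np

def split_polygons(polygon_ids, train_fraction=2 / 3, seed=0):
    """Boolean train/validation masks in which all pixels of a polygon
    fall on the same side of the split."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(polygon_ids))
    n_train = int(round(train_fraction * len(shuffled)))
    train_mask = np.isin(polygon_ids, shuffled[:n_train])
    return train_mask, ~train_mask

def accuracy_metrics(confusion):
    """Overall, producer's, and user's accuracy from a confusion matrix
    with reference classes in rows and mapped classes in columns."""
    confusion = np.asarray(confusion, dtype=float)
    correct = np.diag(confusion)
    overall = correct.sum() / confusion.sum()
    producer = correct / confusion.sum(axis=1)   # per reference class
    user = correct / confusion.sum(axis=0)       # per mapped class
    return overall, producer, user

# Nine polygons of five pixels each: six go to training, three to validation.
polygon_ids = np.repeat(np.arange(9), 5)
train_mask, val_mask = split_polygons(polygon_ids)

# Invented two-class confusion matrix used only to show the arithmetic.
oa, pa, ua = accuracy_metrics([[8, 2], [1, 9]])
```

Calling `split_polygons` with three different seeds reproduces the idea of three repetitions, each with its own random assignment of polygons.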

RESULTS
For LEM, the best classification result was achieved just before the end of the soybean cycle (15 March 2018), corresponding to scenario 12 in BP, as can be observed in Figure 4a. Nevertheless, the results started to stabilize in scenario 10 (Figure 4a), four images before the end of the soybean cycle, with overall accuracy (OA) of around 79-82% in both groups, BS and BP. At this scenario, the soybean was in the period between maximum vegetative vigor and the beginning of senescence. After that, from scenario 11 to scenario 13, the accuracy increase was small, but the processing time was considerably longer. For Campo Verde, the best OA, around 78-80%, was achieved using the BP group, in scenarios 8 and 10 (Figure 4b). However, the processing time for both scenarios was longer, by ~100 seconds, than for the same scenarios formed by BS. As in LEM, the OA in Campo Verde also stabilized during the period between maximum vegetative vigor and the beginning of crop senescence.
In general, for the LEM test site, although the best accuracy was obtained with the BP group, it was almost equal to that obtained with the BS group. In Campo Verde, the result obtained with the BP group was about 2% higher than the one obtained with BS for the 10 scenarios analyzed (except scenario 1). Also, the scenarios formed with the largest number of images did not yield much higher classification accuracies. In LEM, for scenarios 11, 12, and 13, in both the BS and BP groups, the OA was around 80%, reaching 82% in scenario 12 BP. The same happened in Campo Verde, where scenarios 8, 9, and 10 of BS and BP also showed an OA of ~80%.
In some cases, for LEM, during the soybean development period, the classification accuracy using the BS group exceeded that of BP. Moreover, the time taken to classify was longer for the BP group than for the BS group.
The UA and PA of the best scenario for LEM and Campo Verde are presented in Figure 5. For the LEM site, besides the lowest accuracy being observed for the Pasture class, 14% of the pixels of the Others class were erroneously classified as Pasture. In Campo Verde, the largest confusion also occurred with the Pasture class: 45% of its pixels were erroneously classified as Others, and 15% of the pixels of the Others class were erroneously classified as Pasture. Also, 10% of the Soybean pixels were misclassified as Pasture.
Figure 6 shows the maps of the best classification results for the two test sites. Isolated classified pixels can be observed; this occurs mainly because of the speckle effect, which was not completely removed in the pre-processing step. Speckle produces a grainy appearance in SAR images. This effect was more intense in Campo Verde than in LEM.

DISCUSSION
The results show better classification accuracy for the combination of backscatter coefficients with metrics derived from the polarimetric decomposition (BP group). In the LEM test site (Figure 4a), the difference between the BS and BP groups was smaller than in Campo Verde (Figure 4b). The processing time was longer for the classification using all metrics (BP) than for the one using only backscatter coefficients (BS), which represents a trade-off against the increased accuracy (Figure 4).
In both test sites, classification accuracy was low at the beginning of the crop cycle. During this period, plants are beginning to emerge above the surface, and there is a greater influence of soil surface on the backscattering response, affecting the overall results of the classification.
However, during crop growth, the vegetation tends to attenuate the soil backscatter as the ears, leaves, and stems develop. The backscatter at C-band is sensitive to changes in plant development (McNairn and Shang, 2016), so the different scenarios formed by multitemporal SAR images are expected to detect changes in the crop structure and improve the separability in the classification. During the phenological development of crops such as soybean and corn, there is a greater contribution from the attributes that are more sensitive to volumetric scattering, especially the cross-polarization VH, the ratio VH/VV, and the polarimetric attribute H (Denize et al., 2019). The VV polarization is more sensitive to vertical structures, such as plant stems.
Oldoni et al. (2019b) demonstrated that the backscattering coefficients σ⁰VV and σ⁰VH are able to track crop growth. The authors observed that both coefficients increased at the beginning of soybean growth. Furthermore, the polarimetric attributes (H and α) showed potential for discriminating the corn, soybean, and cotton classes in specific periods of the crop cycle. Arias et al. (2020) used Sentinel-1 multitemporal images, in VH and VV polarizations, to classify crops. Both polarizations were sensitive to many phenological events as the structural characteristics of the canopy varied. The ratio (VH/VV) was more sensitive to vegetation growth, which made it possible to determine whether a field was vegetated or not but provided poor information for crop identification. Sanches et al. (2018a) used Sentinel-1 SAR images to classify crops in LEM with a different approach: one image per month, between June 2017 and May 2018. According to these authors, the accuracy also improved as more Sentinel-1 images were added. However, their maximum accuracy was obtained in November 2017 (OA near 85%) and decreased with the image from January 2018 (OA under 50%). Sanches et al. (2018b) used the Sentinel-1 backscatter coefficients, in VH and VV polarizations, to classify all classes of the Campo Verde Database. Their results showed an increase in accuracy from the beginning of the summer crop cycle. The overall accuracy stagnated after December, when it came close to 65%. A similar result was found in this research, in which the overall accuracy stabilized in mid-December in Campo Verde.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2020, XXIV ISPRS Congress (2020)
In this research, the accuracy was observed to increase until it stabilized, about 3.5 months and 2.5 months after the average start of the crop cycle in LEM and Campo Verde, respectively. After stabilization, the classification results remained nearly constant at 80-82% overall accuracy in LEM (Figure 4a) and at close to 80% overall accuracy in Campo Verde (Figure 4b).
This demonstrates the importance of acquiring multitemporal SAR images to maximize the accuracy of the classification, especially for land use types that are highly dynamic in space and time, such as crops. Tomppo et al. (2019) state that multitemporal SAR image acquisition during the seeding and reproductive phenological phases of the plant can improve the accuracy of crop classification.
In general, the Soybean class obtained the best producer and user accuracies (Figure 5) after the moment when the overall accuracy stabilized (scenarios 10 to 13 for LEM, and 7 to 10 for Campo Verde, Figure 4). The second-best classification results were obtained for the OAC class, with producer and user accuracies higher than 70%.
On the other hand, the Pasture class had the worst producer and user accuracies (Figure 5). Its main misclassifications occurred with the Others class, which includes the Cerrado. The Cerrado also contains natural pasture formations, whose structure is similar to that of the pasture used for cattle, which explains the confusion. Prudente et al. (2019b) also identified a similar problem. Another confusion occurred between OAC and the Soybean and Corn classes (Figure 5). The OAC class is formed by crops such as beans and sorghum, which have structures similar to soybeans and corn, respectively. This may explain the misclassifications among these classes.
Although the classification results of the BS and BP groups showed similar discriminatory potential, Figure 4b shows that, for some scenarios (2-5) in the Campo Verde test site, adding the information from the polarimetric attributes increased the overall accuracy of the classifications. The addition of the signal phase information (complex data) may have made it possible to separate scattering mechanisms of different natures, which may have helped in the discrimination of the targets. Thus, these parameters can be used to improve the accuracy of crop classification, although more studies are needed to understand the gain for each crop.

CONCLUSIONS
In the LEM test site, the best results were equivalent with or without the addition of the polarimetric metrics (H and α) to the metrics obtained from the backscatter coefficients (σ⁰VH, σ⁰VV, σ⁰VH/σ⁰VV). In the Campo Verde test site, the addition of the polarimetric metrics yielded an overall accuracy around 2% higher than that obtained without them, except for the first scenario.
The addition of new images over the crop development period increases the overall accuracy, which stabilizes from mid-February in LEM and mid-December in Campo Verde. After these periods, the gain in accuracy from adding new images is small. For future work, it is recommended to evaluate the use of segmentation or the combination with data obtained by optical sensors to improve the results. It is also recommended to carry out data mining to assess the performance of each polarimetric attribute and backscatter coefficient for each class, in order to form optimal sets that allow better discrimination of the different land use and land cover classes.