FOR TROPICAL AGRICULTURAL REMOTE SENSING APPLICATION

Monitoring agricultural activities on a regular basis is crucial to ensure that food production meets the demands of a world population that increases yearly. Such information can be derived from remote sensing data. In spite of the topic's relevance, not enough effort has been invested in exploiting modern pattern recognition and machine learning methods for agricultural land-cover mapping from multi-temporal, multi-sensor earth observation data. Furthermore, only a small proportion of the works published on this topic relates to tropical/subtropical regions, where crop dynamics are more complicated and difficult to model than in temperate regions. A major hindrance has been the lack of accurate public databases for the comparison of different classification methods. In this context, the aim of the present paper is to share a multi-temporal and multi-sensor benchmark database that can be used by the remote sensing community for agricultural land-cover mapping. In situ information about crops was collected in the Luís Eduardo Magalhães (LEM) municipality, an important Brazilian agricultural area, to create field reference data covering both the first and second crop harvests. Moreover, a series of remote sensing images of the LEM area, from both active and passive orbital sensors (Sentinel-1, Sentinel-2/MSI, Landsat-8/OLI), was acquired and pre-processed along the development of the main annual crops. In this paper, we describe the LEM database (crop field boundaries, land use reference data and pre-processed images) and present the results of an experiment conducted using the Sentinel-1 and Sentinel-2 data.


INTRODUCTION
Benchmarks are important to make different approaches comparable so that promising strategies can be identified. Fostering the creation of public test beds for new algorithms is a major ISPRS strategic policy (Chen et al., 2016). In recent years, some benchmark datasets have been delivered with ISPRS support (e.g., Nex et al., 2015; Rottensteiner et al., 2014). Several subsequent works were carried out on the basis of such datasets. However, most of them refer to urban areas. To our knowledge, there is only one public multi-temporal, multi-sensor dataset devoted to the assessment of automatic methods for agricultural land-cover classification (Sanches et al., 2018).
Furthermore, most of these publications refer to agriculture in temperate regions. Agricultural land-cover classification in tropical/subtropical regions is comparatively more challenging. Climatic, socio-economic and infrastructure factors make crop dynamics in the tropics more complicated and difficult to model (Sanches et al., 2018).
Optical orbital sensors (e.g. the Moderate Resolution Imaging Spectroradiometer, MODIS/Terra, and the Operational Land Imager, OLI/Landsat-8) are widely used for mapping and monitoring agricultural activity. However, the limited availability of optical images due to cloud cover is a major issue (Xiao et al., 2018; Whitcraft et al., 2015), mainly in tropical regions (Eberhardt et al., 2016). There are a few ways to work around or minimize the cloud coverage problem. Data from sensors operating in the microwave range (radar), whose energy is able to pass through clouds, can be used. However, radar data is less discriminative than its optical counterpart. Another possibility is to combine data from different sensors.
Although applications combining different sensors are not new, the integration of multi-sensor data is still an active research topic, especially due to the increasing availability of new sensors. In recent years, multi-sensor approaches combining data from optical and radar sensors have been increasingly explored (Reiche et al., 2018), also for the monitoring of agricultural targets (e.g. Torbick et al., 2017; Zhou et al., 2017; Navarro et al., 2016). However, to our knowledge there is no publication focused on tropical areas. The Sentinel-1 mission of the Copernicus Programme opened up new perspectives for exploring radar data for crop monitoring. Sentinel-1 is a constellation of two orbiting satellites, each carrying a synthetic aperture radar (SAR) operating in the C-band. It allows obtaining dense time series of freely distributed SAR images with constant viewing angles (Inglada et al., 2016). This paper introduces a new public database for crop type recognition in a tropical area. It consists of 794 crop fields, a sequence of monthly land use reference data and the corresponding pre-processed images along one year. The area corresponds to the Luís Eduardo Magalhães (LEM) municipality, in the Northeast of Brazil. The dataset comprises SAR and optical images at different resolutions, from the Sentinel-1, Sentinel-2/MSI and Landsat-8/OLI sensors.
The database is freely accessible at http://www.lvc.ele.puc-rio.br/downloads/Databases/LEM/home.html. This paper also presents the results of an experiment that explores part of the data available in the LEM database (C-band Sentinel-1 and Sentinel-2/MSI images) for crop mapping.

LEM Municipality
The LEM municipality is located in the west of Bahia state, in the Northeast of Brazil, in the Cerrado biome (Brazilian savannah) (Figure 1). LEM lies at latitude 12°05'31" S and longitude 45°48'18" W, has an area of 3,940.537 km² and an altitude of 720 m. LEM presents the tropical Aw climate according to the Köppen-Geiger classification (Peel et al., 2007). The average temperature is 24.2 °C and the average annual rainfall is 1511 mm. The predominant soil in this region is yellow latosol. LEM is part of the newest Brazilian agricultural frontier, known as MATOPIBA, an acronym formed by the initials of the Maranhão (MA), Tocantins (TO), Piauí (PI) and Bahia (BA) states. MATOPIBA stands out in the production of soybean, maize, cotton and rice, having produced 9.4% of the 2014/2015 Brazilian grain harvest (Portal Brasil, 2015).
It is worth mentioning that LEM became a municipality in 2000, and official agricultural data started being collected in 2001.
Since then, agro-business has progressed remarkably in this area (e.g. increase of the area cultivated with cotton and beans, introduction of sorghum, reduction of rice) (IBGE, 2016a).
According to the Brazilian Institute of Geography and Statistics (IBGE, 2016b), LEM also had 1,700 hectares of coffee in 2015, besides other minor perennial crops.

Field Data Collection
Two field campaigns were conducted in LEM, between 26 and 30 June 2017 and between 14 and 19 March 2018, periods corresponding to the second (dry season) and first (wet season) Brazilian crop harvests, respectively. Data on over 700 points was gathered, including geographic coordinates, crop type, phenological phase and photographs. The team travelled across LEM collecting information about the land use of the agricultural fields during one week in winter 2017 and another week in summer 2018. In each campaign, a mosaic composed of the most recent available OLI/Landsat-8 images was used to navigate through the municipality, using a GPS device connected to a laptop running the Global Mapper software (Global Mapper Software LLC, Parker, CO). High resolution images (RapidEye and Google Earth) were used as auxiliary data. The focus was on crops, but other classes were mapped as well (e.g. cerrado, pasture).

Radar Images:
A time series of C-band SAR Sentinel-1A images, with VV and VH polarizations, acquired between June 2017 and June 2018, was obtained from the Sentinel Scientific Data Hub in Interferometric Wide Swath (IWS) mode as Ground Range Detected (GRD) Level 1 products, and pre-processed using the Sentinel-1 Toolbox 5.0 (Table 2).
The processing pipeline involved the application of orbit files, radiometric calibration, terrain correction and transformation from linear scale to dB. During orbit file application, the orbit state vectors provided in the Sentinel-1A metadata, which are generally not accurate, were refined with the precise orbit files that become available days to weeks after product generation. Then, the digital pixel values were radiometrically calibrated to sigma nought (σ°) backscatter coefficients, which express the radar return from a unit area on the ground in ground-range geometry. Next, a Range Doppler terrain correction was applied, using a Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM), to compensate for the geometric distortions that affect image data acquired away from the sensor's nadir. The images were georeferenced to the WGS84 system. Finally, both bands, VV and VH, were scaled to dB.
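The final linear-to-dB scaling step can be sketched as follows (a minimal illustration, not the Sentinel-1 Toolbox implementation; the clipping floor is our own assumption, used to guard against zero-valued no-data pixels):

```python
import numpy as np

def sigma0_to_db(sigma0_linear, floor=1e-6):
    """Convert linear sigma-nought backscatter to decibels.

    Values are clipped at a small floor to avoid taking the
    logarithm of zero in shadow or no-data pixels.
    """
    return 10.0 * np.log10(np.clip(sigma0_linear, floor, None))

# Example: typical C-band backscatter magnitudes in linear scale
vv_linear = np.array([1.0, 0.1, 0.0])
print(sigma0_to_db(vv_linear))  # -> [  0. -10. -60.]
```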

Crop Field Boundaries
We delimited 794 crop fields selected in situ, using high spatial resolution image data (RapidEye) and a time series of Landsat-8/OLI and Sentinel-2/MSI images.All selected fields were visited in both field campaigns.
The original polygons match the boundaries of the crop fields in the high resolution images (RapidEye, 5 m). However, to avoid errors on edge pixels, we defined the polygons considering a 60 m wide buffer inside the crop field boundaries (see Figure 2).
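The inner buffering of field polygons can be reproduced, for instance, with a negative buffer in Shapely (a sketch assuming polygon coordinates in a metric CRS such as UTM; `inner_buffer` is a hypothetical helper, not part of the database tooling):

```python
from shapely.geometry import Polygon

def inner_buffer(field, distance_m=60.0):
    """Shrink a crop-field polygon inward to exclude edge pixels.

    A negative buffer removes a strip of the given width along the
    boundary; fields narrower than twice the distance collapse to
    an empty geometry and should be skipped.
    """
    shrunk = field.buffer(-distance_m)
    return None if shrunk.is_empty else shrunk

# A 500 m x 500 m field in a metric (UTM-like) coordinate system
field = Polygon([(0, 0), (500, 0), (500, 500), (0, 500)])
core = inner_buffer(field)
print(core.area)  # 380 m x 380 m inner core remains
```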

Land Use Reference Data
The database contains the land use classes of 794 crop fields (see Figure 3), on a monthly basis, for the period between June 2017 and June 2018.
Based on the information collected in situ, time series of optical remote sensing images (Sentinel-2/MSI and Landsat-8/OLI) and NDVI profiles (MODIS/Terra), we built monthly references for the visited fields covering one crop year (from June 2017 to June 2018). For June 2017 and March 2018 the reference is based on the field information. The reference for the other months was created by an experienced image interpreter. For the visual interpretation, NIR-SWIR-Red false-colour compositions were generated for each date (OLI R5-G6-B4 and MSI R8A-G11-B4).
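The NDVI profiles mentioned above follow the standard definition (NIR - Red) / (NIR + Red); a minimal sketch (the epsilon term is our own guard against division by zero):

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    return (nir - red) / (nir + red + eps)

# Reflectance samples: dense crop canopy vs. bare soil
nir = np.array([0.45, 0.25])
red = np.array([0.05, 0.20])
print(ndvi(nir, red))  # high (~0.8) over vegetation, low (~0.11) over soil
```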
The land use classes present in the LEM area were: soybean; maize; cotton; coffee; beans; wheat; sorghum; millet (commercial and non-commercial); eucalyptus; pasture; hay; grass (areas cultivated with some type of grass for an unknown purpose: hay production, recovery of areas affected by nematodes etc.); crotalaria; maize+crotalaria (maize cultivated in consortium with crotalaria); cerrado; conversion area (a former cerrado field that has been recently deforested for an unknown purpose: pasture, crop cultivation etc.); uncultivated soil (bare soil, soil with crop residues from the previous harvest, and soil with weeds); other non-commercial crops (NCC); and not identified (planting observed in the images between August and November, in areas irrigated by central pivot). Table 3 shows the distribution of fields per class over all images in the dataset.

Material and Methods
To illustrate the use of the LEM database, two experiments were carried out using Sentinel-1 and Sentinel-2 multi-temporal image sequences. For the first experiment, we selected Sentinel-1 images from June 2017 to May 2018, one image per month (see horizontal plot axis in Figure 4a), in both polarizations, VV and VH. The second experiment was carried out on Sentinel-2 images from June 2017 to October 2017, one image per month (see horizontal plot axis in Figure 4b), using bands 2 (490 nm), 3 (560 nm), 4 (665 nm) and 8 (842 nm). A subset of the whole reference map, comprising approximately 25% of the whole database, was selected for the experiments.
We applied a method known as image stacking. In this approach, each pixel is represented by a feature vector formed by stacking together the spectral features observed at the same spatial coordinates along the whole sequence. Notice that, in this approach, pixels at the same image coordinate in different epochs share the same representation. This brings about a B × T dimensional feature space for B features per epoch in a sequence comprising T images. For each epoch, we trained a specific classifier to map points in this feature space to the corresponding crop type in that epoch.
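The stacking described above can be sketched as follows (a toy illustration with hypothetical array shapes, not the code used in the experiments):

```python
import numpy as np

def stack_epochs(series):
    """Stack per-epoch feature images into pixel-wise feature vectors.

    series: array of shape (T, H, W, B) -- T epochs, B features per epoch.
    Returns an array of shape (H*W, T*B): one B x T dimensional vector
    per pixel position, shared by all epochs of the sequence.
    """
    t, h, w, b = series.shape
    # Move epochs next to bands, then flatten the spatial dimensions
    return series.transpose(1, 2, 0, 3).reshape(h * w, t * b)

# Toy sequence: 3 epochs, 4x4 pixels, 2 features (e.g. VV and VH)
series = np.random.default_rng(0).random((3, 4, 4, 2))
features = stack_epochs(series)
print(features.shape)  # (16, 6)
```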

Experimental Protocol
In the first experiment, on Sentinel-1 data, each pixel was represented by a feature vector comprising the backscatter responses in both polarizations, VV and VH, at the same image coordinate along all epochs in the sequence. Similarly, in the second experiment, we stacked the spectral responses of the Sentinel-2 data in bands 2, 3, 4 and 8 of all pixels at the same spatial coordinate through all epochs to form the pixel-wise feature vectors.
A random forest (RF) classifier was trained on pixels of randomly selected fields. The RF consisted of 200 random trees with a maximum depth equal to 25. As our database is unbalanced, samples of the less abundant classes were replicated to obtain approximately 20,000 samples per class in each epoch. Finally, the classifier was tested on sites not used for training. Stratified random sampling, performed in Quantum GIS, was applied to select approximately 20% of the fields for training and 80% for testing from the whole set of fields selected for these experiments.
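The training setup (200 trees, maximum depth 25, replication of minority-class samples) can be approximated with scikit-learn; the `oversample` helper and the toy data below are our own illustration, not the code used in the experiments:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def oversample(X, y, target=20_000, seed=0):
    """Replicate samples of the less abundant classes so that each
    class contributes approximately `target` samples."""
    rng = np.random.default_rng(seed)
    Xb, yb = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        take = rng.choice(idx, size=target, replace=len(idx) < target)
        Xb.append(X[take])
        yb.append(y[take])
    return np.concatenate(Xb), np.concatenate(yb)

# Hypothetical imbalanced two-class training set (1000 vs. 100 samples)
rng = np.random.default_rng(0)
X = rng.normal(size=(1100, 6))
y = np.array([0] * 1000 + [1] * 100)
Xb, yb = oversample(X, y, target=1000)

rf = RandomForestClassifier(n_estimators=200, max_depth=25, random_state=0)
rf.fit(Xb, yb)
```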
The experiment was carried out on sequences of different lengths. First, we applied the aforementioned procedure to the earliest image in the sequence, a mono-temporal scenario. Then, we added the image of the next date, forming a two-image sequence, and applied the protocol to classify the latest one, in this case the second image in the sequence. We repeated this procedure successively, adding one more image to the sequence and classifying the most recent one. In this way, we measure accuracy on each image in the sequence, whereby each result refers to a different sequence length. This protocol approximates the problem of successively refining the estimate of the cultivated area of each crop type as time goes on and more images become available.
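The growing-sequence protocol can be sketched as below (synthetic data; `growing_sequence_accuracy` is a hypothetical helper, and the real experiments additionally used per-field sampling and the class balancing described above):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def growing_sequence_accuracy(train_series, train_labels, test_series, test_labels):
    """Train and evaluate on progressively longer sequences.

    At step t the classifier sees features stacked from epochs 0..t and
    predicts the labels of epoch t, mimicking images arriving over time.
    *_series: (T, N, B) per-epoch features for N pixels;
    *_labels: (T, N) per-epoch class labels.
    """
    T, n_train, _ = train_series.shape
    n_test = test_series.shape[1]
    accuracies = []
    for t in range(T):
        # Stack all epochs observed so far into one feature vector per pixel
        X_tr = train_series[: t + 1].transpose(1, 0, 2).reshape(n_train, -1)
        X_te = test_series[: t + 1].transpose(1, 0, 2).reshape(n_test, -1)
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        clf.fit(X_tr, train_labels[t])
        accuracies.append(accuracy_score(test_labels[t], clf.predict(X_te)))
    return accuracies

# Synthetic, well-separated two-class data over T = 3 epochs
rng = np.random.default_rng(1)
T, N, B = 3, 200, 2
y = np.repeat([0, 1], N // 2)
train_series = y[None, :, None] * 2.0 + rng.normal(scale=0.3, size=(T, N, B))
test_series = y[None, :, None] * 2.0 + rng.normal(scale=0.3, size=(T, N, B))
train_labels = np.tile(y, (T, 1))
test_labels = np.tile(y, (T, 1))
accs = growing_sequence_accuracy(train_series, train_labels, test_series, test_labels)
print(len(accs))  # one accuracy value per sequence length
```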

Results
Figure 4 summarizes the results in terms of overall accuracy (OA) (dark blue bars) and average F1-score (light blue bars) for all sequences considered in the experimental protocol for both sensors, Sentinel-1 (Figure 4a) and Sentinel-2 (Figure 4b).
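Both measures correspond to standard scikit-learn metrics; a minimal sketch with hypothetical labels (not the paper's results):

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical per-pixel labels for one epoch
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 2, 2]

oa = accuracy_score(y_true, y_pred)              # overall accuracy
f1 = f1_score(y_true, y_pred, average="macro")   # average (macro) F1-score
print(round(oa, 3), round(f1, 3))  # 0.833 0.822
```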
By and large, accuracy increased as more images were added to the sequence. Notably, the accuracy decreased when we added the 8th image in the Sentinel-1 experiment (see Figure 4a). The reason for this behaviour is the transition between crop cycles, which occurred exactly from the 7th to the 8th epoch. In this case, the data related to a prior, already harvested crop was not relevant for the recognition of the new crop. Notice that the accuracy increased again from the 9th epoch on, as images of the new crop cycle were added to the sequence.
Finally, Figure 4 shows a clear superiority of the Sentinel-2 (Figure 4b) over the Sentinel-1 data (Figure 4a) in terms of classification accuracy for the same sequence length. This is clearly due to the richer information provided by optical sensors compared with their SAR counterpart.

CONCLUSION
This paper introduced the LEM database, built to serve as a benchmark for new crop mapping approaches based on multi-temporal/multi-sensor remote sensing data. The database contains land use information on 794 crop fields located in the Luís Eduardo Magalhães municipality, Brazil, which presents crop dynamics typical of tropical areas. The LEM database also contains sequences of pre-processed optical (Landsat-8/OLI and Sentinel-2/MSI) and radar (C-band Sentinel-1) remote sensing images.
To exemplify the use of the database, the paper also reported experiments using either Sentinel-1 or Sentinel-2 data. Especially in tropical regions, it is virtually impossible to obtain cloud-free MSI images with an appropriate temporal resolution to cover the entire crop year (both dry and wet periods). Data from the Landsat-8/OLI optical sensor (also available in the database) can be used to partially overcome this problem. Although optical sensors generally provide comparatively richer information, we see a trend towards the combined use of optical and radar sensors. The LEM database can be useful for the development of new multi-temporal and multi-sensor approaches for agricultural land-use classification.

Figure 3. Distribution of the fields in LEM.

Figure 4. Overall accuracy (dark blue bars) and average F1-score (light blue bars) for different sequences (bars), formed by taking the first image and stacking more images to classify the last image in each sequence. Sequences of (a) C-band SAR Sentinel-1 images and (b) MSI/Sentinel-2 images.

Table 3. Number of fields per class in each epoch.