FIRST RESULTS OF THE LEM BENCHMARK DATABASE FOR AGRICULTURAL APPLICATIONS

Applying remote sensing technology to map and monitor agriculture and its impacts can greatly contribute to the proper development of this activity, promoting efficient food, fiber and energy production. For that, not only remote sensing images are needed, but also ground truth information, which is a key factor for the development and improvement of methodologies using remote sensing data. While a variety of images are currently available, including cost-free images, field reference data are scarcer. For agricultural applications, especially in tropical regions such as Brazil, where agriculture is very dynamic and diverse (recent agricultural frontiers, crop rotations, multiple cropping systems, several management practices, etc.) and cultivated over a vast territory, collecting such reference data is not trivial. One way of boosting research in agricultural remote sensing is to stimulate people to share their data, and to encourage different groups to use the same dataset, so that distinct methods can be properly compared. In this context, our group created the LEM Benchmark Database (a project funded by the ISPRS Scientific Initiatives 2017) from the Luiz Eduardo Magalhães (LEM) municipality, Bahia State, Brazil. The database contains a set of pre-processed multitemporal satellite images (Landsat-8/OLI, Sentinel-2/MSI and SAR band-C Sentinel-1) and shapefiles of agricultural fields with their corresponding monthly land use classes, covering the period of one Brazilian crop year (2017-2018). In this paper we present the first results obtained with this database.


INTRODUCTION
Brazil is a world leader in food production, and agriculture is historically one of the principal bases of the country's economy, providing food, fiber and sugarcane ethanol. However, the great territorial extension of Brazil, the regional diversity of physical aspects such as climate, soil, vegetation cover and water availability, and the dynamism of Brazilian agriculture (multiple harvests, new agricultural frontiers, diversity of management practices, etc.) represent a challenge for monitoring agricultural activity (Formaggio and Sanches, 2017). In this scenario, remote sensing can be very useful, since it provides a synoptic view of the surface and repeated coverage (systematic monitoring) throughout the crop cycles over the whole crop year.
For the monitoring of agricultural activity by remote sensing, five key questions are to be answered: 1) Where is it growing? 2) What is growing? 3) How is it growing? 4) How much is it growing? 5) Indeed, remote sensing has been used in many tasks (Weiss et al., 2020), such as mapping the cultivated areas (e.g. Griffiths et al., 2019; He et al., 2019), monitoring the expansion of agriculture, identifying crop types (e.g. soy, maize, cotton), monitoring the development of plantations, detecting pests, diseases, plant stress (e.g. water, nutritional) and weeds (e.g. Castro et al., 2019), estimating crop area, biomass and yield (e.g. Liao et al., 2019), and tracking the impact of agricultural production on water resources, soil, native vegetation and carbon emissions, among other applications (e.g. Breunig et al., 2020).
Currently, the large availability of remote sensing data (era of big data - Liu, 2015; Huang et al., 2018), from multiple platforms (satellites, aircraft, Unmanned Aerial Vehicles - UAVs) and multiple sensors (passive and active sensors; low, medium and high spatial resolution sensors; multispectral and hyperspectral sensors), combined with the development of more robust methods for image processing (e.g. cloud computing - Ma et al., 2015) and image analysis, such as machine learning methods (Lary et al., 2016) (e.g. neural networks - Atkinson and Tatnall, 1997; Random Forest - Belgiu and Dragut, 2016), sets up a quite promising scenario for applications in several areas, including agriculture.
Nonetheless, the lack of ground truth samples to train and test methodologies is a major obstacle to the successful use of remote sensing, especially for agricultural applications, where the targets are dynamic (Sanches et al., 2019). Collecting such samples can be time consuming and costly, and demands the involvement of agricultural specialists in the field data collection.
Fortunately, there have been some initiatives for sharing agricultural ground truth data, but a lot more needs to be done in this area. For example, Sanches et al. (2018b) built a database for Campo Verde municipality, located in Mato Grosso State, Brazil. Maize, soybeans and cotton are the predominant crops cultivated in this area. Information on over 500 fields is provided for the period between October 2015 and July 2016.
In this context, the "LEM benchmark database" was created in the project "Benchmark database for tropical agricultural remote sensing application", which was funded by the ISPRS Scientific Initiatives 2017. The idea was to make a database available to foster the development of remote sensing for agricultural applications in tropical areas, following the example of other initiatives (e.g. Rottensteiner et al., 2014, for urban applications). The target area was the municipality of Luiz Eduardo Magalhães (LEM), in the Cerrado biome (Brazilian Savanna), situated in the newest Brazilian agricultural frontier known as MATOPIBA, an acronym formed by the initials of the states of Maranhão (MA), Tocantins (TO), Piauí (PI) and Bahia (BA). MATOPIBA has been standing out in the production of soybean, maize, cotton and rice. LEM is frequently covered by clouds, which makes optical remote sensing very challenging and stimulates the use of microwave remote sensing (radar).
In the present paper, the LEM benchmark database is briefly described, followed by some case studies that have explored this database. Finally, the LEM Challenge, which is one initiative to boost the use of the database, is presented. The strengths of the LEM database are threefold: i) it provides information from an important tropical agricultural area (West of Bahia State, Brazil); ii) the reference data cover an entire crop year, 2017/2018 (monthly land use classes are provided); iii) it is a freely available database.

LEM DATABASE
The LEM benchmark database (Sanches et al., 2018a) was created between June 2017 and June 2018. The project's objectives were:
i) Collect in situ information about crops (and other land use classes), including geographic coordinates, crop type and phenological phase, in Luís Eduardo Magalhães municipality (Figure 1), situated at latitude 12°05'31'' south and longitude 45°48'18'' west, on two dates (first and second harvests of one Brazilian crop year).
ii) Acquire a set of multi-temporal remote sensing images, from active and passive orbital sensors, freely available for the study area, covering the development period of the first and second harvest crops.
iii) Create monthly reference maps for the study area, based on visual interpretation of optical remote sensing images and field data, for the period of development of the main annual crops found in the area.
iv) Create a database containing the boundaries of the crop fields selected in the field campaigns, the monthly reference maps (land use classes), and the pre-processed multi-temporal images.
To achieve the proposed objectives:
i) One field campaign was conducted in LEM municipality during the second harvest, on 26-30 June 2017, and another one during the first harvest, on 14-19 March 2018.
iii) Based on information collected in situ, together with optical remote sensing image time series (Sentinel-2/MSI and Landsat-8/OLI) and NDVI profiles (MODIS/TERRA), an experienced interpreter created monthly field reference maps (for the fields visited in situ) covering one Brazilian crop year (June 2017 - June 2018).
iv) The LEM database was created containing Landsat-8/OLI images from 24 dates, Sentinel-2/MSI images from 58 dates and Sentinel-1 images from 30 dates (Table 1).

Nineteen land use classes were mapped in LEM (considering the fields visited in situ). Some of these classes (e.g. soybeans, hay, millet, sorghum, pasture, cotton, maize and coffee) are illustrated in Figure 2 and Figure 3, with photographs taken during the two field campaigns.
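As an illustration of the NDVI profiles used as interpretation support in step iii), the following minimal sketch computes a per-field NDVI time series from red and near-infrared reflectances using the standard definition NDVI = (NIR - RED) / (NIR + RED). The function names, the zero-denominator convention and the reflectance values are assumptions made for this example and are not taken from the LEM project.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index from red and NIR reflectance."""
    total = nir + red
    if total == 0:
        # Convention assumed for this sketch: no signal maps to NDVI 0.0
        return 0.0
    return (nir - red) / total


def ndvi_profile(series):
    """Compute NDVI for a list of (red, nir) pairs ordered by acquisition date."""
    return [ndvi(red, nir) for red, nir in series]


# Hypothetical mean reflectances of one field over a crop cycle
# (bare soil, growth, peak greenness, senescence)
field_series = [(0.20, 0.25), (0.08, 0.40), (0.04, 0.50), (0.18, 0.28)]
profile = ndvi_profile(field_series)
print([round(value, 2) for value in profile])
```

A profile that rises and then falls over a few months is the kind of pattern an interpreter would typically associate with an annual crop cycle, whereas perennial classes tend to show flatter profiles.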

CASE STUDIES USING THE LEM DATABASE
The LEM database has been used in several academic studies. Some studies have focused on only one type of sensor, others have explored both optical and SAR data, and there were cases in which only the field information was used as auxiliary data. In the following, we present seven case studies that use the database. Some of them are part of master's dissertations (headings 3.2.2 and 3.2.3) and PhD theses (headings 3.1.2 and 3.3.1).

Case Studies using both Optical and Radar data
3.1.1 First experiment published using the LEM benchmark database for tropical agricultural remote sensing application: The first results from the use of the LEM benchmark database were published in the ISPRS Archives along with the description of the database (Sanches et al., 2018a).
Two experiments were carried out using part of the database, focusing on the Sentinel images, one for Sentinel-2/MSI and the other for SAR band-C Sentinel-1. The experiments simulated the estimation of crop area along the development of the plants. First, only one image was used in the analysis (monotemporal scenario), corresponding to the earliest image in the sequence (before the crops were sown/planted). Then, more images were progressively added to classify the last image in the sequence (multitemporal scenarios). Therefore, the experiments tested several sequences of different lengths.
For Sentinel-1 image analysis, sequences of backscattering responses in VV and VH polarizations were used. For Sentinel-2, the bands with 10 m spatial resolution, which correspond to bands 2 (490 nm), 3 (560 nm), 4 (665 nm) and 8 (842 nm), were explored. Image stacking, with one image per month, was applied and the analysis was performed on pixel-wise feature vectors using the Random Forest classifier. Better classification accuracies were obtained with the optical data (Sentinel-2) than with the SAR data (Sentinel-1). Overall, classification accuracy increased as more images were added to the sequence. However, this is not only related to the number of images per se; it also depends on the image dates and on the land use classes being mapped, since crops and other land use classes have different dynamics.
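The image stacking and the progressively longer sequences described above can be sketched as follows. This is a minimal illustration in plain Python, assuming each monthly image is given as a list of bands and each band as a flat list of co-registered pixel values; the function names and the numbers are hypothetical, and the classifier itself is omitted. In practice the resulting vectors would be passed to a Random Forest implementation.

```python
def stack_features(images):
    """Build one feature vector per pixel from a sequence of images.

    images: list over dates; each image is a list over bands; each band is
    a flat list of pixel values (all bands and dates are co-registered).
    """
    n_pixels = len(images[0][0])
    features = []
    for p in range(n_pixels):
        # Concatenate every band of every date for pixel p
        features.append([band[p] for image in images for band in image])
    return features


def progressive_sequences(images):
    """Yield (length, features) for the first image only, the first two, etc."""
    for length in range(1, len(images) + 1):
        yield length, stack_features(images[:length])


# Hypothetical example: 3 monthly dates, 2 bands (e.g. VV and VH), 3 pixels
example_images = [
    [[1, 2, 3], [10, 20, 30]],
    [[4, 5, 6], [40, 50, 60]],
    [[7, 8, 9], [70, 80, 90]],
]
scenarios = dict(progressive_sequences(example_images))
for length, feats in scenarios.items():
    print(length, "image(s):", len(feats[0]), "features per pixel")
```

The first entry corresponds to the monotemporal scenario and the later ones to the multitemporal scenarios, so the feature vector grows by the number of bands each time a date is added.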
In these experiments, the Sentinel-2/MSI and SAR band-C Sentinel-1 data were analysed separately. Although the optical data might provide more discriminative information, as shown in the experiments, they are highly affected by cloud cover, especially during the main Brazilian harvest. Thus, the combination of optical and SAR data should be explored as an alternative.

Crop recognition in tropical regions based on spatiotemporal conditional random fields from multi-temporal and multi-resolution sequences of remote sensing images:
Achanccaray (2019) employed sequences of SAR Sentinel-1 and Optical Landsat-8 images from December 2017 to June 2018 to recognize all different crops present in the LEM database. For this purpose, the author proposed a multi-temporal and multi-resolution conditional random field (CRF) approach exploiting information in both domains, spatial and temporal.
The CRF approach consists of three main terms called association, spatial interaction and temporal interaction potentials. The association potential relied on the local posterior probabilities obtained by the last layer of a Convolutional Neural Network (CNN) trained on the stack of features extracted from the sequences of SAR (VV and VH polarizations) and optical images (bands 1 to 5 and 7). The spatial interaction potential was represented by a contrast-sensitive Potts model that depends on the data of spatially neighbouring pixels. Finally, the temporal interaction potential introduced expert knowledge about possible and impossible transitions between crops over time. Additional connections were included between images with higher spatial resolution to deal with the high dynamics between neighbouring fields in the spatial and temporal domains and the different spatial resolutions available in the dataset. The proposed model achieved up to 92% Overall Accuracy and 83% F1-score for the selected period from December 2017 to June 2018.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B5-2020, XXIV ISPRS Congress (2020

(Palazzo et al., 2018). The classifications were done separately for the two harvests, using SAR image stacks of VV and VH polarizations.

Case
The best classification accuracies were obtained for cotton and soybean. Overall, the classification accuracies of the crop targets were higher for the first harvest, which can be explained by the fact that a greater variety of crops is cultivated in LEM during the second harvest, while soybean is the dominant crop in the first one. Confusion among the maize, sorghum and millet classes and among the coffee, eucalyptus and cerrado classes was observed during the period of the second harvest.
As future work, Prudente et al. (2019) recommended exploring metrics such as standard deviation and amplitude from the SAR image series, and using longer image time series to improve the classification of perennial classes (e.g. coffee).
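The temporal metrics recommended above can be illustrated with a short sketch that summarizes one pixel's (or one field's) backscatter time series by its mean, standard deviation and amplitude. The function name, the use of the population standard deviation and the backscatter values in dB are assumptions made for this example, not details taken from Prudente et al. (2019).

```python
import statistics


def temporal_metrics(series):
    """Summarize a backscatter time series (e.g. VH in dB) for one pixel."""
    return {
        "mean": statistics.mean(series),
        # Population standard deviation over the available dates (assumed choice)
        "std": statistics.pstdev(series),
        # Amplitude: difference between the highest and lowest value in the series
        "amplitude": max(series) - min(series),
    }


# Hypothetical VH series: an annual crop varies more than a perennial class
annual_crop = [-20.0, -18.0, -16.0]
perennial = [-15.2, -15.0, -15.1]

print(temporal_metrics(annual_crop))
print(temporal_metrics(perennial))
```

Classes with stable canopies over the year would be expected to show low standard deviation and amplitude, which is why such metrics may help separate them from annual crops.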

3.2.2 Many-to-many fully convolutional recurrent networks for multitemporal crop recognition using SAR image sequences: As stated before, tropical regions, as in the LEM database, present more complex spatio-temporal dynamics than temperate regions, where there is a single harvest per season. In this sense, Chamorro et al. (2019) adapted two recurrent neural networks (RNNs), originally conceived for a single harvest per season, for multidate crop recognition. In addition, a novel multidate approach based on bidirectional fully convolutional recurrent neural networks was proposed.
These three architectures were evaluated on public Sentinel-1 data sets from two tropical regions in Brazil, one of which is the LEM database. In the experiments, all methods achieved state-of-the-art accuracies, with a clear superiority of the proposed architecture, which was the best performing method in terms of F1-score for most crops and dates in both regions.

Exploring radar images with different polarizations for crop type classification:
Oliveira, V.M. is pursuing a master's degree in Remote Sensing at INPE. Her dissertation research is still under development, but the general idea is to explore different radar images (e.g. full polarimetric versus single and dual polarization; band-C versus band-L) to map crops. The study area is the LEM municipality. Beyond the Sentinel-1 images and the field reference of the LEM database, images from RADARSAT-2 and ALOS/PALSAR will be acquired, and other field campaigns will be carried out in LEM municipality.

The thesis has three phases. First, the development of a procedure for data harmonization of the OLI, MSI and WFI sensors, in order to generate a single consistent time series. Then, from the combination of spectral, statistical and phenological metrics extracted from the harmonized data of the three sensors, the hierarchical classification of agricultural areas and crop types will be carried out using the Random Forest classifier. Last, metrics extracted from harmonized Satellite Image Time Series (SITS) from the OLI, MSI and WFI sensors will be used in machine learning algorithms to estimate soybean productivity in one of the study areas (Paraná).

The study was based on time series of Landsat-5/TM, Landsat-7/ETM+ and Landsat-8/OLI images and relied on visual image interpretation and Geographic Object-Based Image Analysis (GEOBIA) (Baatz et al., 2000).

LEM CHALLENGE
To boost the use of the LEM database, and especially to enable the proper comparison of different methodologies for crop mapping, the LEM Challenge was launched.
The dataset has been classified into 19 land cover classes, most of them crop types. The complete remote sensing data are available, but the classification data (label images) are available for only approximately half of the images. The ground truth of the remaining scenes remains unreleased, to be used for the evaluation of the submitted results.
Some institutions (universities, research institutes and private companies) from Brazil and abroad (Austria and Germany) have already registered for the challenge.
The competitors can choose any of the following tasks, characterized by the source data used to infer the crop maps: Task 1) Sentinel-1 data only; Task 2) Sentinel-1 & Sentinel-2 data; Task 3) Sentinel-1 & LANDSAT data; Task 4) Sentinel-1 & Sentinel-2 & LANDSAT data. The participants of each task will submit to the project organizers a single result per month in the form of a label image. The evaluation will be based on the computation of pixel-wise confusion matrices. More information can be found at http://www.lvc.ele.pucrio.br/LEM_benchmark/index.html.
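A pixel-wise confusion matrix of the kind mentioned above can be sketched as follows. This is a minimal illustration, assuming label images are flattened to lists of integer class indices and that unlabeled reference pixels carry a hypothetical ignore value of -1. The overall accuracy and per-class F1-score are shown only as metrics commonly derived from such a matrix; the challenge's exact evaluation protocol is not specified here.

```python
def confusion_matrix(reference, predicted, n_classes, ignore_label=-1):
    """Rows index the reference class, columns index the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for ref, pred in zip(reference, predicted):
        if ref == ignore_label:
            continue  # skip pixels without reference information
        matrix[ref][pred] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of evaluated pixels on the diagonal of the matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def f1_per_class(matrix):
    """F1-score of each class: 2*TP / (2*TP + FP + FN)."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Hypothetical flattened label images with 3 classes
reference = [0, 0, 1, 1, 2, 2, -1, 1]
predicted = [0, 1, 1, 1, 2, 0, 2, 1]
cm = confusion_matrix(reference, predicted, n_classes=3)
print(cm)
print(overall_accuracy(cm), f1_per_class(cm))
```

Keeping the full matrix rather than a single score makes it possible to see which classes are confused with each other, as in the maize, sorghum and millet case reported earlier.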

CONCLUSION
The project "Benchmark database for tropical agricultural remote sensing application", funded by the ISPRS Scientific Initiatives 2017, enabled the building of the LEM database. Hopefully, a set of benchmark databases like this one will become available in the future, making it possible to develop and test methodologies across different sites and epochs and thereby improve agricultural remote sensing.
The data available in the LEM benchmark database are being used in the development of PhD theses, master's dissertations and other studies, resulting in six publications so far (Sanches et al., 2018a; Achanccaray, 2019; Chamorro, 2019; Chamorro et al., 2019; Dutra et al., 2019; Prudente et al., 2019), with interesting results. Future studies are being planned to collect more data from LEM municipality for another Brazilian crop year and from other satellites (commercial radar data) to improve the LEM database.
It is worth highlighting that the LEM database is contributing to the advance of crop recognition using SAR data in Brazil, which is a research area currently little explored. Also, the LEM database is the data source for a challenge that is currently under way (until June 2020). After the LEM Challenge is over, the entire LEM database will be open to the whole remote sensing community (it will be available at http://www.dpi.inpe.br/agricultural-database/lem/), and it is expected that more research will benefit from the database.