EXPLORING SENTINEL-2 FOR LAND COVER AND CROP MAPPING IN PORTUGAL

Land cover information is fundamental for a wide range of fields, such as research and policymaking. Remote sensing has historically been a source of data on land cover and recognized as the only practical systematic and wall-to-wall source for crop mapping. The European Copernicus programme and its free data policy for Sentinel-2 made accessible large volumes of imagery for frequent mapping and updating, generating new challenges. One such challenge is timely mapping through supervised image classification. The need for a prompt classification workflow requires training to become automatic, which typically relies on samples collected manually via fieldwork or image interpretation. Another challenge is to map land cover classes that traditionally have been troublesome to identify when satellite observations were sparse. For instance, crops have a spectral response that changes substantially throughout the year or during narrow time windows, which cannot be observed with few image acquisitions. This paper presents research ongoing in Portugal to develop a methodology for automatic image classification using training samples labelling with no human intervention. Rather, auxiliary datasets are used to randomly extract labelled points from large training samples to produce a land cover and crop map in raster format at 10 m spatial resolution using 2018 Sentinel-2 images. The proposed methodology was tested with the Random Forest classifier achieving an overall accuracy of 76%. These results are promising and support the idea of refining the methodology to move towards an annual land cover map production at the national scale. * Corresponding author


INTRODUCTION
Land cover analysis have evolved from studying a small geographic region at a determined period to global studies using smaller spatial resolution and higher temporal periods (Stehman, Foody, 2019). The new paradigm in land cover production -Land Cover 2.0-takes advantage on the developments in computer hardware; increased spatial, spectral, and temporal resolutions of satellite imagery; open-access data and automated data processing using classification algorithms to generate timely, reproducible and accurate land cover maps . Currently, it is possible to classify large geographic areas over multiple decades at an annual time step, as reported by Hermosilla et al. (2018) that generated a 29-year data cube of land cover for the years 1984 to 2012. Moreover, automated systems as the Sen2-Agri can ingest and process multi-sensor imagery (Sentinel-2 and Landsat 8 time series) for operational agriculture monitoring systems (Defourny et al., 2019). Countries such as France and Germany are aiming to produce automatically robust large-scale land cover mapping using supervised classification, time series of high-resolution optical imagery, and existing databases for data training (Inglada et al., 2017;Griffiths et al., 2019). Yet, supervised land cover approaches remain an intricate process as the critical component is the availability of training data (ground truth or reference data) for the signature generation . Additionally, training data collection is delicate in large jurisdictions and over remote areas (Inglada et al., 2017).
The European Copernicus programme and its free and open data policy for Sentinel-2 made available substantial volumes of data for periodic updates of land cover products over extensive territories. These time series of Sentinel-2 collected over relatively narrow time intervals allows monitoring subtle variations in crop phenology as well as abrupt changes in land cover (Gómez et al., 2016). This increased frequency on data availability enables the mapping of crops, which have a changing spectral response throughout the year; making them very difficult to identify when time series of satellite data are sparse (i.e., Landsat). Nevertheless, to fully exploit the openaccess of time-series, automated classification methods are required to generate, for example, annual land cover maps . The automatization entails no human intervention in the labelling of training samples, which typically are collected manually via fieldwork or image interpretation. This paper presents an experimental automatic workflow to classify land cover and agriculture crop types at 10m resolution in a test area of Portugal using existing databases, intra-annual time series of Sentinel-2, and supervised classification with Random Forest. The focus is to (i) test if a pre-defined set of pre-processing rules could remove possible sources of mislabelling from existing land cover and crop types datasets, i.e. allowing us to extract samples for training and testing automatically. Complementary, (ii) the evaluation of the accuracy metrics reached by each class provides insight into the main confusions of the classifier. This information is a criterion to assess the robustness of the workflow and its viability to The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII- B3-2020, 2020XXIV ISPRS Congress (2020 produce a land cover and crop map at a national scale. The structure of the paper first provides the experimental methodology for the automatic classification workflow. Then, the results of the classification and main findings are presented and analysed.

METHODS AND EXPERIMENTAL SETUP
The proposed methodology is based on an automatic sample extraction from reference datasets of land cover. Afterwards, the collected samples are used to retrieve the spectral information of the land cover classes of interest from the intra-annual timeseries of Sentinel-2. This dataset is used in a Random Forest classification algorithm, splitting the data into testing and training datasets for independent training and validation ( Figure  1). Land cover information is retrieved from several Portuguese reference datasets. The land cover classes like urban, forest and water are derived from the official Land Cover Land Use (LCLU) map of Continental Portugal (COS); agriculture which comprises annual and permanent crop as well as agricultural pastures are obtained from the Land Parcel Identification System (LPIS) of the Instituto de Financiamento da Agricultura e Pescas (IFAP); finally, burnt areas are identified from the maps of the Instituto para a Conservação da Natureza e das Florestas (ICNF). This information is processed to filter out potentially mislabeled pixels. Mislabeling of pixels occurs for several reasons such as the large Minimum Mapping Unit (MMU) of the reference data sets. For example, land cover patches smaller than 1 hectare are not represented in COS due to mapping generalization. This filtering processing comprises spatial inner buffer operation to avoid errors related to mapping uncertainty along the borders between classes. Then, all the areas that burnt between 2015 and 2018 were removed to ensure that vegetated classes are trained with unburnt pixels. Likewise, additional filtering based on the Normalized Difference Vegetation Index (NDVI) calculated from Landsat 8 imagery (2015-2018) is applied to ensure that vegetated classes are associated with acceptable levels of NDVI, which avoids, for example, sampling cases of forest in recent clear-cuts. Furthermore, the Copernicus High-Resolution Layers (HRL), particularly Dominant Leaf Type (DLT) and Tree Cover Density (TCD), are used to distinguish between forest types and forest density, as well as to eliminate non-forest pixels. Crosscomparing different datasets such as COS and HLR increases the reliability of the automatic sample extraction.
The automatic extraction of samples from the pre-processed datasets is performed twice to get two independent sample datasets, one for training and one for testing. For each sample, the spectral signatures are retrieved at the pixel level from all the spectral information previously collected. Then, the Random Forest model is fitted to the training dataset and the classification performance is quantified based on the predicted labels in the testing dataset. Afterwards, the model is applied to unlabelled data allowing to generate the final map for the tested region.

Study area
The test area used for this experiment is shown in Figure 2. This area has about 1,223,890 ha in the south of Portugal, covers most of the valleys of the Tejo and Sado rivers and contains a great diversity of land uses as well as multiple crop types.

Reference datasets
A total of 31 LCLU classes were used for this classification experiment. The most detailed level of the nomenclature was used, as it can be observed in Table 1. This nomenclature is derived from the COS nomenclature levels. The 31 classes were extracted from three reference datasets: IFAP 2018, COS 2018 and ICNF 2018. The IFAP is the Portuguese LPIS comprised of two independent datasets: the "national parcel registry" that was used for training and the "controlled parcels" used for testing. Several studies, including the Sen2-Agri system, have reported the use of the LPIS to extract samples from agricultural parcels for countrywide crop mapping studies (Costa et al., 2018;Griffiths et al., 2019;Defourny et al., 2019). Secondly, a reclassification of COS was carried out to derive a more generalized number of land cover classes for this study. Lastly, the ICNF is responsible for the execution of an annual map of burnt areas for Portugal based on visual interpretation of Landsat TM/ETM.

Filtering of the data
A spatial buffer was performed in the IFAP reference dataset classes to avoid selecting pixels for which the spectral signature does not match the class label. No other crossing with ancillary data was applied, and training (national parcel registry), as well as testing (controlled parcels), were kept independent ( Figure  3).  The buffer strategy was also applied to COS 2018 classes. However, it is critical to emphasize that the MMU of COS is 1 ha and classification at the pixel level for Sentinel-2 contemplates a 100 m 2 MMU. Therefore, some pre-processing steps are required to prevent the selection of pixels with spectral information that mismatch the class label inherited from COS, which has a larger MMU. The first filter was to intersect all COS polygons with the ICNF burnt mask from recent years, hence removing areas burnt by wildfires (Figure 4-left). Afterwards, forest polygons were crossed with the result of a change detection analysis based on NDVI differencing (Costa et al., 2020) to remove the areas where trees have been uniformly cut down (i.e., clear cuts). For example in the case of forest plantations like eucalyptus (i.e. Figure 4-

Sentinel-2 monthly composites
A crucial input in this research is the intra-annual time series of Sentinel-2 satellite from the European Spatial Agency (ESA). The 5-days Sentinel-2 revisit time allows the collection of highquality spatial and temporal data (Defourny et al., 2019). Sentinel-2 level-2 (L2A) products were downloaded from the Theia Land Data Centre (THEIA), acquired between October 2017 and September 2018, corresponding to the agricultural year of 2018. The imagery was pre-processed with the cloud/shadow mask to avoid clouds being mistaken with bright landscapes and cloud shadows with water pixels, burnt areas or topographic shadows (Baetens et al., 2019). After the imagery pre-processing, five spectral indices are calculated including the Normalized Vegetation Index (NDVI) (Tucker, 1979), Normalized Difference Build up Index (NDBI) (Zha et al., 2003), Normalized Difference Water Index (NDMI) (McFeeters, 2013), Normalized Burn Ratio (NBR) (Key, Benson, 2006), and the Normalized Burn Ratio 2 (NBR2) (Hislop et al., 2018).
The imagery was used to create a multi-spectral composite for every month, including spectral indices. Composites can provide data free of missing values in opposition to single date acquisitions which are often affected by cloud coverage. All the monthly composites and the indices were linearly interpolated in time to create a pixel-level consistent reflectance composite that can capture field level phenologies (Griffiths et al., 2019), as seen in Figure 5.

Automatic sample extraction, classification and accuracy assessment
After the reference datasets were filtered, the vector datasets were rasterized to 10 m cell size and the centroids were converted to points. This step permitted the creation of a point grid for random selection of samples, as seen in Figure 6. If the number of samples available per class was higher than 5000, the dataset was divided into 4000 training and 1000 testing pixels. However, four classes did not meet this requirement, and for these, the available pixels were divided into the proportion of 75% for training and 25% for testing. The information that was provided to the classifier corresponds to the surface reflectance values of the Sentinel-2 composites and the spectral indices derived. A total of 180 features were extracted, equivalent to 10 bands and 5 spectral indices for the 12 months of the 2018 agricultural year. A Random Forest model composed of 500 trees without pruning was used.

Figure 6 Automatic sample extraction from a COS polygon
The quality of the map was assessed quantitatively by summarizing the classification performance using the Overall Accuracy, the User's Accuracy (UA), Producer's Accuracy (PA), F1-Score (F1) as displayed in Table 3.  Table 3. User's Accuracy (UA), Producer's Accuracy (PA), and F1-Score of the map produced

Accuracy assessment of the Land Cover and Crop Type Classification
The map produced is presented in Figure 7. The overall accuracy of the land cover and crop map using monthly composites features and derived indices is 76% for the 31 classes. Among the temporary crop types, the best-performing classes are maize, rice, and tomato with UA and PA values above 90%. In general, the irrigated crops (summer crops) are more stable in their classifications, whereas the rainfed crops (winter crops) have more confusion within themselves. In terms of permanent crops, vineyards achieved UA and PA higher than 75%; similar results are reported in Schmedtmann and Campagnolo (2015), who achieved 85% in parcels classified as maize, rice, wheat or vineyard in the same study area. On the other hand, orchards and olive trees have the lowest PA (35% and 46% respectively), being incorrectly classified primarily as agricultural and natural grasslands as well as other crop types. Permanent crops are usually planted in 2m separation, meaning that the surface reflectance values captured in the Sentinel-2 pixels correspond to a mixture of the crop and soil. In general, crop mapping in the study area can benefit from the use of time series of Sentinel-2 as their average size of the parcels is between 2 and 3 ha. However, for very fragmented landscapes where the parcels are comparatively smaller, it would require higher resolution imagery to meet the same accuracy (Fritz et al., 2019). For the non-agriculture land cover classes, the highest UA and PA (> 90% for both) are observed for water, bare rock, holm oak, and wetlands, whereas the lowest PA (54%) is observed for other coniferous forests, which was more associated with stone pine. In general, there is confusion between the forest classes. For example, the cork oak has a low UA of 65% meaning that the commission error is high, and this class could be mistaken with broadleaf or coniferous forest. The direct mapping of shrublands is challenging as its spectral signal is composed of green vegetation and non-photosynthetic vegetation as well as varying fractions of soil, grass, and shadow (Suess et al., 2018).
In the classification, shrubland was often mixed with build-up, grasslands, bare soil, and sparse vegetation. Yet, there were fewer confusions with the forest classes as the HRL filter allowed the removal of broadleaf and coniferous pixels from shrubland samples, which improved the quality of the automatic training.
The Random Forest classifier required few hyperparameters to tune as opposed to other classifiers and proved to be computationally efficient as it was possible to parallelize it (multi-core processing) to classify the whole area. Also, it The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII- B3-2020, 2020XXIV ISPRS Congress (2020 allowed extracting the most important features during the classification. As expected, the most relevant features from the time series correspond to the spring and summer months and the bands on the Red Edge (B5, B6, B7), NIR (B8a) and SWIR (B11 and B12). The inclusion of spectral indices slightly improved the accuracy but was not a predominant variable.

The relevance of time series in crop type classification
Smoothed spectro-temporal profiles were computed from the cloud-free monthly composites using the averaged reflectance values in the testing dataset. Figure 8 presents the averaged spectro-temporal profiles for a) wheat that is a temporary rainfed crop grown during autumn/winter and b) rice mainly grown during the spring/summer (temporarily irrigated); the illustrations in false colour are for a specific parcel for visualization purposes. It is possible to envisage the growing window for wheat from January to May, characterized by the noticeable increasing values in reflectance on the NIR. On average, in the information extracted from the testing set, there is regrowth after the harvest in June. This is not the case for the specific parcel used as visualization example, as farmers can decide to grow multiple crops during the year, implement leguminous plants for soil recovery or leave the plot as fallow land. This variation on the plot usage for the rainfed crops entails several confusions for the classifier. In rice fields, the SWIR band captures the flooding period (March) and the NIR the flowering period (July to September); possibly, these peaks allow for characterizing the crop accurately. The multi-temporal information permitted to capture the phenological variation of the crops that cannot be distinguished from single-date acquisitions, justifying the relevance of intra-annual time series in crop type classification for a specific year.

CONCLUSIONS
Up-to-date land cover and crop type information play an essential role in commercial and environmental monitoring and planning. For its updating, they have benefited from remote sensing imagery at a national, continental and global level. However, many challenges remain to produce accurate and timely land cover and crop type maps. This experimental automated land cover exercise on a regional test area of Portugal focused on the use of intra-annual composites of Sentinel-2, supervised classification with random forest, and automatic sample extraction based on a pre-processing set of rules. The overall accuracy of 76% was achieved for 31 land cover and crop type classes. Some classes achieved high accuracies, such as for example water, wetlands, some forest classes and some crop types. The filtering rules allowed these forest classes (i.e., holm oak and eucalyptus) to achieve accuracies higher than 80% (UA and PA) by identifying areas with more than 60% of tree cover and by removing clear cuts. It also helped to reduce confusion between shrublands and forest classes. Agriculture classes such as irrigated crops benefited from the monthly time-series for accurate predictions. However, some confusion is verified for other forest classes, rainfed crops and grassland classes. Nevertheless, this experiment considers a larger than usual number of LCLU classes, which represent an additional challenge for supervised image classification. These are good indicators for the implementation of an operational chain process using automatic satellite multi-temporal classification based on Sentinel-2 to have a national land cover map every year. The idea is to expand and refine the methodology applied in this study area to move towards an annual LCLU production at a national scale.