ROTATION FORESTS AND RANDOM FOREST CLASSIFIERS FOR MONITORING OF VEGETATION IN PAYS DE BREST (FRANCE)

Remote sensing is a potentially very useful source of information for the spatial monitoring of natural or cultivated vegetation. Recent advances, in particular the arrival of new image acquisition programs, are changing the temporal approach to vegetation monitoring. The latest European satellites, which deliver an image every 5 days for each point on the globe, allow a growing season to be monitored through to its end. The main objective of this work is to identify and map the vegetation in the Pays de Brest area using a multi-sensor stack of Sentinel-1 and Sentinel-2 satellite data classified with Random Forest (RF), Rotation Forests (RoF) and Canonical Correlation Forests (CCFs). RoF and CCF create diverse base learners using data transformation and feature subsets. Twenty-four radar images and optical data representing different dates in 2017 were processed as time series stacks. The results of RoF and CCF were compared with those of RF.


INTRODUCTION
Interest in environmental vegetation monitoring is growing steadily, driven by a desire to preserve natural habitats and ecosystems. However, in the current economic context in France, urbanization and changes in land use and land cover (LULC) contribute to the degradation of these environments. Whatever the scale of action (national, regional or municipal), ecological study requires spatially identifying the environments that make up a territory in order to propose action plans to enhance or conserve these spaces.
Remote sensing is a potentially very useful source of information for the spatial monitoring of natural or cultivated vegetation. Many approaches have been developed recently for monitoring and mapping vegetation (Roberts et al., 2015; Niculescu et al., 2016; Niculescu et al., 2017; Shivers et al., 2018; Muller-Karger et al., 2018; Rapinel et al., 2019). New opportunities for vegetation monitoring have emerged with the arrival of European multi-sensor image time series, in particular with the increasing number of satellites and the availability of free data. Within the European Copernicus program, the Sentinel-1 and Sentinel-2 constellations currently provide complete coverage of the national territory every 5 days, at ten-meter spatial resolution, in several parts of the electromagnetic spectrum. It is therefore possible to follow the intra- and inter-annual evolution of ecosystems at a fine spatial resolution at the scale of the territory. This massive flow of Earth observation data provides a rich and detailed description of ecosystems and crops, allowing their state and evolution to be monitored. High-resolution optical time series have advanced the analysis of environmental phenomena, the definition of trends and the characterization of change events. The quality of optical images depends on weather conditions, especially on the cloud cover on the day of acquisition. This explains why optical image series can be incomplete and irregular, which can hamper crop recognition when the gaps fall on key dates in crop phenology. Conversely, radar data do not depend on atmospheric conditions and can be acquired day and night. Radar is used for various purposes, mainly surface measurements and environmental monitoring. Vegetation can be analyzed thanks to the sensitivity of the radar signal to the roughness of the surface layer of the cover and to moisture.
Crop type can also be detected, depending on the parameters of the emitted wave. In addition to polarization, a very important piece of information in radar imagery is the incidence angle at which the images are acquired, because it directly influences the discrimination of crops. Bargiel (2010, 2011, 2014), Ferazzoli (2002) and Bagdhdadi (2008, 2009) have shown that crops stand out better at high incidence angles (around 40°). The use of intra-annual time series of satellite images acquired by sensors such as Sentinel-2 and Sentinel-1, with a high revisit capacity and a high spatial resolution, makes it possible to acquire a larger number of images over the same area and therefore to improve the identification and characterization of different classes of vegetation and crops.
The joint use of radar and optical satellite data has already proved valuable for identifying natural and semi-natural vegetation, crops and agricultural practices at a fine scale on a given date. Many very recent studies use the fusion of optical and SAR data to characterize vegetation (Hosseini et al., 2019; Niculescu et al., 2018; Orynbaikyzy et al., 2019; Dymond et al., 2019; Mendes et al., 2019; Stendardi et al., 2019). Several researchers treat fusion as a concept for combining data from different sensors (Lu et al., 2007; Joshi et al., 2016), with the aim of generating information of "greater quality" than the individual input datasets (Vivone et al., 2015).
There are three categories of image fusion methods, defined by the level at which the integration is performed: data fusion (pixel level), feature fusion and decision fusion. In this study we used the pixel-level method (data fusion), which is widely used in remote sensing (Li et al., 2017). Pixel-level fusion methods are easy to implement and fast compared with other transform-based methods. Their main constraint is that the result depends on an accurate estimation of the optimal weights for the different pixels; when this estimation is poor, fusion performance can be quite limited.
Random Forest (RF) is among the most widely used machine learning algorithms for the classification of high-dimensional data (Cutler et al., 2007; Breiman, 2001; Belgiu et al., 2016). RF builds a set of decision trees, each of which is grown from a sample chosen at random from the training set. The strength of RF lies in the creation of a large number of trees (the forest), which statistically increases the chance of obtaining an optimal tree in which the splits at each node are adapted to the chosen classes. The main advantages of RF are its lower computational complexity and the lower correlation between trees (Gislason et al., 2006; Chan et al., 2008; Liu et al., 2018). To further improve on the performance of RF, Rotation Forests (RoF) and Canonical Correlation Forests (CCFs) are used in this study. Principal component analysis (PCA) and canonical correlation analysis (CCA) are used in RoF and CCF, respectively, to generate a rotated feature space for the training samples and thereby obtain diversity among the base classifiers. These methods were applied to hyperspectral data with very good results by Xia et al. (2015) and Xia et al. (2017).
In this paper, we applied RoF and CCF to classify Sentinel-1 and Sentinel-2 image time series and found that their performance is better than that of bagging, AdaBoost, random subspace, and random forest.
The objective of the study is to detect and map the vegetation of the Pays de Brest from Sentinel-1 and Sentinel-2 satellite images using RoF and CCF. This objective involves identifying natural and semi-natural vegetation at the level of plant formation classes, as well as artificial vegetation (summer crops and winter crops).

STUDY SITE
The Pays de Brest is an association of 7 intercommunalities created in 2012, located in the northwest of Finistère (Brittany), comprising a total of 103 municipalities. This territory is home to 43.5% of the Finistère population and covers over a quarter of the area of the department (Figure 1).

Figure 1. Location of the Pays de Brest. Source: geo.pays-de-brest.fr

The Pays de Brest is organized around the Brest urban center, which concentrates most of the territory's activities and jobs. The territory is strongly influenced by the sea, with 599 km of coastline for an area of 2102 km². It includes many remarkable natural environments with great biodiversity, notably in the Armorique Regional Natural Park, which is partly included in the territory. The Pays de Brest is also characterized by its agricultural and agrifood dimension: agriculture represents 54% of the surface of the territory. The landscape is rather heterogeneous. Three groups form the Pays de Brest: i) BMO (Brest Métropole Ocean), where artificial areas are very extensive and agricultural land occupies only half of the area. Peri-urbanization is significant in the municipalities around Brest. ii) The Leon plateau to the north, where agricultural land dominates. The coastline is densely populated in this area. Even though certain portions are protected, the proportion of natural and semi-natural areas is rather low; these environments are confined to the valley bottoms, which are the least suitable for agriculture, and are in the process of becoming enclosed. iii) Southern Elorn, where the proportion of agricultural land is lower. The proportion of natural and semi-natural areas is higher than in the rest of the territory, thanks to the presence of wooded areas and, in the Crozon peninsula, large areas of coastal moor.

DATA SET
The Sentinel-1 A and B (Synthetic Aperture Radar) satellites were launched into orbit on April 3, 2014 and April 25, 2016 respectively. The constellation of the two satellites was developed by the European Space Agency (ESA) within the Copernicus program, in continuity with the ERS and ENVISAT satellites. The objective of the program is to monitor the environment. Sentinel-1 data are recorded in the C band (5.6 cm). Each satellite has a temporal resolution of 12 days, and the two satellites, which follow a quasi-polar orbit, together allow the same geographic area to be observed every 6 days. Sentinel-1 images are available in SLC (Single Look Complex) or GRD (Ground Range Detected) formats and in different acquisition modes. The mode chosen for this study is Interferometric Wide swath (IW), in which images are acquired over a 250 km swath with a spatial resolution of 5 m × 20 m. The available polarizations are VV (parallel) and VH (cross) in GRD format. All the satellite data used in this study were recorded during the year 2017 on 4 orbits (Table 1).
Sentinel-2 has a 10-day orbital cycle and a sun-synchronous polar orbit; data are acquired in 13 spectral bands, with a spatial resolution ranging from 10 to 60 meters. Sentinel-2 is particularly useful for mapping vegetation, thanks to the presence of two new spectral bands between 705 and 740 nm (B5 and B6). Twelve optical images from 2017, with cloud cover low enough to distinguish the study area, were selected for this study (Table 2).

METHODOLOGY
In this study, we use two machine learning approaches, RoF and CCF, applied to Sentinel-1 and Sentinel-2 time series. Three different models were built for each of these algorithms. One set of models combined the S-1 and S-2 features. The classification results were compared. The methodology aims to improve the accuracy of the identification and mapping of several vegetation classes in the Pays de Brest. We applied this methodology to a total of 24 datasets of combined radar and optical data and tested several different subsets of the data.
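The pixel-level stacking of the two time series can be illustrated with a minimal sketch. It assumes that all images have already been co-registered and resampled to a common grid; the function name, the array shapes and the number of bands per date (2 for Sentinel-1, 4 for Sentinel-2) are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def stack_time_series(s1_dates, s2_dates):
    """Stack per-date arrays of shape (bands, rows, cols) into a
    (n_pixels, n_features) table for pixel-level classification.
    Hypothetical helper: assumes every array shares the same grid."""
    layers = list(s1_dates) + list(s2_dates)
    cube = np.concatenate(layers, axis=0)      # (total_bands, rows, cols)
    return cube.reshape(cube.shape[0], -1).T   # one row per pixel

# Synthetic stand-ins for 12 radar dates (VV, VH) and 12 optical dates (4 bands)
rows, cols = 50, 60
rng = np.random.default_rng(0)
s1_demo = [rng.random((2, rows, cols)) for _ in range(12)]
s2_demo = [rng.random((4, rows, cols)) for _ in range(12)]
features = stack_time_series(s1_demo, s2_demo)
print(features.shape)  # (3000, 72)
```

Each row of the resulting table is the full multi-date, multi-sensor signature of one pixel, which is the form expected by tree-based classifiers.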
RoF constructs diverse training sets in the following steps. First, the features are split into disjoint subsets. Then, a rotation matrix is produced by applying a data transformation to each subset, using a bootstrap sample of 75% of the training data. Finally, a classifier is built on the new features projected by the rotation matrix. These steps are repeated several times, and the final output is generated by combining the results of the individual classifiers.
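The construction of the rotation matrix for a single base learner can be sketched as follows. This is an illustration under stated assumptions and not the implementation used in the study: PCA is taken as the data transformation, all components are kept, the function name and parameters are hypothetical, and the training of the decision tree on the rotated features is omitted.

```python
import numpy as np

def build_rotation_matrix(X, n_subsets, sample_fraction=0.75, rng=None):
    """Return an (n_features, n_features) rotation matrix for one base learner."""
    rng = np.random.default_rng(rng)
    n_samples, n_features = X.shape
    # Step 1: split the feature indices into disjoint subsets.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    rotation = np.zeros((n_features, n_features))
    for idx in subsets:
        # Step 2: bootstrap sample sized at 75% of the training samples.
        n_draw = max(2, int(sample_fraction * n_samples))
        sample_rows = rng.integers(0, n_samples, size=n_draw)
        sub = X[np.ix_(sample_rows, idx)]
        # PCA on the subset: eigenvectors of its covariance matrix.
        cov = np.atleast_2d(np.cov(sub, rowvar=False))
        _, loadings = np.linalg.eigh(cov)
        # Place the loadings in the block belonging to this subset.
        rotation[np.ix_(idx, idx)] = loadings
    return rotation

# Step 3 would train a decision tree on the rotated features X @ R.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 12))
R_demo = build_rotation_matrix(X_demo, n_subsets=4, rng=0)
X_rotated = X_demo @ R_demo
```

Because each block holds orthonormal eigenvectors and the subsets do not overlap, the assembled matrix is orthogonal: it rotates the feature space without discarding information. Repeating the call with different random splits gives each tree a different rotation.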
For the CCFs, CCA is performed on the features and labels of the bootstrapped training set to find the projections. The projections are then used to rotate the features, so that the decision trees produce very diverse classification results, which is beneficial for the ensemble. The general principle of RF is to create a set of decision trees from randomly selected subsets of the training data. In this work, the number of trees is set to 40, and the number of features in a subset is set to the square root of the number of features used in this study.
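The CCA step can be sketched with a standard QR and SVD formulation. This is a simplified illustration, not the CCF implementation used in the study: the function name is hypothetical, the labels are encoded as indicator columns (one column is dropped so that the centered label matrix keeps full column rank), the feature matrix is assumed to have full column rank, and the projection is computed once on a given sample.

```python
import numpy as np

def cca_feature_projection(X, labels):
    """Return feature-side canonical weights A and canonical correlations."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)  # assumes at least two classes are present
    # Indicator encoding of the labels, last class dropped.
    Y = (labels[:, None] == classes[None, :-1]).astype(float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the centered feature and label spaces.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, corr, _ = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U)  # map back to the original feature axes
    return A, corr

# Synthetic example: 3 classes, 6 features, class signal in feature 0
rng = np.random.default_rng(0)
labels_demo = np.repeat([0, 1, 2], 50)
X_cca = rng.normal(size=(150, 6))
X_cca[:, 0] += 2.0 * labels_demo
A_demo, corr_demo = cca_feature_projection(X_cca, labels_demo)
Z_demo = (X_cca - X_cca.mean(axis=0)) @ A_demo  # projected features
```

The columns of `Z_demo` are the directions in feature space most correlated with class membership; a tree would then search for its splits along these projected axes rather than along the original bands.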
The evaluation of these models was based on overall accuracy (OA), class-specific accuracies and the Kappa coefficient. The classification was validated by computing a contingency matrix and a Kappa index. The contingency matrix is used to assess both omission and commission errors. Omission errors correspond to pixels that have not been assigned to the class to which they belong; they indicate underestimates. Commission errors correspond to pixels that have been assigned to a class to which they do not belong; they indicate overestimates. The contingency matrix is also used to calculate an overall classification assessment index, the Kappa index (Congalton, 1991).
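These measures can be computed from the contingency matrix as in the sketch below. The function names and the two-class example matrix are illustrative assumptions (rows hold the reference classes, columns the predicted classes) and do not reproduce the values of the study.

```python
def confusion_matrix(reference, predicted, classes):
    """Rows are reference classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[ref]][index[pred]] += 1
    return matrix

def accuracy_report(matrix):
    """Overall accuracy, Kappa, and per-class omission and commission errors.
    Assumes every class has at least one reference and one predicted pixel."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    correct = [matrix[i][i] for i in range(k)]
    overall = sum(correct) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        # Reference pixels of a class that were assigned elsewhere.
        "omission": [1 - correct[i] / row_totals[i] for i in range(k)],
        # Pixels assigned to a class that belong to another one.
        "commission": [1 - correct[i] / col_totals[i] for i in range(k)],
    }

demo_matrix = [[45, 5], [10, 40]]  # illustrative counts only
report = accuracy_report(demo_matrix)
print(round(report["overall_accuracy"], 2), round(report["kappa"], 2))  # 0.85 0.7
```

In this example 85 of 100 pixels are correctly classified, chance agreement is 0.5, and Kappa is therefore (0.85 - 0.5) / (1 - 0.5) = 0.7.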

RESULTS
Eight classes of main plant formations were labeled: summer crops, winter crops, forest/undergrowth, water, grassland, moors/lawns, no vegetation and sand dunes. These classes were selected on the basis of multiple in situ observations. In general, agricultural land covers more than half of the surface (61.5%), and forests and natural and semi-natural environments [...]
The results show very good classification performance for the two algorithms (3 applications): overall accuracy is above 90% (Table 4 and Table 5). With the combined S1 and S2 features, the CCF classifiers produced the highest accuracies for all classes. The forest classification was excellent in all three versions (more than 90% of pixels correctly classified). Grassland had good classification accuracy in the three versions (more than 80%), with the best result for the CCF classification (85.21%). The classification of sand dunes shows similar results, with more than 80% accuracy and again the best result for the CCF classification (90.43%). Artificial vegetation (summer crops and winter crops) is less well classified. Summer crops were the better identified of the two, with accuracies between 80% and 85% and the same trend as for the other classes (better results for CCFs). Winter crops were the least well identified and classified, with a maximum of 61.30%, again for the CCF classification. Crops are mostly confused with the no vegetation, grassland and moors classes. However, an increase in error rates can be seen for the moors/lawns class. The spectral and textural similarity between certain mesophilic grasslands and crops generated classification errors. [...] that the S1 and S2 data perform well in the discrimination of the major classes of vegetation.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2020, 2020 XXIV ISPRS Congress (2020 edition)

CONCLUSION
Different feature extraction methods were tested for the operation of Rotation Forests. In this study, Rotation Forests is proposed with spatial contextual information. This methodology can substantially improve the results. The classification accuracies obtained in this study (more than 90% for the best classification) show that stacking Sentinel-1 and Sentinel-2 data produced reasonably accurate classifications of medium-scale vegetation, in particular with the Rotation Forests algorithm. These algorithms were able to identify and monitor the main natural and semi-natural plant formations as well as the artificial vegetation (summer crops and winter crops) of the Pays de Brest. This study shows the potential of multi-temporal imagery to be scaled to the regional level, and the results can be extended to the vegetation of other coastal zones.
The accuracy of the vegetation classifications is also improved by increasing the number of images. The value of regularly produced S2 satellite images for identifying vegetation classes lies in the possibility of having several dates during the year and of being able to study the evolution of the environment. The most suitable images for identifying the plant formations of the Pays de Brest are those acquired at the start of the year, namely at the end of winter and in spring. We also note that the images acquired in spring are, in all cases, among the best combinations of images selected.
The dual polarization of the Sentinel-1 on-board sensor, coupled with the high temporal repeatability of Sentinel-2 optical images, offers interesting perspectives for monitoring coastal zones and identifying the vegetation present. Using multiple images acquired on different dates improved the discrimination and characterization of crops. Because at certain times of the year it is very difficult to discriminate crops from one another, owing to their very similar phenology and therefore similar spectral characteristics, it is most often necessary to use several images acquired in the same year. For agricultural environments, the C band of the Sentinel-1 satellite has shown an ability to assess the dynamics of rapeseed, corn, sunflower and soybean crops. The time series stacking methodology improves the identification and mapping of the different classes of natural vegetation, semi-natural vegetation and crops in the Pays de Brest. The final classification proved to be very precise. Such a map could therefore serve as a basic document for monitoring and managing natural vegetation, semi-natural vegetation and crops.
The algorithms used in this study were very stable. The RF, RoF and CCF algorithms are robust and not very sensitive to the fine-tuning of most parameters. Only the number of features in a subset has an impact on the quality of the classification.