IMCLASS – A USER-TAILORED MACHINE LEARNING IMAGE CLASSIFICATION CHAIN FOR CHANGE DETECTION OR LANDCOVER MAPPING

With the increasing availability of satellite imagery at several spatial, spectral and temporal resolutions, choosing the best image and the most appropriate method for object detection and classification of a broad range of land surface classes or processes is still a difficult task for many users. In order to guide users, we propose a user-tailored machine learning method (IMage CLASSification, ImCLASS) to detect and classify specific landcover classes. The method assumes a mono-class approach and takes several ill-posed problems (e.g. class imbalance, high diversity inside the studied class, similarities with adjacent samples...) as use cases (landslides, construction works in urban areas, burnt areas, vegetation classes...). It is a generalization of the ALADIM processor already validated in the context of landslide mapping and available as a service on the ESA GeoHazards Exploitation Platform (GEP). The proposed chain is able to combine optical and radar images, uses open source libraries, and is optimized for rapid calculation on HPC environments. The ImCLASS processor is presented and its performance is evaluated on three use cases: landslide detection and mapping after disasters in different regions of the world, urban change detection with a focus on construction works in Strasbourg, and crop mapping (vineyards) in the Grand-Est region. First results using either bi-date or mono-date imagery are presented.


INTRODUCTION
In many scientific domains, data mining techniques are receiving increasing attention as a way to extract information from large remote sensing datasets (Navalgund et al., 2007; Lu and Weng, 2007). However, most of the algorithms and processing chains are not yet fully tailored for operational applications and are thus not operationally exploited by potential non-academic users. The reasons for this are:
• the absence of collaborative development programs between non-academic users and researchers in which the user is fully embedded in all steps of the processing chain and application development,
• the necessity of domain/image adaptation to implement classification methods fully tailored to the application needs,
• the creation of, and access to, sample/training data fully consistent with the application needs and also covering large areas,
• the absence of dedicated infrastructures (calculation resources and data repositories; e.g. IaaS/PaaS) where "non-expert" users may run classification experiments.
In order to meet the needs of a wide range of scientific disciplines, supervised machine learning is a flexible tool for tailoring the processors to different processes, classes, environmental conditions and multiple sensor systems.
However, the use of such techniques is still constrained to the analysis of small areas, often to one temporal slice, and necessitates domain adaptation. Four bottlenecks are currently identified:
• the difficulty of relating complex environmental objects to the data structures resulting from multi-sensor and multi-temporal image observations,
• the difficulty of generating consistent samples and selecting the most appropriate ones as training data to increase the image classification accuracy,
• the absence of relevant methods to document and communicate the quality of the classification to the users,
• the scalability of the machine learning techniques to large data volumes.
Since 2018, A2S ('Application Satellite Survey') has been developing the ImCLASS change detection and classification processor. ImCLASS targets the supervised analysis of optical and SAR remote sensing images using a machine learning approach that includes feature extraction, feature dimension reduction and feature classification. ImCLASS is designed primarily for the detection of classes that pose ill-posed problems (e.g. class imbalance), though any type of class can be detected from a training sample set. ImCLASS builds on the previous ALADIM image classification system (developed at EOST and LIVE; Stumpf et al., 2014), available as a service on the ESA GeoHazards Exploitation Platform (GEP). The design of ImCLASS involved operational users (EuroMetropole Strasbourg/EMS, Grand-Est Region). The objective of this manuscript is first to briefly present the processing chain, and second, to evaluate its performance for three use cases.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2020, 2020 XXIV ISPRS Congress (2020 edition)

ImCLASS PROCESSING CHAIN
The main purpose of the ImCLASS processing chain is to operate in a big data context, for multi-sensor and time series processing, and over large regions of interest. In order to facilitate porting to calculation clusters, ImCLASS is coded in Python 3.6, embedded in Docker/Singularity environments, and deployed on the HPC hardware of A2S hosted at the Datacenter of the University of Strasbourg. The part of the code dedicated to model prediction is fully parallelized. ImCLASS is currently being implemented as an HTML-based web service.

Input data
ImCLASS allows the processing of optical and SAR remote sensing images. For the optical images, which are the focus of this work, the code offers two processing modes associated with the use of either medium spatial resolution/high spectral resolution sensors (such as Copernicus Sentinel-2, Landsat-8 or mixed Sentinel-2/Landsat-8 pairs) or very high spatial resolution/low spectral resolution sensors (e.g. four bands Red, Green, Blue and Near Infra-Red, optionally associated with a Panchromatic band; such as SPOT6/7 or Pléiades). Preprocessing steps including spatial resolution resampling, cloud detection, and filtering are available. Fine image co-registration is currently being implemented. Several use scenarios are possible in terms of data availability (mono-date image, bi-date images, time series) or application domain. In addition to the satellite images, ImCLASS uses samples of the thematic class (in shapefile format) as input.

Description of the processing chain
The ImCLASS workflow is presented in Figure 1. First, feature extraction is carried out by computing several attributes derived from the input image(s): spectral bands, spectral indices and Haralick textural indices (inside a window with a size defined by the user). In the case of a bi-date application, change detection features are also calculated. Topographic attributes are also computed using SRTM-30m by default or a more accurate exogenous DEM provided by the user. The topographic indices are computed at three different resolutions (full resolution, a three times lower resolution and a four times lower resolution). In total, 146 features are calculated for the Sentinel-2 and Landsat-8 version in the bi-date mode (77 in the mono-date mode), and 87 for the SPOT6/7 and Pléiades version in the bi-date mode (64 in the mono-date mode). In a version currently in progress, an option offers the possibility to add an exogenous mask provided by the user.
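To make the notions of spectral index and change detection feature concrete, the following minimal sketch computes one index on each date and differences it. It is an illustration only: the text does not list which indices ImCLASS uses, so the choice of NDVI, the function names `ndvi` and `difference`, and the toy reflectance values are all assumptions.

```python
from typing import List

Band = List[List[float]]  # a 2-D raster band stored as rows of reflectances


def ndvi(red: Band, nir: Band, eps: float = 1e-9) -> Band:
    """Normalized difference vegetation index, computed pixel by pixel.

    NDVI is an assumed example of a spectral index; eps avoids division by zero.
    """
    return [
        [(n - r) / (n + r + eps) for r, n in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red, nir)
    ]


def difference(pre: Band, post: Band) -> Band:
    """Assumed bi-date change feature: post-event value minus pre-event value."""
    return [
        [b - a for a, b in zip(pre_row, post_row)]
        for pre_row, post_row in zip(pre, post)
    ]


# Toy 2 x 2 reflectances: only the top-left pixel loses its vegetation.
red_pre = [[0.05, 0.05], [0.05, 0.05]]
nir_pre = [[0.45, 0.45], [0.45, 0.45]]
red_post = [[0.25, 0.05], [0.05, 0.05]]
nir_post = [[0.25, 0.45], [0.45, 0.45]]

ndvi_pre = ndvi(red_pre, nir_pre)
ndvi_post = ndvi(red_post, nir_post)
d_ndvi = difference(ndvi_pre, ndvi_post)

for row in d_ndvi:
    print([round(v, 3) for v in row])
```

In this toy case the changed pixel stands out with a strongly negative difference while the unchanged pixels stay at zero, which is the kind of contrast a bi-date feature is meant to provide to the classifier.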
Second, classification of the features is carried out using a Random Forest machine learning classifier (Breiman, 2001). The pixels belonging to the training sample are split into a training subset (⅔) and a validation subset (⅓). The feature vectors of the training pixels are extracted and used to build the Random Forest model (Stumpf and Kerle, 2011). Within the area of interest, the model predicts for each pixel the probability of belonging to the class. Options for filtering the result (salt-and-pepper noise) using morphological operators are available.
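The salt-and-pepper filtering mentioned above can be sketched as follows. The text does not state which morphological operators ImCLASS applies, so this example assumes a binary opening (erosion followed by dilation) with a 3 x 3 structuring element; the function names and the toy mask are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # binary raster: 1 = class detected, 0 = background


def _window(mask: Mask, i: int, j: int) -> List[int]:
    """Values of the 3 x 3 window centred on (i, j); outside the raster counts as 0."""
    rows, cols = len(mask), len(mask[0])
    return [
        mask[a][b] if 0 <= a < rows and 0 <= b < cols else 0
        for a in (i - 1, i, i + 1)
        for b in (j - 1, j, j + 1)
    ]


def erode(mask: Mask) -> Mask:
    """Keep a pixel only if its whole 3 x 3 window is set."""
    return [
        [int(all(_window(mask, i, j))) for j in range(len(mask[0]))]
        for i in range(len(mask))
    ]


def dilate(mask: Mask) -> Mask:
    """Set a pixel if any pixel of its 3 x 3 window is set."""
    return [
        [int(any(_window(mask, i, j))) for j in range(len(mask[0]))]
        for i in range(len(mask))
    ]


def opening(mask: Mask) -> Mask:
    """Binary opening: removes objects smaller than the structuring element."""
    return dilate(erode(mask))


# Toy 6 x 6 mask: one 3 x 3 detection and one isolated noisy pixel.
mask = [[0] * 6 for _ in range(6)]
for i in range(1, 4):
    for j in range(1, 4):
        mask[i][j] = 1
mask[5][5] = 1

opened = opening(mask)
for row in opened:
    print(row)
```

With this structuring element the isolated pixel is removed while the 3 x 3 object is restored unchanged. A larger element would remove larger artefacts at the cost of also discarding small true detections.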

Output products
The ImCLASS outputs consist of an inventory map of the studied objects and a probability map expressing the confidence for each classified element. The user can also modify the probability threshold between the two classes, depending on whether more weight is given to the precision or to the recall. Options for an automated computation of the optimal threshold for result binarization (F-score) are available; by default the probability map is binarized with the F1-score threshold, which gives equal weight to the precision and the recall. Two graphs are associated with these outputs: a confusion matrix obtained with the default binarization, and the curves of the precision and recall values with respect to the different possible thresholds in the range [0, 1].
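The threshold selection described above can be illustrated with a short sketch that scans candidate thresholds in [0, 1] and retains the one maximizing the F-score on validation pixels. This is a sketch under stated assumptions, not the ImCLASS code: the function names, the 0.01 scanning step and the toy probabilities are hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall(
    probs: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when pixels with prob >= threshold are assigned to the class."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-score; beta = 1 weights precision and recall equally (F1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(
    probs: Sequence[float], labels: Sequence[int], beta: float = 1.0, steps: int = 100
) -> float:
    """Scan thresholds k / steps in [0, 1] and return the first one maximizing the F-score."""
    candidates = [k / steps for k in range(steps + 1)]
    return max(
        candidates,
        key=lambda t: f_beta(*precision_recall(probs, labels, t), beta),
    )


# Toy validation pixels: predicted class probabilities and reference labels.
probs = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

t_f1 = best_threshold(probs, labels)                    # equal weight (default)
t_precision = best_threshold(probs, labels, beta=0.5)   # favours precision
print(t_f1, t_precision)
```

On this toy sample the F1-optimal threshold falls between 0.30 and 0.40, whereas weighting precision more heavily (beta = 0.5) moves it above 0.60, which mirrors the trade-off the user controls when adjusting the threshold manually.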

DOMAIN APPLICATIONS
In order to evaluate the performance of the processor and test its genericity, two use case scenarios are presented: (1) a change detection problem applied to two application domains, geology (landslides) and territorial planning (urban construction works); and (2) an object detection problem applied to an agricultural application (vineyard mapping).

Landslide detection:
Extreme precipitation (typhoons) and earthquakes can trigger thousands of landslides on susceptible terrain. As a consequence of climatic changes and potential global warming, an increase in landslide activity is expected in the future, due to increased rainfall, changes in hydrological cycles, more extreme weather, and rain concentrated within shorter periods of time (Kirschbaum et al., 2012). Complete event landslide inventory maps are the first source of information for identifying the most susceptible zones, prioritizing the protection measures, and quantifying the hazard. Visual image interpretation and field surveys are still the prevailing methods for inventory mapping but require several months or even years of manual labor. The increased availability of satellite imagery combined with machine learning methods gives the opportunity to detect and map landslides rapidly and efficiently.
Several sites of interest (Figure 2) were intensively studied, allowing the improvement of the processor (input images, spatial and spectral resolution, features). The zone in Myanmar was hit by tropical storm Komen in 2015 and studied with a pair of Landsat-8 / Sentinel-2 images; the zone in Haiti was struck by Hurricane Matthew in October 2016 and studied using a pair of SPOT-6/7 multi-spectral and panchromatic images. A pair of Sentinel-2 images was used to create the landslide inventory map in Mozambique, where cyclone Idai caused many landslides in March 2019. In July 2019, the south of Japan was affected by heavy rainfall and landslides were mapped using a pair of Sentinel-2 images. More recently, in November 2019, the West-Pokot province in Kenya was struck by torrential rain, and a landslide map was produced with a pair of Sentinel-2 images and with a Pléiades image (Figure 3).

Urban construction works detection:
Urban managers need information for urban territorial planning and monitoring. Traditional methods are based on the visual interpretation of aerial photographs or on field surveys. These tasks are very time-consuming. Urban changes have been studied for several decades with remote sensing images (Herold et al., 2002; Hussain et al., 2013). With very high spatial resolution images, the user need consists in detecting and monitoring the state of construction of the buildings in order to update their database. For instance, the EuroMetropole of Strasbourg (EMS) needs to monitor the state of buildings currently being upgraded or created (250 to 350 building permits per year). This information is summarized in a database called 'Inventory of Located Building' (ILB, point shapefile), updated by experts twice per year, often by ground truth survey. In order to provide information to the urban managers, the image dataset should have a very high spatial resolution and a high temporal resolution (every six months), and should be associated with elevation data to detect the beginning and the end of the urban changes.
The ImCLASS processor is therefore tested to monitor the evolution of urban buildings twice per year with Pléiades imagery (stereo and tri-stereo). A set of three Pléiades images acquired in 2016, 2017 and 2018 was available through the Kalideos-CNES database. A pan-sharpening pre-processing step has been applied to obtain a finer spatial resolution in multispectral mode for each date. The reference database has also been preprocessed to obtain a change database in which each transition between 2016 and 2018 is qualified (e.g. bare soil to current construction work, current construction work to completed construction, bare soil to completed construction). 139 'transition' samples have been identified between 2016 and 2018. Tests have been performed on a subset of Strasbourg covering about 66 km² within the EuroMetropole of Strasbourg (Figure 4).
The vineyard class is used for evaluating the performance of ImCLASS for this application domain. The difficulty arises from the heterogeneity of the vineyard plots in Alsace, which lie on several slope morphologies (from the Vosges foothills to the plain lowlands), and from the small size of these plots.
The area of interest covers 119 communes in two French departments (Bas-Rhin, Haut-Rhin) with an area of 4444 km² and a length of 120 km. Three areas of around 100 km² have been selected (Figure 5) for a preliminary spatio-temporal analysis. The analysis of the spectral band variation (with a particular focus on the bands B02-red and B05-NIR) of Sentinel-2 images over two years (selecting one cloud-free image per month during the 2018-2019 period) for several crop types (from the RPG database) showed that some pairs of dates should be interesting to test with the ImCLASS chain: (1)

Application domain 1: landslide detection in West-Pokot County, Kenya
For the landslide application domain, only the example of West-Pokot is presented. The landslides were detected from a combination of high-resolution Sentinel-2 images and very high-resolution Pléiades images with the engagement of UNOSAT's rapid mapping service, which activated the 'International Charter Space and Major Disasters'. ImCLASS is used with cloud-free Sentinel-2 images acquired on November 28, 2019 (post-event) and September 19, 2019 (pre-event). The processing used as input a small reference training dataset of 184 landslides manually digitized over a region of interest of ca. 400 km². The land affected by landslides corresponds to an area of 5.2 km² for a density of ca. 1.3% (Figure 6). As a result, the probability map shows that the land affected by landslides corresponds to an area of 1 km² for a density of ca. 3% (Figure 7). Such a result, produced quickly after an event at a site that is difficult to access, is very valuable for decision makers. The landslide inventory thus generated with ImCLASS highlights the largest landslides and the most impacted areas, which is fundamental to guarantee the supply of humanitarian assistance. Moreover, ImCLASS also detects smaller landslides on the upper slopes, which have to be taken into account in the medium term for landslide hazard management.

Application domain 2: urban construction works detection in Strasbourg
For the Strasbourg site, several tests have been performed using Pléiades images in mono-date or bi-date mode. Better results are obtained with the bi-date mode. The tests focus on each identified transition independently. The test presented here focuses on the transition from bare soil to completed construction between 2016 and 2017. The processing used as input a small reference training dataset of 16 urban works from the reference database. A post-classification process is then applied to the results in order to compare the final results in a vector format. This step allows the analysis of the correctly classified plots, those omitted by the model, and those predicted by the model but not present in the validation database (reference data).
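The vector comparison described above can be sketched as a simple overlap test between detected and reference plots. This is a simplified illustration and not the ImCLASS post-classification step: plots are reduced to hypothetical axis-aligned bounding boxes, the identifiers and coordinates are invented, and a real workflow would intersect the actual geometries with a GIS library.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in map units


def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes overlap over a non-empty area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def compare(reference: Dict[str, Box], detected: Dict[str, Box]) -> Dict[str, List[str]]:
    """Sort plots into correctly detected, omitted, and detected but not referenced."""
    matched_ref = {
        r for r, rb in reference.items()
        if any(intersects(rb, db) for db in detected.values())
    }
    matched_det = {
        d for d, db in detected.items()
        if any(intersects(db, rb) for rb in reference.values())
    }
    return {
        "correct": sorted(matched_ref),
        "omitted": sorted(set(reference) - matched_ref),
        "not_in_reference": sorted(set(detected) - matched_det),
    }


# Hypothetical plots: one detection overlaps a reference plot, one reference
# plot is missed, and one detection has no counterpart in the reference data.
reference = {"ref_A": (0, 0, 10, 10), "ref_B": (50, 50, 60, 60)}
detected = {"det_1": (5, 5, 12, 12), "det_2": (100, 100, 110, 110)}

result = compare(reference, detected)
print(result)
```

The third category deserves manual inspection rather than automatic rejection, since a detection absent from the reference data may be a real change that has not been catalogued yet.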
The processing detects about 140 urban works with an individual area larger than 100 m². This corresponds to a total area of 0.1 km² for a density of ca. 0.15%. About half of the changes identified in the reference database are correctly detected. The results are nevertheless very encouraging because some changed plots are correctly identified although they have not been catalogued in the reference database (Figure 8).
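The figures above follow from two simple operations: discarding detections below a minimum area, then dividing the total detected area by the area of the region of interest. A minimal sketch is given below, assuming hypothetical function names and toy polygon areas; only the 100 m² limit, the 0.1 km² total and the 66 km² subset come from the text.

```python
from typing import List, Sequence, Tuple


def density_percent(area_km2: float, roi_km2: float) -> float:
    """Share of the region of interest covered by the detected class, in percent."""
    return 100.0 * area_km2 / roi_km2


def summarize(
    areas_m2: Sequence[float], roi_km2: float, min_area_m2: float = 100.0
) -> Tuple[int, float, float]:
    """Keep detections larger than min_area_m2; return (count, total km2, density %)."""
    kept: List[float] = [a for a in areas_m2 if a > min_area_m2]
    total_km2 = sum(kept) / 1e6  # 1 km2 = 1e6 m2
    return len(kept), total_km2, density_percent(total_km2, roi_km2)


# Toy detections (areas in m2); the two smallest ones are filtered out.
count, total_km2, density = summarize([50.0, 150.0, 800.0, 99.0, 1200.0], roi_km2=66.0)
print(count, total_km2, round(density, 5))

# Consistency check of the reported Strasbourg figures: 0.1 km2 over 66 km2.
reported = density_percent(0.1, 66.0)
print(round(reported, 2))
```

The last call reproduces the reported density of ca. 0.15% for the Strasbourg subset.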

Application domain 3: vineyard mapping in Alsace
Several tests have been applied to the three pairs of dates between 2017 and 2018 with 370 training samples, mainly from the RPG database (314) and completed by manual digitization on the three test sites. The results presented here are based on a bi-date analysis of a pair of Sentinel-2 images dating from May 2018 and September 2018. Vineyard mapping using freely available Sentinel-2 images is largely unprecedented and challenging because the textural pattern, which is traditionally the distinctive feature for this type of crop, is apparently insignificant in these images (the vineyard parcels appear homogeneous). However, the resulting classification map clearly shows that the algorithm usefully complements the initial RPG database (Figure 9). Only a few omissions are observed, probably associated with young vineyard parcels or turf plantation. A few commissions are also observed; vineyards are mainly confused with permanent grassland or parcels without production. Within the region of interest, an area of 190 km² is identified as vineyard crops for a density of 4.3%.
This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLIII-B3-2020-677-2020 | © Authors 2020. CC BY 4.0 License.
These first results of vineyard mapping in Alsace are very promising and some ways of improvement are currently being investigated, in particular processing in time series mode in order to take advantage of the high temporal resolution of the Sentinel-2 images. The use of higher-resolution images is also an option to investigate, which is now made possible by the annual coverage of the French territory with SPOT (1.5 m and 6 m for the panchromatic and the multi-spectral images respectively).
Figure 9. Comparison of the vineyard mapping on subset 2 between the RPG database (above) and the ImCLASS results (below).

CONCLUSION
ImCLASS has been tested for three application domains in collaboration with end-users. These different studies point out the difficulty of achieving an optimal generic algorithm. Indeed, each studied case presents a specificity in terms of:
• availability of ground truth and/or a reliable initial database,
• requirement of pre-processing (image fusion, temporal index analysis, …) and/or post-processing (conversion to vector format, morphological operations, stacking of a series of results, …),
• choice of the input images (spatial resolution, temporal resolution, bi-date or mono-date mode, …).
The specificities of each use case were an opportunity to improve the ImCLASS chain and to aggregate and test new functions. The version of ImCLASS exploited here shows promising results with a direct, relevant interest for end-users. The version currently in progress will integrate image time series in the processing chain and provide an online classification service with a user-friendly graphical interface allowing users to easily label objects of interest and visualize the classification results online.