Specific alpine environment land cover classification methodology: Google Earth Engine processing for Sentinel-2 data

Land Cover (LC) plays a key role in many disciplines and its classification from optical imagery is one of the prevalent applications of remote sensing. Besides years of researches and innovation on LC, the classification of some areas of the World is still challenging due to environmental and climatic constraints, such as the one of the mountainous chains. In this contribution, we propose a specific methodology for the classification of the Land Cover in mountainous areas using Sentinel 2, 1C-level imagery. The classification considers some specific high-altitude mountainous classes: clustered bare soils that are particularly prone to erosion, glaciers, and solid-rocky areas. It consists of a pixel-based multi-epochs classification using random forest algorithm performed in Google Earth Engine (GEE). The study area is located in the western Alps between Italy and France and the analyzed dataset refers to 2017-2019 imagery captured in the summertime only. The dataset was pre-processed, enriched of derivative features (radiometric, histogrambased and textural). A workflow for the reduction of the computational effort for the classification, which includes correlation and importance analysis of input features, was developed. Each image of the dataset was separately classified using random forest classification algorithm and then aggregated each other by the most frequent pixel value. The results show the high impact of textural features in the separation of the mountainous-specific classes the overall accuracy of the final classification achieves 0.945.


INTRODUCTION
Land Cover (LC) plays a key role in many disciplines, such as sustainable land management, land resource monitoring, urban vegetation mapping landscape ecology and climate-related researches (Feng et al., 2015;Rizeei et al., 2016;Sekertekin et al., 2017;Shelestov et al., 2017;Turner and Gardner, 2015). This makes the land cover, and land use, maps highly demanded (Carrasco et al., 2019;Delalay et al., 2019;Sekertekin et al., 2017;Thanh Noi and Kappas, 2018). LC maps are commonly derived from machine learning classifications of optical imagery, which represents one of the prevalent applications of remote sensing and it has been described by many authors (Carrasco et al., 2019;Delalay et al., 2019;Rizeei et al., 2016;Sekertekin et al., 2017;Shelestov et al., 2017;Sidhu et al., 2018;Thanh Noi and Kappas, 2018). During the last years, LC classification has done great steps forward: a wide range of free satellite mediumhigh optical imagery, specific classification algorithms, many processing platforms and machines with more and more high computational power are now available (Carrasco et al., 2019;Rizeei et al., 2016;Sidhu et al., 2018). Besides today's achievements, we still face some major constraints in LC classification, which can be distinguished in environmental constraints and technical constraints. Among the technical constraints, the poor temporal resolution of satellites is one of the most popular, but it has been partially overcome with the introduction of medium-high resolution satellites that increase the free data available and make possible integrating the datasets from different acquisitions. Although, data integration and highspatial resolution requires to manage a large amount of data, but also a large amount of storage, as well as significant computing power and time (Carrasco et al., 2019). Managing satellite dataset requires considerable data storage capability and the high spatial resolution further increase this requirement. Indeed, the antinomy between spatial resolution and computational power is another very common technical constraint. During the last few years, some geographic cloud computing platforms that allow the analysis and storage of geographic data were born. These services (such as Google Earth Engine) decrease the computational and storage limits related to the satellite data processing (Kumar and Mutanga, 2018). The atmospheric disturbance and the high seasonal variability are some of the main environmental constraints in LC classification along with the topography variation. The topography strongly influences the spectral values in satellite imagery, especially in the case of steep areas. Indeed, the strong variability in the reflectance and the spread shadows are the main effect of the topography on satellite imagery, which may complicate the classification (Dorren et al., 2003). This is particularly true in narrow valleys of mountainous areas, where some mountainside are permanently shadowed in winter months. The LC classification of mountainous areas is affected by other major environmental constraints due to climatic conditions. For example, the snow cover prevents the classification in the winter season and the orographic rains and clouds, which mechanisms are influenced by the terrain (Houze, 2012), lower the possibility of accurate classifications. The combination of these factors makes the generation of the land cover maps of mountainous areas particularly challenging and generally recognized as fairlylow accurate (Dorren et al., 2003;Itten and Meyer, 1993). In this contribution, we propose a specific methodology for the classification of the Land Cover in mountainous areas using Sentinel 2, 1C-level imagery. The classification considers some specific high-altitude mountainous classes: clustered bare soil areas that are particularly prone to erosion; glaciers; and solidrocky areas. The methodology tries to overcome the abovementioned environmental limitations and it consists of a pixelbased multi-epochs classification using random forest algorithm. We chose to perform the analysis using Google Earth Engine (GEE) because of its high computational speed and the large dataset of satellite imagery it makes easily available.

Test study
The study interests the Wester Alpine arch between France and Italy and it extends for approximately 340000 hectares ( Figure  1). The town of Cesana Torinese (44° 57′ 0″ N, 6° 48′ 0″ E) in Italy is the reference site of this study. Two Sentinel-2 tiles cover the study area: 31TGK (K) and 32TLQ (Q). Figure 1. Study area. The Blue square covers the K tile, while the red one covers the Q tile. The yellow pointer indicates Cesana Torinese (TO).

Google Earth Engine
The main steps of a semi-authomatic classification workflow (topographic correction, extraction of training points, training of the classifier, classification) were realized on Google Earth Engine (GEE). GEE is a web-based platform for geospatial analysis lunched in 2010, and it is free for research and education purposes (Gorelick et al., 2017;Kumar and Mutanga, 2018). The GEE data catalog is composed of continuously updated geospatial datasets provided by different national and international programs, such as NASA and ESA (Hu et al., 2018;Kumar and Mutanga, 2018;Shelestov et al., 2017;Sidhu et al., 2018). The datasets included some already elaborated satellite data. The true innovation of GEE is the possibility of interacting with very large datasets and computing basic geospatial analysis directly on GEE servers through JavaScript/Python-based API (GEE API) (Goldblatt et al., 2017;Google Earth Engine, 2020). GEE makes available the imagery captured by Sentinel 2. Sentinel 2 is a mission of the European Space Agency (ESA) and it is composed of twins satellites (Sentinel-2A and Sentinel-2B) that carry multispectral optical sensors. Sentinel 2 products are images of 13 spectral bands that are published with two possible levels of processing (ESA, 2020). The dataset for the classification is composed of 31 scenes (epochs) filtered with GEE from Sentinel 2 level-1C representing the Top-Of-Atmosphere (TOA) reflectance scaled by 10000 (it includes radiometric and geometric corrections including orthorectification and spatial registration on a global reference system with sub-pixel accuracy (ESA, 2020)).

Training and Validation Datasets
The 1260 points that compose the training dataset were randomly selected. This process was sped up by randomly sampling (stratify samples) a training layer generated from already existing land cover classifications. This phase, besides accelerating the training points positioning, also ensured the minimization of the subjectivity in the identification of the training points (Gromny et al., 2019). The information of ESA High-resolution Layers (VHR) 10m (Forest cover, Imperviousness, Grassland and Water and Wetness, http://land.copernicus.eu/) (Lefebvre et al., 2016), CORINE Land Cover (CLC) (Copernicus, 2020), and RUSLE 2015 at 100 m from ESDAC (Panagos et al., 2015) constitute the training layer. The classed of CORINE land cover, as well as other input data, were reassigned based on the classes of interest, as Table 1 shows. Finally, the training points were visually checked. This phase was carried out outside GEE environment. The accuracy assessment is based on 1260 points identified using the same methods for the definition of the training points.

Satellite images dataset
The classification was realized on Q tile and then its replicability checked on tile K. The images regarding the entire Sentinel-2 activity of sensing were filtered according to two parameters: the cloud cover percentage, data must have less than 10% the scene covered by clouds; and by the sensing period. Only images sensed during summertime (from June to August) were selected to minimize the effect of the shadows. The filter reduced considerably the data available from Sentinel 2 level 2A (Sentinel 2 highest level of processing product that geometrically and atmospherically corrected): only 16 images that satisfied the filter criteria were available. Therefore, to ensure a larger classification dataset, 1C level data were used. (Table 2).  Table 2. Sentinel 2 level 1C images of the classification dataset.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2020, 2020 XXIV ISPRS Congress (2020 edition) 31 tiles constitute the final dataset: 13 images for Q and 18 for K. Sentinel 1C level products are georeferenced and geometrically corrected, but the atmospheric disturbance and the topographicderived distortions are not rectified. Thus, the imagery was preprocessed before the classification.

METHODS
In this paragraph is described the methodology applied for this classification. It consists of a machine learning classification based on random forest algorithm, realized using 1260 training points on 9 classes: Coniferous forest, Broadleaves forest, Grasslands, Water, Clustered bare soil areas, Solid-rocky areas, Urban areas, Glaciers, and Agricultural lands (

Glaciers
Perennial snow/ice cover. In Alpine zone, only in areas above 3000 m a.s.l.

Agricultural lands
Areas interested by agricultural activities that require tillage, or present rows of fruit trees/bushes. Table 3. Classes of Land Cover considered in this work.

Pre-processing
Sentinel 2 1C-level products are georeferenced, but the atmospheric disturbance and the topographic-derived distortions are not corrected. Thus, the imagery was pre-processed phase preceded the classification and consists of the cloud masking, the atmospheric correction, and the topographical correction.

Cloud masking
The dense clouds and the cirrus were masked in each epoch using the QA60 band. The QA60 band is provided as an additional 1Clevel product and represents the cloud mask computed by ESA through the analysis of the blue spectral region (B2), the SWIR reflectance in B11 and B12 (ESA, 2020). Only cloud-free pixels were maintained, while dense clouds and cirrus were classified as no data values.

Atmospheric correction
The atmospheric correction of satellite imagery is considered fundamental in remote sensing applications, especially in the case of multi-temporal analysis (Hadjimitsis et al., 2004;Lantzanakis et al., 2017;Martins et al., 2017;Sola et al., 2018). The atmospheric correction removes the scattering effect of the Earth's atmosphere and it can be based on Radiative Transfer Models (Specific mathematical models that consider latitude, season and atmospheric conditions) or on Image-Based Correction Techniques (that estimate atmosphere scattering using information and data within the image) (Hadjimitsis et al., 2004;Lantzanakis et al., 2017;Martins et al., 2017). Usually, the Radiative Transfer Models are more accurate and therefore more applied. For example, ESA for the production of atmospherically corrected imagery of Sentinel 2 (level-2A), applies the Sen2Cor Radiative Transfer model. For the time being, there is no atmospheric correction model to be applied to Sentinel 2-level 1C implemented in GEE. This means that Sentinel 2 at 1C processing level cannot be easily atmospherically corrected using Radiative Transfer Models in GEE environments. This is a severe limit to the performing of multi-temporal analysis. To overcome this limitation, in this work, we applied a linear model for the atmospheric disturbance reduction: Dark Object Subtraction (DOS) (Chavez, 1988), which performs similarly to radiative transfer models on homogeneous surfaces such as grass, water, and bare soil (Lantzanakis et al., 2017). The application of the DOS did not negatively affect the final results because the method we propose classify separately each epoch of the dataset, as discussed in paragraph 3.3.

Topographic correction
The topographic correction allows the variation in the reflectance derived by the inclination of the terrain and the sun elevation (Poortinga et al., 2019;Shepherd and Dymond, 2010). This preprocessing phase is crucial in mountainous areas, because of the steep mountainsides and the consequent alteration of reflectance values. The entire dataset was corrected by applying a semiempirical correction. The code was originally implemented in GEE by Patrick Burns and Matt Macander, and then adapted to the dataset. The correction is based on a semi-empirical method that takes into consideration not only the topography of the area (as for the empirical methods), but also the solar angle (both zenith and azimuth) (Shepherd and Dymond, 2010). The topographic correction is based on sun-canopy-sensor with a semi-empirical moderator (c) (SCSc) method (Poortinga et al., 2019;Soenen et al., 2005). A Digital Elevation Models (DEM) to detect the slopes of the area and the solar position information (i.e. the sun inclination and sun irradiance) are the input data. These data are available from the metadata of the satellite images, while elevation information was extracted from the Shuttle Radar Topography Mission (SRTM) digital elevation data with 30m spatial resolution that is available in the GEE catalog (Farr et al., 2007).

Derivative features
Some additional bands were calculated to improve the final accuracy of the classification. The diversification of the input information is crucial for a good classification. For example, textural elements can facilitate Land cover classes discrimination (Lewiński et al., 2015), as well as the histogram-based ones (Drzewiecki et al., 2013). First, 10 radiometric features were added (Table 4). Secondly, 5 histogram-based features, 19 textural and 1 edge-detector features were computed. Particularly, the texture metrics from the Gray Level Cooccurrence Matrix in the 7x7 neighborhood of each pixel of band 8 (Near InfraRed, NIR) were computed (Table 4) (Conners et al., 1984;Haralick et al., 1973;GEE, 2020). The derivate features improved the goodness of the classification, but required high computational power. Indeed GEE exceeded the memory limit by running the entire code: the filtering of the Sentinel-2 data catalog, the topographic and atmospheric corrections of the filtered features, the derivative bands computation for each epoch and the classification itself. Thus, to slim out the classification process and to reduce the computational time of the classification a correlation analysis and the importance of the layer in the classification analysis were realized.

Correlation analysis
The slimming out workflow was designed to reduce the computational time without losing accuracy. First, a correlation analysis was performed between the radiometric derivative features and the original bands to avoid information redundancy. The correlation between bands was analyzed using July 2017 as reference. The analysis was based on the DN values randomly sampled within the classes. Band 4 was used as reference band. Correlation coefficients equal to 1 indicate total correlation. This analysis considered highly-correlated variables greater or equal to 0.85 (dark green in Table 5). Then, the correlation analysis between textural bands was carried out. Since the textural bands resulted to be much less correlated to each other compared to the radiometric bands, the importance of textural predictors in the classification was computed by running the classification each time excluding one different predictor. The Overall accuracies of these classifications were considered as the importance value of the removed predictor. This indirect strategy to estimate the importance of each feature within the classification was necessary because the evaluation of predictors' importance is not implemented in GEE. The importance analysis was performed on July 2017 tile. The OA of the classification computed with all textural features is 0.825. The features which exclusion caused the increasing on the OA of 0.01 points were considered as less important (negatively affecting the classification). To further reduce the computational effort, the bands were normalized and then transformed in integer values (int16). Since running the normalization on GEE required too much memory was realized a "pseudo-normalization". For each band was identified the multiplicative factor that allows to obtain integer values no bigger than 32767 (max values for signed integer data format). The pseudo-normalization does not affect the classification.

Classification
Each image was separately classified using the machine learning algorithm random forest with 50 rifle decision trees per class and 4 as the minimum size for terminal nodes. The same training dataset was used for each image. This means that for one image were sampled 37 DN values in correspondence of every training point. This datum was used to train the classifier and finally to apply it to the starting image. The results are 13 classifications that were aggregated according to the most frequent pixel value between 1 and 9 (no data values excluded) to obtain the final classification of the area. The aggregation allowed us to minimize the mistakes of the single classifications and to take out from the final result the no data values of the cloud masking (Nowakowski et al., 2017).

Accuracy assessment
The accuracy assessment is based on 1260 defined by the same method adopted for the identification of the training dataset. It consisted of the computation of the error matrix and the derived accuracy measures for each single classification and the final aggregated one. The error matrix-derived measures are the overall accuracy, the producer's accuracy, the user's accuracy, and the F1 score. Table 5 shows the results from the correlation analysis between radiometric-based derivative features. The correlation coefficient ranges between 0 (no correlation, light green) and 1 (total correlation, dark green), ( Table 5, Table 6). The radiometric derivative features with a correlation coefficient larger than 0.85

Classification
It was not possible to perform the entire classification in one GEE script. Thus, it was carried out using some expedients. First were performed the filtering and the correction. The pre-processed dataset was then exported in the GEE personal Asset. A new script was written for the computation of the classifications in which was imported the pre-processed dataset. The slimming out tests were realized in a separate script. Finally, the single classifications were converted into int8 data format, stacked in a single image and exported (Figure 2). Figure 3 provides some examples of the classification results in high-altitude areas (left) and lowlands (right).

Accuracy
The overall accuracy for single date classification varied from 0.661 to 0.747 in the case of only radiometric derivative features and from 0.791 to 0.900 for the radiometric and textural derivative features. Table 8 reports the results of the accuracy assessment deriving from the classification of i) the Sentinel radiometric bands and the radiometric-based features, and ii) the Sentinel radiometric bands, the radiometric-based features, and the textural features. The F1 scores of clustered bare soil areas along with the OA accuracy values rapidly increase. The classification with the derivative features shows the overall accuracy of 86%. The OA of 94% deriving from the dataset with radiometric and textural features is the result of the aggregation of 13 images.    Figure 4 reports the accuracy and the F1 score for each single classification. The accuracy values have very unstable results within the classification scene. This is partially due to the incorrect classification of the areas covered by clouds (no data) and the pixels in the immediate surroundings of the masked area that may suffer from radiometric alteration for cloud proximity, but not detected by Sentinel cloud masking. Figure 4. The changes in F1-score for each class through the epochs (tile no., on the abscissae).

Replicability
The replicability of the classification methodology was tested on the dataset of tile K and the overlapping area of tiles Q and K. The accuracy assessment from tile K Table 9 shows very similar results to the one of tile Q. It appears that on both tiles the classes that describe the mountainous areas have lower F1 value. The accuracy assessment based on 700 test points (placed ad hoc) was realized for the overlapping area, to check the stability of the classifications of tiles K and Q in most challenging areas (mountains peaks). Only 7 classes are present in the scene, but the rocky areas classes dominate the scene (Table 10). The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2020, 2020 XXIV ISPRS Congress (2020 edition) This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLIII-B3-2020-663-2020 | © Authors 2020. CC BY 4.0 License.

DISCUSSION
The filter for selecting only the summertime images with a very low percentage of cloud cover, on one hand, ensured uniform illumination and atmospheric conditions; but on the other hand, impeded the use of Sentinel 2 highest processing level (2A). Besides, the classification of the 1C-level dataset achieved interesting results. The independent classification of each image and the frequency-based aggregation minimized the distortions and the errors derived from the topography and the atmosphere. Moreover, the summertime filter reduced the variability in the meteorological and atmospheric conditions allowing the application of a linear model of correction. Even if the preprocessing phase is consistent, it had a low impact of the methodology thanks to GEE cloud computing and the ease with which it is replicable in GEE. Although, we faced some major constraints in GEE related to the memory available for single users, the poor information regarding the available functions and the lack of some useful functions (like layers importance). Nevertheless, it is a relatively young service that is constantly enriched with new functions and features and our analysis was quite ambitious from the computational point of view. Considering the results in detail, the derivative features considerably increased the goodness of the classification. It is interesting to note the role of textural bands in the discrimination of confusion between clustered soil class and solid-rocks class. Indeed, F1 scores of clustered bare soils jumped from 0.359 to 0.879 by adding the textural features to the classification (Table  8). Figure 4 clearly shows that the clustered bare soil and the solid-rocky areas are the classes with a lower F1 score in every single classification as evidence of the difficulty of separation of the two classes. Generally, the accuracy of a single classification shows a similar trend throughout the classes, such as image 6 (low F1 values) and classification no. 7 (high F1 values for all its classes) from Figure 4. Regarding the aggregated classification, the overall accuracy achieves 0.945 with a clear improvement also in clustered-rocky and solid-rocky areas (respectively F1scores 0.827 and 0.890) proving the validity of the aggregation method to reduce the main errors from the single classifications (Table 8). The replicability analysis on tile K reported in Table  9, show trends in clustered and solid rock classes close to the ones of tile Q. The overall accuracy of K is 0.907. The difference between K and Q overall accuracy can be caused by the training dataset and it is negligible. It is worth mentioning that in tile K there is one class more: glaciers. Indeed, the land cover by glaciers is relatively small and present only in K tile (in the Écrin National Park, FR). Even if the glaciers classification is very good some small glaciers were not detected. The overlapping area between K and Q provides promising results if we consider the fact that the most confuse classes clustered and rocky areas are the dominant LC (Table 10). On the other hand, the few classes present in the area provide better values of Overall accuracy. Generally, the solid-rocks are frequently misclassified as urban areas. This is attributed to the high spectral similarity of the classes and their similar textural characterization. A frequent error in Alpine areas is the classification of buildings with rocky roofs as solid-rocky areas, the same for the small rivers in which the water flow during the summertime is reduced. Also, some pixels classified as water are detected in mountainous areas, probably due to the presence of shaded rocky/snowed areas.

CONCLUSION
This study proposes a simple method for the classification of Land cover in mountainous areas in GEE platform, which includes some steps to reduce the computational effort and the time for the classification. The methodology minimizes the error introduced by the atmospheric component and the terrain inclination using only images captured during a short time range in limited cloud cover conditions and by applying atmospheric and topographic corrections. The textural derivative features played a key role in distinguishing the most challenging classes (clustered bare soil and solid-rocky areas). An additional positive value of the methodology is the aggregation method, indeed, by considering the modal value of the single classifications, the final accuracy significantly raised. All the aspects allowed us to reach good accuracies in mountain areas. Since the entire classification is performed in GEE, it can be easily modified and updated.