DEVELOPING TRANSFERABLE SPATIAL PREDICTION MODELS: A CASE STUDY OF SATELLITE BASED LANDCOVER MAPPING

The mapping of environmental information based on remote sensing requires a workflow that involves image processing, model training usually based on machine learning, as well as model application and validation. Remote sensing data processing capabilities are nowadays simplified by cloud computing platforms. State of the art machine learning methods for spatial data which involve a reduction of spatial overfitting, handling of extrapolation situations and a spatially explicit error assessment, however, are currently mostly implemented in local computation frameworks. Here we present a workflow that combines the improved processing capabilities of the cloud computation platform openEO with state-of-the-art machine learning model development in R. OpenEO is used for standardized imagery acquisition and preprocessing to provide predictors for model training. To reduce overfitting, predictors which are meaningful for the mapping are identified via spatial variable selection as implemented in R packages. The mapping accuracy is assessed via spatial cross-validation and predictions are limited to the ’Area of Applicability’ of the model. The workflow is designed to enhance and assess the spatial transferability of machine learning models which is demonstrated by a case study of a landcover classification based on Sentinel-2 imagery.


INTRODUCTION
Machine Learning (ML) models and their associated predictions have become a key component in environmental science to contribute to major contemporary challenges like achieving sustainable development goals (Holloway and Mengersen, 2018) or biodiversity monitoring (Reddy et al., 2021). Especially in the field of remote sensing ML emerged as a indispensable tool for the large-scale mapping of environmental information (e.g. soil properties, (Hengl et al., 2017), species occurrence, (van den Hoogen et al., 2019), or landuse (Venter and Sydenham, 2021)). Most mapping studies follow a similar logic. A ML algorithm learns the statistical relations between the target variable and predictors from the location of available reference data. Once trained and validated, the model is applied to the spatially continuous predictors to map the target variable for the entire area of interest. While the modelling might be straightforward, in practice, the development of ML models from remote sensing data faces several challenges.
Reference data are often heavily clustered in geographic space (e.g. due to opportunistic field-sampling campaigns (Yates et al., 2018)) which bears the risk of training spatially overfitted models and a low ability of the model to make predictions for new areas (i.e. low transferability Meyer et al. (2019); Meyer and Pebesma (2021)). Hence, sub-optimal validation strategies for spatial predictions can lead to incorrect conclusions about the statistical model performance (observed by e.g. Roberts et al. (2017); Meyer et al. (2018); Ploton et al. (2020)) which ultimately leads to incorrect conclusions about the mapping error. Spatial cross-validation strategies are therefore proposed to assess model performances at unknown locations instead of commonly used random cross-validation approaches (Meyer et al., 2018;Ploton et al., 2020;Mila et al., 2022). * Corresponding author While changing the cross-validation strategy allows for a more reliable error assessment, it does not solve the problem of overfitting. Reducing the risk of overfitting usually involves some form of predictor selection (Ying, 2019). Simpler modelsi.e. models that utilize fewer predictors to map the target variable -can be assumed to represent more generalized relations. Meyer et al. (2018Meyer et al. ( , 2019 have suggested a spatial variable selection to identify only those predictors that are most useful for a spatial prediction task, usually leading to higher prediction performances when the model is transferred to new locations. This model transfer often requires that predictions are made for environments that are different from those used for model training. Machine learning models, however, can only make reliable predictions for new areas if the values of the predictor variables involved are comparable to those encountered in the training data. To assess the area to which this applies, Meyer and Pebesma (2021) recently suggested a method to compute the "Area of Applicability" (AOA) of prediction models and suggest that this should become common practice for spatial predictive mapping.
A second major challenge is that ML models require large amounts of training data in order to adequately learn (nonlinear) relations between the target variable and predictors. Moreover, the model is usually applied to an even larger amount of data for the desired prediction (e.g. Europe wide Sentinel-2 imagery, Venter and Sydenham (2021)). This is especially the case for the development of global maps such as in Ma et al. (2021);Hengl et al. (2017); Moreno-Martinez et al. (2018) or van den Hoogen et al. (2019). Naturally, using ML in the context of remote sensing depends on large amounts of earth observation data which usually requires extensive preprocessings such as atmospheric correction, cloud masking or the computation of a composite. Especially if imagery from multiple time periods or different sensors are required for the mapping, the amount of data and computation resources quickly exceeds what most researchers have available on their local machine.
Ongoing improvements in spatial, temporal and spectral resolution of satellite imagery with a nearly global coverage further increases the amount of computation needed.
To approach this problem, cloud computing frameworks (e.g. Google Earth Engine (Amani et al., 2020) or openEO (Schramm et al., 2021)) emerged as a promising possibility to process large scale satellite imagery. In general, the idea of these platforms is to provide products and processes for analysis ready earth observation data without the need of downloading the satellite imagery. As a consequence, mapping studies utilizing cloud computing have seen a rise in popularity leading to attempts of global scale predictions of e.g. soil nematodes (van den Hoogen et al., 2019), plant biomass (Ma et al., 2021) or landcover (Venter and Sydenham, 2021).
Automated cloud computing workflows are in development to increase the accessibility of ML based mapping to users who lack a deep understanding of the underlying methods . While cloud computing platforms provide implementations of ML algorithms, the possibilities for adequate model development and validation are currently still limited due to lacking features (e.g. variable selection) that are, however, relevant for spatial mapping. This is not an issue of the platforms itself since they are meant for the processing of earth observation data and not the development of ML models. Consequently, the initial development of the ML model is often done locally (e.g. in van den Hoogen et al., 2019) with an established ML framework in Python (sklearn, Pedregosa et al. (2011)) or R (caret, Kuhn (2008)). This way, the model training and validation can incorporate already established methods to overcome the aforementioned challenges of spatial predictive modelling.
Here, we outline a workflow that utilizes the open-source cloud computing platform openEO for standardized earth observation imagery acquisition in combination with an R based ML model development that is designed to improve the spatial transferability of prediction models. By doing so, we combine the computational advantage of openEO with the possibilities of case specific model development and validation strategies. The usage of the workflow is demonstrated with a case study of a landcover classification of Sentinel-2 imagery. We explicitly show the benefits of spatial variable selection and the need for a transferability assessment with the AOA by applying the model to a different region.

METHODS
The idea of the suggested workflow is to utilize the benefits of the cloud computing platform openEO for the processing and acquisition of earth observation data with advanced predictive modelling methods provided by R (Fig. 1). Potential predictor layers are first computed only for areas with available training data. We then use spatial variable selection (Meyer et al., 2018(Meyer et al., , 2019 in order to find a set of predictors that, in combination, are most suitable to map the target variable beyond the training data locations. Model performances are validated with spatial cross-validation. Only the selected predictors that are regarded as relevant by the spatial variable selection are then acquired via openEO and used for mapping the area of interest. To prevent low quality and invalid model extrapolations, predictions are finally limited to the AOA of the trained model (Meyer and Pebesma, 2021).

Acquisition of predictors with openEO
OpenEO is an emerging cloud computing platform that aims at harmonizing the access to earth observation data from different providers (Schramm et al., 2021). Following an opensource paradigm, openEO enables a community-driven, transparent and reproducible alternative to closed-source alternatives such as Google Earth Engine™. Using the openeo R client (Lahn, 2021) we developed a processing chain for the acquisition of analysis-ready Sentinel-2 L2A composites. Users define the area of interest and a time interval for which the median composite of all available Sentinel-2 scene with less than 20% overall cloud cover is computed. In addition, the Sentinel-2 Scene Classification Layer (SCL) is utilized to mask remaining clouds, shadows and low quality pixels in each time step. The resulting composite may then be used for the calculation of vegetation indices as additional potential predictor variables.

Modelling
The acquired predictors are then matched with the available reference data that contain the information of the target variable and serve as the training data of the ML model. We use an R-based modelling framework consisting of a spatial variable selection, training of a random forest model, prediction and the computation of the AOA (Fig. 1). However, ML models can be very case specific since the quality of the outcome is heavily dependent on the quality of training data, the used algorithm and its parameters or tuning (Maxwell et al., 2018). Further, each modelling task aims at different target variables, deals with different spatial units and might require different preprocessing steps. Hence, the framework is flexible enough and can easily be modified to the needs of a specific modelling task.
2.2.1 Spatial cross-validation One main challenge when dealing with machine learning models is the prevention of overfitting (Ying, 2019). In the geo-spatial context, this means that the model has to be generalized enough to make valid predictions for new geographic locations. Hence, the model evaluation also has to account for the situation when the model is applied to locations that are not present in the model training.
To do so, we use a spatial cross-validation approach for the assessment of the model error (Meyer et al., 2018;Ploton et al., 2020).
Spatial cross-validation and subsequently the spatial variable selection requires the definition of suitable spatial units used to define cross-validation folds. There is an ongoing discussion on how these spatial units should look like. Here we suggest that users of the workflow should strife for an optimal method for their specific case (e.g. discussed in Mila et al., 2022;Meyer and Pebesma, accepted).

Spatial variable selection and model tuning
The risk of overfitting is greatly increased if the model utilizes a high number of predictors (Hassine et al., 2019), since this leads to a high probability that new locations contain combinations of predictor values that are not similar to the training data. Eliminating irrelevant predictors is further advantageous for a models computation time and interpretation (Maxwell et al., 2018).
In the context of spatial mapping, an adequate feature selection limits predictors to those that can be meaningfully used to make predictions for new geographic locations (Meyer et al., 2019;Le Rest et al., 2014). We therefore use spatial cross-validation in conjunction with a forward variable selection approach as described in Meyer et al. (2018). By doing so, we select predictors based on the described cross-validation strategy. Predictors therefore get automatically reduced to those that lead to the highest performance when making predictions for new regions. We assume that this set of predictors should also minimize extrapolation situations since the reduced feature space leads to a more generalized representation of the environment.
The resulting set of spatial predictors is used to train a random forest model (Breiman, 2001) as implemented in ranger (Wright and Ziegler, 2017). We use caret (Kuhn, 2008) for hyperparameter tuning in a grid search approach. Again, spatial cross-validation is used to determine the optimal set of hyperparameters (Schratz et al., 2019) namely mtry, minimum node size and the splitrule. The model internal variable importance is computed by random permutation of each individual predictor and measuring the effect on the model outcome. The tuned model can then finally be applied to new areas to predict the target variable. For this, the aforementioned openEO process is used to obtain the selected predictors for the entire area of interest.

Area of Applicability
Predictions in novel geographic areas might require model extrapolation if predictor values differ from the training data. This, however, is technically possible but not meaningful for random forests and similar algorithms. To detect these areas, we limit the prediction to the Area of Applicability (AOA) of the model according to Meyer and Pebesma (2021). The AOA is estimated for each pixel by calculating a "dissimilarity index" (DI). The DI of a new location is its Euclidean distance to the nearest training data point in the multidimensional predictor space, with predictors being weighted by their respective importance in the model. The AOA is then derived by applying a threshold on the DI. The threshold is the (outlier-removed) maximum DI of the training data derived from the spatial crossvalidation. Hence, a new data point is outside of the AOA if it is more dissimilar in its predictor properties than the dissimilarity observed in the training data set.

Case study
To demonstrate the proposed workflow we applied it in a typical satellite-based landcover classification (LCC) scenario. We choose the case study of a LCC since landcover is one of the most important drivers for environmental processes and also widely used as a predictor for subsequent modeling. Further, the reasoning and effects of the predictor selection and AOA are depicted very clearly when applied to the use case of landcover mapping. For example, the spectral properties within certain classes might differ (Hermosilla et al., 2022), i.e. a deciduous forest in Germany might look different from a deciduous forest in Italy. Further, certain landcover types might be completely missing in the training set, leading to the relevance of accounting for the AOA, especially when a trained model is transferred to new geographic regions. This also holds true for models that are applied to different time periods. Spectral properties of landcover classes (e.g. deciduous forests) strongly depend on the time of observation and a model trained with imagery from spring might fail if it is faced with spectral data from a summer scene. This further supports the need for a standardized earth observation data acquisition workflow since predictors in the area of interest have to undergo the same processing steps as the predictors used as the training data.
We collected 40 reference polygons for each of the landcover classes agriculture, forest, grassland, roads, settlement and water in the German state North Rhine-Westphalia (Fig. 2). Additionally, we collected 20 reference polygons per class from three geographically distinct areas that are used for independent validation indicating the transferability of the model. One of the new regions is a coastal area on the Eiderstedt peninsula in northern Germany. Here, the additional class "sand" was sampled which was not present in the training set.
For the training areas, we acquired the Sentinel-2 median composite from all scenes between 2021-04-01 and 2021-10-01 (Bands 2,3,4,5,6,7,8,8A,11,and 12,NDVI and EVI) with a spatial resolution of 10m. Bands with a native resolution other than 10m were resampled. As a service provider we use the VITO backend through the openEO Platform early adopters program. For model training and testing, we sampled 2500 pixels per class from the Sentinel-2 composite at the training and test polygon locations respectively.
As spatial cross-validation strategy that was used during hyperparameter tuning and variable selection we applied a 20-fold "leave-polygon-out" cross-validation, where all training polygons were randomly divided into 20 groups. Hence we avoid that reference pixel in the training and test sets are in spatial proximity since they stem from different polygons.
We trained a random forest model that utilizes all 12 acquired Sentinel-2 predictors for the landcover mapping and trained a second model using spatial variable selection. We compared both models in terms of classification accuracy using the independent test data in the three regions (Fig. 2) as well as the transferability of the model using the method to estimate the AOA. The code to reproduce the presented case study can be accessed at: https://github.com/LOEK-RS/ISPRS2022LCC. The methods for spatial variable selection and AOA estimation are implemented in the R-package CAST (Meyer and Ludwig, 2022).

Effect of Spatial variable selection
The spatial variable selection identified bands 3, 4, 11 and 12, along with NDVI as relevant to predict the landcover classes with a spatial cross-validation accuracy of 0.83. Any further predictor variable could not increase the accuracy. The model that includes all 12 predictors led to nearly the same spatial cross-validation accuracy (0.84, Tab. 2). This marginal difference of the cross-validation accuracy is arbitrary since it stems from the randomness during model training. Thus, the reduction of predictors did not harm the model's ability to accurately fit held-back training data. Further, the prediction outcome of both models was identical for 96% of all classified pixels. Hence, there is no benefit of including more predictors than needed in the model. We observed slight confusions between the classes grassland, agriculture and settlement, (Tab. 1), which is expected as these classes share similar spectral features. Arguably, the within-class spectral properties of these classes is also diverse which makes it more difficult for the random forest model to define characteristic split rules. The validation with independent test data of geographically distinct areas revealed that the variable reduction increased the accuracy of the predictions from 0.71 (all predictors) to 0.77 (reduced predictors). This indicates that the spatial variable selection improved the transferability of the model, since the prediction accuracy in regions without training data increased. As a positive side-effect, the reduced number of variables led not only to a higher prediction accuracy but also to reduced computational requirements since only the selected variables had to be processed in openEO Platform. The smaller predictor space also drastically reduced the computation time of the AOA. Applying the model to the coastal area resulted in the LCC depicted in Fig. 3. By comparison with the independent test data, we observed slight confusions of the classes agriculture and grassland which was already observed during spatial crossvalidation (Tab. 1). The main observations in Fig. 3 however are the large settlement areas predicted near the coastline, which are evidently erroneous classifications of the beach visible in Fig. 4. Since the class "sand" was absent in the training data, the model can never predict these cases correctly. The model accuracy alone (or any model related quality metric) gives no guidance about such cases where the model was not able to learn about a certain class. Hence, the overall model accuracy -even from spatial cross-validation -is not sufficient to represent the mapping accuracy. In this case study, we can identify the incorrect classifications by using the test polygons from the area of interest. Independent and well distributed test data, however, are most likely not available for many mapping projects since the sole purpose of developing a spatial prediction model is its application to areas were no data are available.
In a more complicated prediction task than a LCC (which can still be evaluated visually), noticing such prediction errors is challenging. Here, the AOA of the model can serve as a tool to identify possibly erroneous predictions based on distances in the predictor space.

Area of Applicability comparison
The AOA of the model indicates locations where the predictor values are similar enough to the predictor values of observations in the training data. We computed the AOA for both models for the three test regions (compared in Tab. 2). The AOA of the model using all 12 spectral variables allowed avoiding 25% of incorrectly classified pixels (as estimated using the test data). Only a small amount of correctly classified pixels (4%) were outside of the AOA. Reducing the amount of predictors with spatial variable selection allowed avoiding 35% of false classifications by limiting predictions to the AOA. Only marginally more correctly classified pixels were outside the AOA of the simplified model (7%) compared to the full model (4%).
Assessing the transferability of the simplified model in the coastal area shown in Fig. 4 results in the AOA depicted in Fig. 5. Besides some minor patches in the agriculture / grassland areas, the entire coastline is outside the AOA. This can be expected since the coastline consists of spectral properties not present in the training data. From the pixels that are declared as "sand" in the independent test polygons, the AOA was able to mask off 76%. This is a major improvement compared to the 37% masked off sand pixels from the AOA of the model without spatial feature selection (Tab. 2). The beach areas are not applicable for both models. The AOA of the simplified model also shows ambiguous shallow areas as not applicable that are declared as "sand" in the training area, but predicted as water by the model (Fig. 5).
This case study shows that defining and visualising the AOA for a spatial prediction model is a useful tool to prevent low quality predictions. The AOA is therefore a crucial part of the spatially explicit mapping error estimation since it can depict where the estimated model performance can be expected to hold because the model was enabled to learn about such environments. The AOA can further give new insights on missing training data and where new sampling are required to adequately represent the entire prediction domain.

CONCLUSION
Spatial predictive modelling heavily benefits from novel methods that are so far developed for local use only. The results of the case study e.g. show the benefits of spatial variable selection and consideration of the AOA to increase and assess the transferability of predictive mapping models. We therefore regard the combination of cloud based earth observation data processing and local model development with established frameworks as currently the best compromise to produce high quality spatial prediction models. Our presented open source workflow streamlines the access to satellite based training data for the purpose of model development. In a next step, the locally developed model should be re-implemented or used directly in the cloud computing platform. In openEO Platform, this functionality is currently in development. Besides shareable and reproducible access to homogenized satellite data, the open source aspect of openEO Platform will also enable the implementation of the AOA in cloud environments to further reduce computational costs for users and enhance the spatial mapping workflow overall.

ACKNOWLEDGEMENT
The work was supported by the Federal Ministry for Economic Affairs and Climate Action of Germany (project number 50EE2009). We further want to thank the openEO Early Adopters Program for the help and free access to the openEO Platform for cloud computing.