PROBABILISTIC VEGETATION TRANSITIONS IN DUNES BY COMBINING SPECTRAL AND LIDAR DATA

Monitoring the status of vegetation is required for nature conservation. This monitoring task is time-consuming, as large areas have to be investigated and classified. To make this task more manageable, remote sensing is used. The acquisition of airborne remote sensing data depends on weather conditions and on permission to fly in the busy airspace above the Netherlands. These conditions make it difficult to obtain a new, dedicated acquisition every year. Therefore, alternatives to this dependency on dedicated airplane surveys are needed. One alternative is the use of optical satellite imagery, as this type of data has improved rapidly in the last decade, both in resolution and in revisit time. For this study, 0.5 m resolution satellite imagery from the Superview satellite is combined with geometric height data from the Dutch national airborne LiDAR elevation data set AHN. The goal is to classify vegetation into three classes (sand, grass and trees), apply this classification to multiple epochs, and analyze class transition patterns. Three different classification methods were compared: nearest centroid, random forest and neural network. We show that the outcomes of all three methods can be interpreted as class probabilities, but also that these probabilities have different properties for each method. The classification is implemented for 11 different epochs on the Meijendel en Berkheide dune area on the Dutch coast. We show that mixed probabilities (i.e. between two classes) agree well with class transition processes, and conclude that a shallow neural network, combined with pure training samples and applied to four different bands (RGB + relative DSM height), produces satisfactory results for the analysis of vegetation transitions, with accuracies close to 100%.


INTRODUCTION
Nature is an integral part of our environment. In Europe, nature reserves are protected under the Natura 2000 program, (Sundseth and Creed, 2008). As part of this program, habitat development has to be monitored, (Ackerly et al., 2015). Given the vast size of typical nature reserves, remote sensing is an attractive option for monitoring. Remote sensing can be performed in dedicated campaigns, but this is expensive and often complicated to organize. Alternatively, readily available data could be used for monitoring.
The method traditionally used to monitor vegetation transitions in the area of interest is a model called DICRANUM, (Assendorp et al., 2010). This model is based on the red and near-infrared (NIR) spectral bands of aerial photographs. These two bands were chosen because their ratio provides the most distinguishing information on vegetation. Classification classes range from bare ground with no vegetation to a class with 100% coverage of shrubs and trees. Between these pure classes, there are 5 fuzzy grassland classes with vegetation mixtures at the sub-pixel level.
In addition, training data is collected for both pure/crisp classes and fuzzy classes, (Tapia et al., 2005). The training data is used to identify both crisp and fuzzy classes in the 2-band Red-NIR feature space. The classification procedure results in 6 maps: one with only the crisp/pure classes and 5 with membership values for each of the fuzzy classes. The so-called membership value gives the probability that a pixel belongs to a fuzzy class; a pixel may belong to several fuzzy classes. This fuzzy classification is well suited for vegetation monitoring, (Feilhauer et al., 2021; De Lange et al., 2004). The strength of this approach is its ease of use, and the high accuracy for the crisp classes. However, the disadvantages of the model are the limited information (2 bands) that is used from the input data, as well as the need to acquire field observations to characterize the fuzzy classes in the 2-band feature space.
Given the difficulties of organizing dedicated campaigns, our goal was to analyse to what extent similar or even better results can be obtained from readily and freely available remote sensing products, in combination with state-of-the-art classification techniques that are able to profit from the full bandwidth of available information.

Area of Interest
The area of interest consists of the Meijendel and Berkheide dunes, compare Figure 1. It is situated at the Dutch coast between the cities of The Hague and Katwijk. The area has a size of 2877 hectares; the southern part, Meijendel, is the larger area at 1951 hectares, while the northern Berkheide covers 926 hectares. This Natura 2000 area consists of a varied and extensive dune landscape, and is relatively rich in relief.

DATA
The remote sensing data considered for this study should be ready to use and relatively up to date. In addition, the data should be useful for vegetation characterization. Therefore, it was decided to combine high resolution multi-spectral satellite data with freely available airborne laser scan data, (Kukunda et al.; Mücher et al., 2015). The spectral data is expected to enable us to distinguish different types of vegetation, notably from sand in this area, while airborne laser scan data should be useful in distinguishing high vegetation from terrain. Additionally, freely available aerial photos were used for visual inspection. A summary of the used data sets is given in Table 1.

Superview
The Superview satellite mission was launched in 2019 and creates high resolution imagery, (Liu et al., 2020). The data set is provided as a raster, with a ground sampling resolution of 0.5 meters. The imagery contains four bands, with reflectance information in the red, green, blue and near-infrared bands, (Mozgovoy et al., 2018). Satellite data from the Superview platform is purchased about six times a year by the Netherlands Space Office and made available for use by Dutch entities. As this data set is purchased for whole swaths of the Netherlands, not all acquisitions cover the area of interest, or are of sufficient quality (e.g. due to cloud cover). This results in about 3 to 5 usable images per year, slightly more in the summer months. An overview of the Superview images used in this study is given in Table 2. One such image is shown in Figure 2, left. A zoom-in at pixel level is shown in the inset.

Actueel Hoogtebestand Nederland (AHN)
AHN is a Dutch nationwide elevation model produced using airborne LiDAR, (Van Natijne et al., 2018; Soilán Rodríguez et al., 2019). In its raw form, the elevation model is a point cloud; for this study, however, the rasterized 0.5 m grid is used. The raster comes in two versions: a terrain model and a surface model. The difference ∆H at a 1 m raster between the mean of four terrain heights (at 0.5 m raster) and the mean of four surface heights (also 0.5 m raster) can be seen as a proxy for vegetation height and is used as input for the proposed classification work-flow. Figure 2, right, visualizes the AHN surface elevations over the same area as shown in Figure 2, left. In this study AHN4 data was used, which was acquired in early spring 2020.
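The ∆H computation described above can be sketched as follows; the array names and toy heights are illustrative, assuming DTM and DSM rasters on the same 0.5 m grid:

```python
import numpy as np

def vegetation_height(dtm, dsm):
    """Proxy vegetation height dH at 1 m resolution from 0.5 m AHN rasters.

    dtm, dsm: 2D arrays (terrain and surface heights) on the same 0.5 m grid,
    with even dimensions. Each 1 m cell averages a 2x2 block of 0.5 m cells.
    """
    def block_mean(a):
        h, w = a.shape
        return a.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return block_mean(dsm) - block_mean(dtm)

# Toy example: flat terrain at 10 m, with a canopy patch over the top-left 1 m cell
dtm = np.full((4, 4), 10.0)
dsm = dtm.copy()
dsm[:2, :2] += 6.0           # 6 m of vegetation over one 1 m cell
dH = vegetation_height(dtm, dsm)
print(dH)                    # top-left cell is 6.0, the rest 0.0
```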

Class definition and Training data
The three pure classes considered here are Sand, Grass and Trees. For each of these classes, training data was identified in 30 areas of 10 by 10 meters where the class is present throughout the whole 3 years of Superview-1 data availability. These 90 (3 × 30) areas were validated using the high spatial resolution aerial photos.

METHODOLOGY
The classification methods considered are nearest centroid, random forest and neural network classification. These methods range from easy to understand but less flexible, to state-of-the-art models that are more difficult to tune. The classification methods are used to produce several vegetation assessment products. These products are also used to compare and validate the models.

Nearest centroid
The first model is the simplest of the models considered, as it only involves one distance per class for each pixel to be classified, (Gou et al., 2012). The first step is to find the centroid, or mean, of the features of the training data of each target class, so there are as many centroids as there are classes. The construction of these centroids is a simple arithmetic mean, which computational effort scales linearly with the amount of training points (Schütze et al., 2008).
To get a classification for a pixel p, the Euclidean distance of the features of pixel p to each centroid ci, i = 1, 2, 3, is calculated in feature space. The centroid at the smallest distance has the highest probability, and is assumed to correspond to the class the pixel belongs to. In addition, a probability P(Ci, p) for class membership of pixel p to each class Ci is obtained by Eqn. 1,

P(Ci, p) = (1/di) / (1/d1 + 1/d2 + 1/d3),    (1)

with di = d(p, ci) the distance to centroid ci of class i.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2022, XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
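A minimal sketch of this classifier follows. The inverse-distance normalization of the probability is an assumed form, consistent with the behaviour described in the Discussion (probability one is reached only when a pixel coincides with a centroid, and all classes get a positive probability otherwise):

```python
import numpy as np

def centroids(X_train, y_train):
    """One centroid (feature-space mean) per class; computing the means
    scales linearly with the number of training points."""
    classes = np.unique(y_train)
    return classes, np.array([X_train[y_train == c].mean(axis=0) for c in classes])

def nc_probabilities(X, cents, eps=1e-12):
    """Normalized inverse-distance class probabilities (assumed form of Eqn. 1)."""
    # Distance of every pixel to every centroid: shape (n_pixels, n_classes)
    d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
    inv = 1.0 / (d + eps)
    return inv / inv.sum(axis=1, keepdims=True)

# Toy 2-band feature space with two well-separated classes
X_train = np.array([[0.0, 0.0], [0.0, 0.2], [10.0, 0.0], [10.0, 0.2]])
y_train = np.array([0, 0, 1, 1])
classes, cents = centroids(X_train, y_train)

probs = nc_probabilities(np.array([[0.0, 0.1], [5.0, 0.1]]), cents)
labels = classes[probs.argmax(axis=1)]   # nearest centroid wins
```

A pixel on a centroid gets probability ~1 for that class; a pixel halfway between the two centroids gets 0.5 for each.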

Random forest
The random forest model (Breiman, 2001) is a method that combines multiple decision trees into an ensemble. One such decision tree makes binary choices in feature space to get the best splits. The tree leafs correspond to the target classes. The best split is identified by minimizing the Gini impurity, which is a measure to quantify the quality of a split, (Breiman et al., 1984). Correlation between different decision trees is decreased by: (i) using only part of the training data for building one tree, and, (ii) by also using only part of the features for building one tree.
To classify an unseen pixel, its features are run through all 100 decision trees of the random forest. The pixel is assigned to the class that most trees vote for. In addition, the percentage of trees voting for a class is interpreted as the probability that the pixel belongs to that class.
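The ensemble setup can be sketched with scikit-learn; the synthetic features (stand-ins for R, G, B, NIR and ∆H) and all hyperparameters except the 100 trees are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((300, 5))            # 5 features per pixel: R, G, B, NIR, dH
y = rng.integers(0, 3, 300)         # toy labels: 0=Sand, 1=Grass, 2=Trees

rf = RandomForestClassifier(
    n_estimators=100,      # 100 trees, as in the text
    max_features="sqrt",   # each split considers only part of the features
    bootstrap=True,        # each tree is built on only part of the training data
    random_state=0,
).fit(X, y)

labels = rf.predict(X[:10])         # majority vote over the trees
probs = rf.predict_proba(X[:10])    # fraction of trees voting for each class
```

The `predict_proba` rows are the per-class vote fractions discussed above; they always sum to one.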

Neural network
A neural network is a type of machine learning model, based on the concept of how neurons in brains learn. It is one of the most advanced classification methods available. These models consist of at least three layers: the input layer, one or more hidden layers and one output layer. Each layer consists of a number of neurons (or nodes) which are connected to all or some of the neurons in the layers before and after. These connections all have a modifiable weight (or strength). These weights are estimated during the training of the model by iteratively minimizing a suitable loss function, (Wang, 2003; Bishop and Nasrabadi, 2006).
The number of input nodes is equal to the number of data sources that are put into the model, which is five in our case, RGBI + ∆H. The part with the hidden layers is where the model does the work, and the number of layers, the number of nodes in each layer and an activation function need to be determined. Our model consists of one hidden layer of 14 neurons. The output of a node is determined by the non-linear activation function, that scales the weighted sum of each input connection. There are several possibilities for this activation function, including the sigmoid, the arc tangent and hyperbolic tangents functions. In our case, the number of outputs will be equal to the number of classes, and a softmax activation function is used, which is considered best for categorical outputs.
The output layer consists of three 'class' neurons, one each for Trees, Sand and Grass.
To train the neural network, the weights of all the connections between the nodes have to be optimized. This requires a large training data set, which the model will use to find relations between the input layers and the output class in the training data. The progress is evaluated by a loss function that quantifies the difference between the neural network output and the training data. The loss function used is categorical cross-entropy loss. The final output will produce a value at each output node that is interpreted as the probability that a previously unseen pixel belongs to that class.
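The forward pass of such a shallow network can be sketched in plain NumPy; the random weights stand in for trained values, and only the shapes (5 inputs, 14 hidden neurons, 3 softmax outputs) and the loss follow the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# 5 inputs (R, G, B, NIR, dH) -> 14 hidden neurons -> 3 outputs (Trees, Sand, Grass).
# Weights here are random stand-ins; in practice they are fit by minimizing
# the categorical cross-entropy loss below.
W1, b1 = rng.normal(size=(5, 14)), np.zeros(14)
W2, b2 = rng.normal(size=(14, 3)), np.zeros(3)

def forward(x):
    h = np.tanh(x @ W1 + b1)                       # hidden layer, tanh activation
    z = h @ W2 + b2
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=-1, keepdims=True)       # rows are class probabilities

def cross_entropy(p, y_onehot):
    """Categorical cross-entropy loss minimized during training."""
    return -np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=-1))

probs = forward(rng.random((4, 5)))   # four example pixels
print(probs.sum(axis=1))              # each row sums to 1 (softmax)
```

The softmax output is exactly the per-class probability interpreted in the results; a perfect prediction drives the cross-entropy loss to (near) zero.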

RESULTS
Using all three classification methods, a land cover classification and a probability map were created for each of the RGBI Superview-1 images in Table 2. Input in all epochs was the latest Superview-1 RGBI image, at 1 m resolution, plus AHN4 derived vegetation height ∆H. Note that ∆H is available from a single acquisition only, and thus does not change over time.
Known locations with water and buildings were masked out using a static mask based on the national topographic map. This multi-epoch classification also results in a land cover timeline.
Here, only the classification results based on the Superview data of 2021-09-07 will be shown, in combination with the ∆H height. This Superview image has the best quality among the recent imagery, while the AHN4 acquisition time of 2020 is relatively close.
In Section 4.1 the Neural Network classification results will be presented, followed by single pixel probabilities in Section 4.2. Class variations through time will be shown in Section 4.3, while class transitions will be showcased in Section 4.4. Some results of Nearest Centroid and Random Forest will be discussed in Section 5.

Neural network classification result
The classification result of the neural network, implemented using TensorFlow (Abadi et al., 2015), is shown in Figure 3. Here, the left image shows the final class labels, while the right image also visualises the class probabilities. The overall map looks as expected, with sandy patches closer to the sea in the west and more trees inland, i.e. in the east part of the area. The confusion matrix in Table 3 also shows that the testing data shows very good agreement with the training data, with accuracies between 97% and 100% for all classes. This agreement is expected to be lower near class transitions, due to mixed pixel effects, where one pixel contains vegetation from several classes, but also because of gradual vegetation transitions in the field, for example sand mixed with small patches of vegetation.
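Per-class accuracies of this kind can be derived from a confusion matrix as follows; the labels here are a synthetic stand-in, not the paper's test set:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy held-out labels: 0=Sand, 1=Grass, 2=Trees (illustrative only)
y_true = np.array([0] * 10 + [1] * 10 + [2] * 10)
y_pred = y_true.copy()
y_pred[9] = 1                       # one Sand pixel confused as Grass

cm = confusion_matrix(y_true, y_pred)
# Diagonal = correctly classified pixels; row sums = reference pixels per class
per_class_acc = cm.diagonal() / cm.sum(axis=1)
print(cm)
print(per_class_acc)                # [0.9, 1.0, 1.0]
```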

Probability triangle plot
As indicated in Section 3.3, per-pixel probabilities for each of the three classes, Sand, Trees, and Grass, are also saved. The resulting probabilities for the Neural Network classification are shown in Figure 4. In this scatter plot each classified pixel, p, is positioned according to its three probability values, p1 for Trees, p2 for Sand, and p3 for Grass. At the vertices of the triangle, pixels are located with 100% probability for one class. In general, a pixel, p, is positioned in the probability triangle at position tp according to its barycentric coordinates, (Möbius, 1827), as indicated in Eqn. 2,

tp = p1 · T + p2 · S + p3 · G,    (2)

p1 + p2 + p3 = 1.    (3)

In Eqn. 2, the symbols T, S and G refer to the positions in Figure 4 of the vertices corresponding to pure Trees, Sand, and Grass respectively, while Eqn. 3 expresses that the total probability equals 1.
The probability plot in Figure 4 is subdivided into seven polygons. The three triangles in the corners contain the pixels with a dominant probability of at least 70%, while the 4-gons aligned with the edges have a low probability (<15%) for the opposite class. These 4-gons can also be seen as fuzzy transition classes, like 'Sand-Grass'. The triangle in the middle contains pixels with no dominant probability (not above 85%) for any of the three classes.
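The barycentric placement of Eqn. 2 and the polygon subdivision above can be sketched as follows; the vertex layout and the simplified ordering of the rules are assumptions:

```python
import numpy as np

# Triangle vertex positions for Trees, Sand and Grass (any non-degenerate
# triangle works for barycentric placement; this layout is an assumption)
T = np.array([0.5, np.sqrt(3) / 2])
S = np.array([0.0, 0.0])
G = np.array([1.0, 0.0])

def triangle_position(p1, p2, p3):
    """Barycentric placement tp = p1*T + p2*S + p3*G (Eqn. 2), p1+p2+p3 = 1 (Eqn. 3)."""
    return p1 * T + p2 * S + p3 * G

# Transition class obtained by excluding the least likely class
TRANSITION = {"Trees": "Sand-Grass", "Sand": "Grass-Trees", "Grass": "Sand-Trees"}

def fuzzy_class(p, names=("Trees", "Sand", "Grass")):
    """Simplified polygon rules: dominant probability >= 70% gives a pure class,
    opposite-class probability < 15% gives a transition class, anything else
    falls in the central 'Unknown' triangle."""
    p = np.asarray(p, dtype=float)
    if p.max() >= 0.70:
        return names[p.argmax()]
    if p.min() < 0.15:
        return TRANSITION[names[p.argmin()]]
    return "Unknown"

pos = triangle_position(1.0, 0.0, 0.0)   # a pure-Trees pixel sits on vertex T
```

For example, a pixel with probabilities (0.05, 0.5, 0.45) for (Trees, Sand, Grass) lands in the 'Sand-Grass' 4-gon, while (0.4, 0.3, 0.3) lands in the central Unknown triangle.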
The small gray triangle at the top left of Figure 4 shows which percentages of pixels fall within each pure class or transition class. Most pixels are classified as Grass or Trees (both over 40%), while only 4.6% of the pixels are classified as Sand. The transition class Grass-Trees receives 3.8% of the pixels. The unknown class in the middle contains only 0.1% of the pixels.
The location of some of the pixels belonging to these fuzzy transition classes is shown in Figure 5. This figure contains a zoom-in of the neural network classification results. Indeed, as expected, transition pixels, like 'Sand-Grass', are found on the borders where Grass and Sand meet. This indicates that the transition classes actually represent transitions and not pixels that are classified wrongly.

Class distribution through time
The timeline in Figure 6 shows the percentage of pixels per pure and fuzzy class over the area as a whole for each of the 11 Neural Network classification results of the Superview-1 images enriched with AHN4 height, as indicated in Table 2. The results show some consistency over time, with Grass always as the biggest class, followed by Trees. The class Sand is comparable in size to the fuzzy class Grass-Trees, while the other two fuzzy classes Sand-Grass and Sand-Trees as well as the Unknown class only have small percentages of pixels. Further analysis is required to understand the variation in percentages in Figure 6, which could be caused by seasonal influences for example, as vegetation is more abundant in summer, while, in addition, there are different seasonal patterns for different types of vegetation.

Case Study Berkheide
Larger and sudden changes are easily picked up by the Neural Network classification results. This is demonstrated in Figure 7, which shows class transitions at a known construction site. In this case a whole area in the Berkheide area has been cleared of bushes and trees to create a new region for water infiltration. This project started at the end of October 2020, (Spierenburg, 2020), as can be seen in the timeline on the top left of Figure 7.

DISCUSSION
In this discussion we cover three topics: first the classification scope in Section 5.1, followed by a discussion of the results of the other two methods in Section 5.2. The discussion is concluded with the probabilities obtained by these two methods in Section 5.3.

Classification scope
These classification methods were specifically designed for the vegetation in coastal areas. Given the success of the classification, it is expected that more classes could be extracted from the data; e.g. the Trees and Grass classes could be further specified towards individual species. Extra classes would however require additional training data, and would increase the computational effort of training the system. Water could be made into a class, as water presence varies throughout years and seasons. However, including a water class or other classes might worsen the accuracy of the vegetation classification, which is our priority.

Nearest Centroid and Random Forest classifications
The other classification methods tested were nearest centroid and Random Forest classification. The nearest centroid results show the limitations of this method. While it is fast, requiring 40 seconds per time step, its accuracies were the lowest, at 95%. By relying only on the distance to the closest class centroid, it is apparently difficult to distinguish grass from trees, due to the proximity of their centroids; as a result, much more grass is found than with the other methods. The Random Forest method produced results similar to the neural network results, with similarly high (>97%) accuracy. However, the Neural Network model is better suited for working with large datasets and is computationally faster than the Random Forest classification: the Neural Network needs about 100 seconds per time step, while the Random Forest needs about 210 seconds. The major difference is in the class transitions, where Random Forest includes low bushes in the tree class, while the neural network classifies these as part of the grass class. A main advantage of the Random Forest method over the neural network is the ability to analyze exactly how the method makes its decisions.

Probabilities, Nearest Centroid and Random Forest
The scatter plots of the pixel probabilities are interesting as they are very different for the three methods considered. In addition to the Neural Network scatter plot, Figure 4, scatter plots of pixel probabilities are given in Figure 8 for the Nearest Centroid results, left, and for the Random Forest results, right.
The Nearest Centroid scatter plot in Figure 8, left, shows a smooth pattern connecting all corners, but leaving large parts of the probability space systematically blank. The reason for these empty parts is that the probabilities are based on distances in feature space: if a feature vector coincides with a class centroid, it will be in one of the vertices of the probability triangle; if, on the other hand, it does not coincide, it will have a non-zero distance to all three centroids and therefore stay away from the triangle edges. The scatter plot is slightly shifted towards the Grass-Trees side; however, all pixels have a positive probability for each pure class. 16% of the pixels are located in the middle unknown part, while a large share of 63% belongs to the Grass-Trees class, again because the class centroids of the training samples of Grass and Trees are close in feature space. For this method, a probability of one is reached only when the feature vector of a pixel coincides with a class centroid.
A contrasting pattern is observed for the Random Forest probabilities. Here the probabilities form a linear, discrete pattern, caused by the fact that each probability is always a fraction of the number of trees. So if none of the trees vote for a class, the pixel will fall on the outer edge of the triangle. In this case there are slightly more points in the mixed classes than in the Neural Network example.
Overall, the probability scatter plots help in understanding the properties of different classification methods, while mixed probabilities may correspond to transitions between classes in practice. Note that none of these methods were specifically designed or trained to produce fuzzy results; here we merely grasped the opportunity to analyze the outcomes in this direction.

CONCLUSION
This study shows that dune vegetation monitoring is possible from readily available remote sensing sources. We showed that, using a combination of satellite spectral and aerial LiDAR data, the vegetation can be classified into three major classes, trees, grass, and sand, as well as their transition zones. The satellite spectral data used has sub-meter spatial resolution, comparable to the dedicated surveys currently in use, and is therefore suitable for detailed vegetation assessment. The main benefit of using satellite observations over dedicated aerial surveys is, other than the reduced costs, a temporal resolution of months instead of years.
The new, higher temporal resolution introduces new requirements on the acquisition of training and ground truth data. To mitigate the need for new ground training points for every acquisition date, training should only be done on homogeneous areas where a single class is found. Our experiments show that fuzzy classes can still be estimated, and that vegetation transitions are correctly identified.
ACKNOWLEDGEMENTS

vegetation mapping strategy. Furthermore, the authors would like to acknowledge the Netherlands Space Office for providing the Superview data.