DTM CORRECTION IN AREAS OF STEEP SLOPES

Computation of a DTM from a DSM is a well-known and very important task. We derive the DTM by a procedure consisting of ground points extraction, surface interpolation and triangulation by a canonical mesh if the terrain is flat or has only moderate changes in elevation. In regions with steep slopes, such as at riversides, and with man-made 3D structures, such as around bridges, interpolation artifacts and suppression of high-resolution details can lead to coarse errors in local elevations even for the building detection task. The eligible regions must be therefore detected and at least locally reprocessed. For detection, we search for connected components of a certain minimum size with negative relative elevations. For reconstruction, we suppress the points with erroneously reconstructed DSM values and interpolate the surface by means of L1 splines. Finally, these meshes must be fused into one single DTM mesh. We applied land cover classification to demonstrate the usability of our correction. The overall accuracy amounts to around 88% while the number of faulty assignments due to incorrect DTMs can be significantly reduced.


Motivation
Accurate Digital Terrain Models (DTMs) are required in many applications. For instance, DTMs are important for urban planning or navigation tasks. Regarding applications for land cover classification, features derived from a DTM can provide information about building positions, roads and natural areas. Sources of systematic errors in DTMs must be eliminated in order to avoid mis-classifications. Many authors (Bulatov et al., 2019, Häufel et al., 2018, Huang et al., 2015 presented approaches of training data acquisition for land cover classification. For example, in (Bulatov et al., 2019), segmentation results are assigned to one of four classes (building, grass, road, and tree) using cascades of simple features followed by an interactive verification step, which is not too time-consuming if only larger segments are considered. Among simple features used to differentiate between mentioned classes, especially relative elevation plays a crucial part. Local heights are derived by subtracting the DTM from the DSM and are needed to separate elevated objects from the ground, and further on differentiation between different types of elevated objects. Procedures commonly used for determination of the DTM lead to wrong elevation values. This effect is caused by the impact of filtration and methods of interpolation.
In the following section (Section 1.2), an overview of related work will be presented. In Section 2, we describe how the initial DTM is computed and in what way areas in the initial DTM are detected exhibiting incorrect elevations. After the detection, specific parts of the DTM will be processed separately concerning surface interpolation and the reinsertion into the DTM which is part of Section 2.3 and Section 2.4. In Section 3, the corrected NDSMs are considered as features for a machine-learning-based workflow for land cover classification, whereby our main task will be to track the classification accuracy in critical regions. In Section 4, we summarize our findings and propose ideas for future work. * Corresponding author

Related Work
The generation of DTMs is a fundamental step in preparation of airborne data. It is being addressed regularly in the literature and even the surveys dedicated to discussion and evaluation of algorithms for ground surface filtering are being regularly published (Sithole, Vosselman, 2004, Meng et al., 2010, Chen et al., 2017. The reader may check these sources for an extensive background while the following short overview will only describe several frequently cited methods. Ground surface is characterized by a large amount of properties. However in challenging terrains, none of these properties alone is unconditionally valid and, therefore, many authors tried to find a reasonable combination of approaches modeling these properties. For example, (Kraus, Pfeifer, 1998) rely on the fact that the terrain is locally flat and low. They thus perform plane fitting weighting the points above the plane more than those lying below. Thus, terrain containing gullies and other kinds of uneven surfaces is expected to be challenging for this algorithm. Then, one can apply the algorithm of (Sithole, Vosselman, 2005) arguing that the points belonging to the same surface must have a flat connection. The measurement data is segmented in such a way that all heights of the points of a segment can be approximated by a plane. After grouping segments by properties and employing region-growing method, the corresponding surfaces are adjusted. If no segment does grow anymore, all small segments are eliminated. This algorithm has many variations (Lersch et al., 2004) and successors, to which we also count directional and slope-based approaches (Meng et al., 2009, Hingee et al., 2016, Mousa et al., 2017. Moreover, clustering, gives us a hint to the third criterion as described in (Sithole, Vosselman, 2005), namely that the off-terrain objects must not have too large area. This is used in the algorithms of (Wack, Wimmer, 2002) and (Bulatov et al., 2014). In the last algorithm, for example, the ground points were obtained by morphological filtering and approximated by a smooth surface, such as a solution of heat equation with a reasonable boundary condition (Gross et al., 2005), a cubic spline (Kilian et al., 1996, Bulatov, Lavery, 2010 or a thin-plate spline (Mongus,Žalik, 2012). Choice of the filter size is a problem and therefore, many authors use iterative filters with different parameters, see for example (Mongus,Žalik, 2012, Maguya et al., 2014). An older approach (Elmqvist et al., 2001) employs active contour models and use an energy function for simultaneously taking several aforementioned criteria for ground point extraction into account.
Finally, example-based, or supervised segmentation methods cannot be missed in a literature survey nowadays. Learning a model based on neural architecture reminding ConvNet with some 17 million labeled points, (Hu, Yuan, 2016) interpreted a neighborhood of a 3D point as an image with 128 × 128 pixels and extracted the ground points according to the output of the last fully-connected layer. This approach was improved by (Gevaert et al., 2018) who argued that in high-resolution data, such small images may not contain the largest non-terrain object and who proposed using atrous convolutions on large images instead of image down-sampling. They moreover mentioned that many easy training examples can be created by a rule-based approach. In the reality, while training data acquisition is costly, the problems may emerge in challenging spots. Therefore, in the next section, we will present our conventional approach for DTM computation and local correction, which will be evaluated on a challenging data set.

DTM CORRECTION
The knowledge about local elevations, synonymous with the difference between DSM and DTM, is a very important feature for land cover classification. After the derivation of the DSM (see Section 2.1), the initial DTM is computed removing non-terrain elements. In case of terrain areas with steep slopes (e.g. river embankments), problems may occur resulting in incorrect DTM height values. Two important tasks have to be solved: Detection of incorrect DTM areas and correction of the corresponding elevations. Faulty DTM areas are detected by searching for elevation values in the DTM that exhibit those of DSM. But, only looking for areas characterized by DSM < DT M also points out areas of tiny elevated terraces that are (task specific) not really in need of correction as it is the case e.g. at sidewalks. With our work, we do not want to distinguish between sidewalks and roads, but even the curbside fulfills DSM < DT M . To exclude these areas, a parameter for minimum distance is introduced.
The areas in need of correction, denoted as areas of steep slopes, are detected (Section 2.2.1) and the connected components are labeled, filtered, represented as point clouds, and interpolated using one of three procedures presented in Section 2.3. Finally, canonical triangulations constituting these fine meshes are fused with the original mesh, see Section 2.4. The reason to produce a mesh is that meshes are faster for computations within e.g. simulation applications. A Simulation system as e.g. Virtual Battlespace (VBS) enable the generation of terrain databases whereby the meshes could be exported to VBS (Häufel et al., 2017).

DSM and Initial DTM Generation
Starting with an airborne point cloud, we sample the point's elevations into 2.5D data. For generation of the DSM an equispacial grid of a fixed resolution is required. It has to be checked whether the point density of the ALS data fits the DSM grid's resolution (see Figure 1). Here are first considerations: Usually, in data sheets of a measurement campaign, information about the point density is available. The grid resolution should roughly contain 1 to 4 points within a grid cell. If it is not possible to reduce the grid's resolution but natural objects are mostly present in this area, interpolation of the point cloud will be sufficient. At first, an empty grid is generated matching the final data resolution. The ALS data is assigned to the grid and for each grid, the maximum and the mean elevation values are computed. In case of urban regions, using maximum values led to promising results. Empty cells or holes can be filled by solving Laplace's equation. In urban areas, non-terrain objects, such as buildings, trees or vehicles (elevated objects) are pictorially removed from the DSM raster, resulting in a DTM. This is done by using a filter whose size is determined in dependency of the maximum dimension of a non-terrain object which can be found in the study area. Usually, it is the smallest extension (length, width) of the largest building in the data. This filter is moved across the DSM raster in order to extract always the minimum height. We refer to those points that fulfill DSM filter = DSM as ground points and all other points are deleted. On a reduced resolution, we generate a surface by interpolation of the ground points. As usual in ill-posed problems, a regularization term penalizing the norm of the 2 nd derivative is needed. To take into account possible outliers and sudden changes of elevation, the optimization is carried out in the L1-norm over the set of all C 1 cubic splines (Bulatov, Lavery, 2010). In order to obtain the result on the original resolution, triangle interpolation with spline nodes as vertices is employed, yielding the so-called canonical triangle mesh. In our work, the DTM computation was completed using L1 spline and L2 spline interpolation and Gridfit method. These temporary DTMs are denoted as DTM 1 temp and separately denoted as DTM 1 L1 , DTM 1 L2 and DTM 1 GF . The results will be compared as introduced in the following sections.

Detection of Areas with Steep Slopes
A more careful look at the relative elevation (DTM − DSM) detects areas with questionable elevations, which have to be examined in more detail and to be corrected, if necessary. In Section 2.2.1 and Section 2.2.2, two approaches how to detect areas are described respectively, whereby in Section 2.2.1, DTMs DTM 1 temp are used. A further method will be introduced adding OSM data into the temporary DTM generation process described in more detail in Section 2.2.2. In total these temporary DTMs are denoted as DTM 2 temp and separately denoted as DTM 2 L1 , DTM 2 L2 and DTM 2 GF .

Detection of Incorrect Elevations
Unfortunately, by comparing the DSM to the resulting DTM it may happen that the DTM exhibits higher elevation values than the DSM. Some of these raster points form connected components denoted as suspicious areas S1,...,N . We assess each of these connected components S1,...,N by its size and the minimum permitted (average) deviation of DSM − DTM in order to find out which areas are supposed to be corrected. This effect (DSM > DTM) may occur in specific terrain formations with steep slopes which is also noted in e.g. (Wack, Wimmer, 2002). Such areas can be found at river courses with embankments, at entries of underpasses, at under crossings or at external cellar entries. In Figure 2, two examples of height profiles are shown, displaying a river profile on the left and a garage entrance on the right. Within these areas, raster points of unreliable elevation values are marked and a subsequent generation of the DSM − DTM would exhibit wrong elevation values, usually terrain points should show elevations near zero (see Figure 3). Thus, in Figure  Considering the river area, the L1 spline interpolation, L2 spline interpolation as well as the Gridfit method showed similar results for the river profiles. The least negative elevation values were achieved using Gridfit method. In case of bridge areas, L1 spline interpolation resulted in lowest elevations, starting from 1.9 m to the maximum of 2.1 m. Slightly worse results were achieved using the Gridfit method.
The embankment is part of the terrain and the detected S1,...,N will contain parts of the river and parts of its surrounding. The difference between DSM and DTM (allowing a small deviation) exhibits indications for invalid DTM parts. To proceed, a binary mask for steep slopes T is generated, highlighting those areas: Here εT is the average negative height of all points fulfilling DSM − DTM 1 temp < 0. This choice turned out to be useful to avoid too many alarms as they would occur in regions of e.g. sidewalks that we do not distinguish from roads. Nevertheless εT should be chosen carefully depending on the requirements of detail. Areas exceeding a minimum size, are labeled and the corresponding raster points of the DSM are used to derive meshes using L1 interpolation with a much finer resolution which will be described in Section 2.3. In Figure 4, the results of interpolation of T with the L1 splines, L2 splines and the Gridfit method, respectively, are displayed. The masks T exhibit, beside the river, small components which are entrances of underground parkings or cellars or in their majority sudden elevation changes in the street course. The masks T for generating the three (corresponding) DTMs exhibit less areas that have to be corrected for the L1 spline than for the L2 spline and the Gridfit method. In case of the Gridfit method, beside the river area, more smallsized areas are detected which partially will be removed (see Section 2.3).

Detection of Areas Using OSM data
The focus of this work is the detection of areas with steep slopes. If no indication for a steep area can be identified or if there are considerations how to improve the initial DTM, additional information has to be added. We opt for freely available GIS data and here, OpenStreetMap data was included. If an object type from OSM data is already outlined by a polygon, this can directly be used for rasterization and mask generation. Otherwise the object types are represented by a polygonal chain, like roads or waterways. In these cases, we assume a plausible width for these objects and therewith generate a polygon along the polygonal chain. This plausible width is derived from the OSM road attribute concerning the road's type. In Figure 5 on the left side, already rasterized OSM roads and river are superimposed. In this case the river was outlined by a polygon, however, on the left side, the river is still continuing (see Figure 5, left, blue-striped area), because it was not included in OSM waterways. The process of DTM correction, ignoring raster points confirmed by OSM information, will be described using this The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition) large river polygon as an example. Bridges, part of roads, and over crossing the river (part of R) are removed, resulting in the mask W¬R.
In the first step, all raster points of the DSM not overlapping W¬R remain stable for DTM computation. The remaining raster points, confirmed by W¬R, are set to zero. They are denoted as unreliable elevations.
On the top left side of Figure 6, a detailed view of local elevations is displayed including two lines across the river and along the bridge for profile generation to compare the results achieved with L1 spline, L2 spline interpolation as well as Gridfit method. Already with the usage of the OSM data ( Figure 6, top right), the local elevations (before correction) are improved around the bridge area. Highlighting the result achieved by L1 spline interpolation shows a maximum value of about 1.1 m, in contrary to results achieved by Gridfit method where the maximum value is approximately 2.2 m. In the river area, less improvements could be seen compared to the results achieved without using OSM data.
Considering the surface model and the OSM affected DTMs for the L1 spline interpolation, L2 spline interpolation as well as the Gridfit method, those raster points fulfilling Equation 1 are denoted as possible candidates for mesh surface interpolation (Section 2.3). Most suspicious area candidates could be detected in the DTM using L1 spline interpolation (Figure 7, left). In case of the L2 spline interpolation, less areas are detected. One possible explanation is that the L2 spline interpolation exhibits Gibbs artifacts and produces oscillating surfaces so that most of the DSM points show larger values than the DTM.

Mesh Surface Interpolation
In this section, all connected areas confirmed by the binary mask T (Section 2.2) are considered separately for mesh surface approximation. For mesh surface interpolation, we use on the one hand, raster information and, on the other hand, a triangular mesh of the elevation models. Firstly, the temporary elevation model EM (raster data) is generated representing the DSM without non-elevated regions (Bulatov et al., 2016): The parameter δ is considered to be a small value near to zero and for this work δ is set to 0.2 m. In Figure 8, a part of the DSM and a line used to generate a profile are displayed. The diagram on the right side visualizes the procedure of generating the elevation model denoted as EM (red-dashed line). The redgreen dashed line contains DTM components, and the red-blue dashed line contains parts of the DSM. Only areas of S1,...,N (see Equation 1) exceeding a minimum size (5 sqm) are used for mesh interpolation denoted as T Label . In Figure 9, the elevation model EM (left) and connected areas T Label (right) are displayed. Considering each area of T Label , we determine the bounding boxes of all vertices belonging to the triangles of the mesh representation.
In order to collect all necessary triangles, the bounding box for each connected component is slightly enlarged by 2.5 m in each direction, and the corresponding area is cut off the raster elevation model. Simultaneously, the corresponding triangles of the DTM mesh are determined and denoted as triangles to be removed. Now, one has to consider more carefully the cut-off raster points, which shall be used for mesh surface interpolation. This cut-off raster may exhibit high elevated small components (e.g. bushes or stone heaps) that mainly belong to objects with low planarity values. The planarity feature is computed by means of (Gross, Thoennessen, 2006), and allows the distinction between trees (typically low planarity values) and buildings (high planarity values). Raster points that exhibit low planarity values could distort the interpolation process and are therefore removed. The remaining raster points are used for L1 spline interpolation resulting in spline nodes (Xi, Yi, Zi). The spline nodes are used for mesh surface approximation (canonical meshing) resulting in M Label . So, for every connected component which is de- Figure 9. Elevation model EM (left), labeled areas for mesh computation (right).
termined by T Label , M Label defines the labeled meshes.
These meshes M Label have to be inserted into the holey DTM mesh in order to create a waterproofed mesh. Now, one has to merge the meshes M Label in the areas around overlaps or gaps with the initial DTM mesh. The procedure for obtaining gap-free surfaces is described in Section 2.4.

Mesh Fusion
For each mesh M Label , its vertices (point cloud) are used for fusion process with the mesh proceeding from the initial DTM. In Figure 10, the process of fusion is displayed. In the first step, all DTM triangles representing areas to be corrected, are removed resulting in a DTM (green triangles) exhibiting holes (white areas). This new holey DTM is completed by inserting all meshes M Label considering indexing of merged vertices and corresponding triangles. In Figure 10, the new DTM exhibiting holes and the newly inserted meshes (bottom row, black) and a detailed view (top right, light-red) are displayed. This process is completed after all meshes M Label have been inserted. Based on all vertices and deleting duplicate vertices,  Delaunay Triangulation is applied producing a waterproofed mesh (Lenk, Heipke, 2006). In Figure 11, the final DTM based on OSM data and L1 spline interpolation is displayed. This final DTM is used to derive local elevations for the NDSM. To evaluate the results, we perform classification and focus especially on areas with steep slopes.

RESULTS
The chosen region of our research is located in Southern Germany and exhibits urban, rural and industrial areas. We selected the town Ettlingen which is characterized by buildings of different height and shapes, small parks, roads and a river flowing through the city center of Ettlingen. At corners of some buildings one can find out-of-building stairways or entrances of garages. Embankments border the river. All of these characteristics exhibit steep slopes which may turned out to be often misor non-classified regions. The steep slope is mis-interpreted by the classification algorithm as rise of an elevated object, as it is the case for e.g. building walls. Therewith, adjoining objects to slopes or the slopes itselves are wrongly classified. A terrain computation which is based on surface elevation may lead at these specific terrain conditions to faulty elevation values, which can influence the final step of land cover classification.

Experimental Setup for Land Cover Classification
To demonstrate the usability of our procedure, we use a regionbased classification using a segmentation result and OSM data described in (Häufel et al., 2018). In this classification process, only few features, namely color information, local elevation, NDVI, planarity and entropy were used. In an intermediate step towards classification the obtained segmentation results (Wassenberg et al., 2009) are fused with the mentioned features. Following, OSM data (Häufel et al., 2018) overlapping with segments and verified with the derived features produces an initial classification result (see Figure 12 (bottom left)). Due to this process, some of the segments may remain unlabeled due to the fact, that no OSM data overlaps the segments classification. Using this labeled pixels as reference data, we chose the white-transparent square in Figure 12 (bottom left) for training of a Random Forest (Breiman, 1996) with 20 decision trees to perform comprehensive classification all over the image. In other words, all labeled pixels were used for validation. The corresponding confusion matrix is displayed in Table 1. For this initial result an overall accuracy (OA) of 88.84% and κ value (Cohen's Kappa) of 80.42 % could be achieved. We wish to compare these values with those obtained after correcting the NDSM (see Section 3.2). To present a setting for qualitative evaluation (Section 3.3), we show in Figure 12 the orthophoto of the study area and the uncorrected NDSM, Figure 12, top left and right). The reference data as result of (Häufel et al.,

Quantitative Evaluation
With our workflow, we are able to correct the local elevations. The variants using OSM and without OSM combined with different DTM computations using L1 spline and L2 spline interpolation and Gridfit method led to six different terrain models listed in Section 2.2. After correcting NDSM and applying Random Forest, overall accuracies using the variants DTM 1 temp and DTM 2 temp for NDSM computation were in the same range as the initial result. Considering all results, better overall accuracies were achieved using OSM data support than by not using OSM data. Thus, in Tables 2 to 4, the confusion matrices for NDSM 2 L1 , NDSM 2 L2 and NDSM 2 GF are shown. Remarkable mis-classifications occurred between road class and grass class. Besides, trees were incorrectly classified as buildings which can be explained by relative smooth surfaces leading to high planarity values. In total, there are slight deviations between the covariance matrices, the κ values for tables show values ≥ 80% indicating a high agreement.

Qualitative Evaluation
The previous section has not yet provided an ultimate proof of benefit from correction of NDSM because changes occurred only locally. Thus, only a limited number of pixels has changed their class. Therefore, we wish to look in more detail at the most The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition) (strategically) important areas of our dataset. We chose two smaller dataset subareas around bridges and a third area where all classes appear in similar number of pixels. We denote this three regions as subareas I, II and III. They are shown in Figure  13 and for readers' orientation about their approximate position in Figure 12. In Figure 13, we see that because the initial DSM was not flexible enough, specific areas exhibiting strong slopes were not determined as elevated objects, are still existent. Hence, the bridges, which are actually part of the terrain, were mostly classified as buildings. The results of classification achieved after correction of NDSM are displayed in Figure 14.
In rows 1, 3 and 5 of Figure 14, from left to right, the classification results based on DTM generation DTM 1 L1 , DTM 1 L2 and DTM 1 GF are shown. In rows 2, 4 and 6 the classification results based on DTM generation using OSM data DTM 2 L1 , DTM 2 L2 and DTM 2 GF are displayed. Looking at the bridge of area I, we can see that to different extents, land cover classification in this area could be improved compared to the result without correction of DTM since the bridge was supposed to be assigned to the road class. Better results were achieved by those DTM variants which were affected during initial DTM computation by OSM data (see Figure 14, row: 2). The best result was achieved by OSM data-influenced DTM generation using DTM 2 L1 , the bridge was correctly classified belonging to the road's class and some parts of the planted median strip were correctly classified as low vegetation. For DTM variants DTM 2 L2 and DTM 2 GF , small bridge components were misclassified as parts of buildings. These little inaccuracies can be easily explained since they occur due to vehicles on the bridge. The classification of vehicles as buildings is the nearest similar hit since such entities are not part of our classification categories. Comparing these results to the initial result, the bridge was almost completely misclassified as belonging to the building class. In Figure 14 (rows: 3 and 4), results achieved by DTM generation influenced by OSM data show a more accurate classification on the bridge (black-dotted circle) than those variants without using OSM data. Comparison between the best result achieved with DTM 2 L1 and the initial result (see Figure 13) show, that the bridge is now correctly classified as part of the road class. Misclassified building spots belong to street lamps and could be deleted because they exhibits atypical size. Considering flat terrain (see Figure 14, rows: 5 and 6) affected by used land cover classes and comparing it with the initial result, the best result could be achieved by NDSM generation using DTM 2 L1 . Looking at the small lawn (black-dotted circle), the best results could be achieved by the DTM generation variants based on L1 spline interpolation. Altogether it can be mentioned that the best results could be achieved by including OSM data to the process of DTM generation and using L1 spline.

CONCLUSIONS AND FUTURE WORK
In this work, we presented an approach to correct a digital terrain model in critical regions. These areas in general exhibit steep slopes, which may have negative repercussions while computing DTM and successively performing land cover classification with NDSM as feature. Starting with a DSM, at first, a temporary DTM was computed, and areas, which had to be corrected could be processed individually as meshes which were fused into the temporary DTM at corresponding positions. These corrected local elevations were used for the classification process. Since these areas are too small to have large impact on a whole data set, the overall accuracies could not be improved signific- Figure 14. Detailed views of classification results for Areas I, II, III: DTM generation (rows: 1, 3, 5) using (left to right): L1 spline (1), L2 spline (2), Gridfit method (3); DTM generation with OSM data (rows: 2, 4, 6) using (left to right): L1 spline (4), L2 spline (5), Gridfit method (6) antly. However, we selected three strategically important areas and after a visual inspection, specific areas exhibited correct land cover class assignments. For future work, in order to be able to obtain more valid results regarding the correction of NDSMs at specific areas, more test sites have to be inspected. These test sites should be comparable to the study area, meaning it should be characterized by urban components, bridges traversing a river or a valley, or further comparable structures.