MORPHOLOGICAL FILLING OF DIGITAL ELEVATION MODELS

In this paper a new approach for a more detailed post processing and filling of digital elevation models (DEMs) in urban areas is presented. To reach the required specifications in a first step the errors in digital surface models (DSMs) generated by dense stereo algorithms are analyzed and methods for detection and classification of the different types of errors are implemented. Subsequently the classified erroneous areas are handled in separate manner to eliminate outliers and fill the DSM properly. The errors which can be detected in DSMs range from outliers – single pixels or small areas containing extremely high or low values – over noise from mismatches, single small holes to occlusions, where large areas are not visible in one of the images of the stereo pair. To validate the presented method artificial DSMs are generated and superimposed with all different kinds of described errors like noise (small holes cut in), outliers (small areas moved up/down), occlusions (larger areas beneath steep walls) and so on. The method is subsequently applied to the artificial DSMs and the resulting filled DSMs are compared to the original artificial DSMs without the introduced errors. Also the method is applied to stereo satellite generated DSMs from the ISPRS Comission 1 WG4 benchmark dataset and the results are checked with the also provided first pulse laser DSM data. Finally the results are discussed, strengths and weaknesses of the approach are shown and suggestions for application and optimization are given.


INTRODUCTION
Digital elevation models (DEMs) are often the base for applications needing geospatial information.More and more also very high resolution DEMs in urban areas are needed for planning, monitoring and visualization.Beneath the acquisition of DEMswhich are mostly digital surface models (DSMs) -via laser scanning also the DSM generation from very high resolution (VHR) satellite stereo imagery becomes increasingly important, especially in remote areas like developing countries.
Also new methods for the generation of dense high resolution DSMs from such imagery are becoming more mature and find their way from research to standard image processing software.These methods like graph based algorithms as maximum-flow/minimum-cut or the DLR developed semi global matching (SGM) generate very dense DSMs from epipolar imagery in the same order of resolution as the input imagery (dAngelo et al., 2008).But these DSMs sometimes contain many errors, outliers, noise and holes, especially if just two viewing directions are available.To eliminate these errors and outliers the DSMs normally undergo some post processing steps in which the outliers are removed and holes get filled.
These post processing steps yield normally DSMs of a reduced resolution of about 1/3 to 1/10 of the original resolution.But especially in urban areas and using satellites with ground sampling distances of about one meter an additional reduction of the resolution of the generated DSMs to about three to ten meters provide the user with DSMs too coarse for the possibility of detecting objects of smaller size.Therefore an approach is needed for a more sophisticated post processing of the DSMs to preserve the resolution of the original imagery also for the resulting DSM.
In the following we propose an approach for classification and distinct handling of different error types in DSMs.The modus operandi will be: • analyzing the different types of errors • classifying these errors • deriving methods for different handling of errors • applying the methods to artificial imagery • evaluating the results against the original, error free data • applying the methods to the Terrassa benchmark data • evaluating the results against existing ground truth

Preliminary work
Almost much work was done in post-processing of DSMs.Since the best quality DSMs are generated by laser scanning most of the approaches start with such LIDAR point clouds (see for an exhaustive literature list Gruen and Akca (2005)).But the errors in laser scanning data are nearly neglectible compared to errors originating from DSM generation methods operating on satellite stereo imagery.Such very high resolution (VHR) stereo images come with ground sampling distances (GSD) for the PAN channel in the range from 0.5 m for WorldView I and II or GeoEye over the older QuickBird with 0.6 m to Ikonos with 0.8 to 1 m.The multispectral channels (blue, green, red and near infrared) possess four time the GSD, except for WorldView I which only has a PAN channel.Additionally WorldView II has four more multispectral channels: coastal (shorter wavelength than blue), yellow (between green and red), red-edge (between red and near infrared) and an additionally second near infrared channel extending to longer wavelengths (WorldView II, 2011).
The first step is always the derivation of the digital surface model from the satellite stereo imagery.One of the best methods proofed in the last years for this job is the "semi global matching".This method (Hirschmüller, 2005) is a so called "dense stereo matching" method in contrast to the classical methods.
Classical methods as implemented still in most commercial software packages search characteristic (interest) points in both images by image correlation for creation of a sparse object point cloud by forward intersection using the sensor model (Lehner and Gill, 1992).Different region growing methods (Otto and Chau, In contrast the dense stereo algorithms originating from computer vision try to correlate each point in the stereo pair (Scharstein and Szeliski, 2002;Hirschmüller, 2005;Krauß et al., 2005;dAngelo et al., 2008).For this the images first have to be transformed to epipolar images for correlating these line by line and pixel by pixel.The result is a disparity map fitting exactly on one of the epipolar images containing the distance of each pixel to its companion pixel in the other stereo image in epipolar direction.
A DSM generated with the classical method is so a relatively sparse point cloud whereas a DSM generated using a dense stereo algorithm has only a few holes but also blunders and other types of errors where the dense correlation fails.The generated DSMs from such VHR satellite stereo images deliver so either a rather good DSM of relative coarse resolution of about 1/3 to 1/10 of the original ground sampling distance (Lehner et al., 2007) or a high resolution DSM with many mismatches and blunders (Xu et al., 2008).
Methods for refining and correcting the DSM data are often based for example on statistical approaches (Vincent, 1993) or on DSM segmentation and extraction of rectangular buildings (Arefi et al., 2009).Haala et al. (1998) proposed a method reconstructing building rooftops using surface normals extracted from DSM data.They assumed that building boundaries are detected previously.But most of these methods work only with high resolution and reliable LIDAR DSMs which in general do not suffer from noise, outliers and smoothing effects like stereographic generated DSMs (Schickler and Thorpe, 2001).Also the reconstruction of 3D building structures by hierarchical fitting of minimum boundary rectangles (MBR) and RANSAC based algorithms for line or surface reconstruction are quite common (Arefi et al., 2008).An other approach fusioning the original imagery with the intermediate generated disparity map of dense stereo methods was proposed by Krauß and Reinartz (2010).
In this paper we propose a much simpler approach by classifying different types of typical DSM errors introduced by dense stereo methods and handling these errors individually.We rely on the stereo method for already detecting erroneous areas and marking them as background for example in occcluded areas or for removing outliers.1. "warp errors" are small, normally only one pixel holes originating from small scale differences in the two stereo images which lead to errors in forward-bacckward-correlation 2. "occlusions" occur at areas which are not visible simultaneously in both images of the stereo pair, typically at steep edges, narrow court yards or streets.The term occlusion will be used below in the sense of missing DSM values near steep walls surrounded mostly by low DSM values but only near the wall with higher DSM values 3. "court yards" are also a type of occlusion, but are surrounded on all sides with high DSM values not corresponding to the real height of the yard 4. "narrow roads" are a third type of occlusion, surrounded mostly by high DSM values but also single lower values at connecting roads or places 5. "too homogeneous surface" also results in forward-backwardmatching errors due to non existing "glue dots" for correct correlation, these areas are hardly to ditinguish from court yards in the DSM 6. "moving traffic" between the two images of the stereo pair causes also mismatches and typical holes on streets 7. "forest" looks mostly very different if the stereo pair is acquired with large stereo angles (unfortunately the standard configuration for most satellite operators) and leads also often to holes in the DSM For eliminating these errors we propose the following approaches: 1. "warp errors": detect small holes and fill simply by taking the median of the surrounding DSM 2. "occlusions": may be filled by the ground value nearby 3. "court yards": should also be filled by a ground value of this area 4. "narrow roads": should also gain ground values connecting correctly at the ends to other roads 5. "too homogeneous surface": should be interpolated from the boundaries 6. "moving traffic": can be replaced by the surrounding road values, also interpolation from the boundaries 7. "forest": may be handled best by simple interpolation since too much mismatches from the crown areas mix with real occlusions Following this scheme we have to detect following classes for different handling: • small holes about 1-2 pixels: median filling from neighbours • court yards, streets, occlusions: filling from ground • homogeneous areas, traffic, forest: interpolating The classification of the first one will be no problem.But separating for example homogeneous areas on roofs (5) from real court yards (3) may become very tricky.

Digital terrain model -DTM
Since we want to fill some erroneous areas from the ground like court yards or occlusions we need first to extract a digital terrain model (DTM) representing the ground without any elevated objects like buildings or trees.For this we try three different methods as described in Krauß et al. (2011): • The Classical morphological approach (Weidner and Förstner, 1995) • Geodesic dilation (Arefi et al., 2007) • Steep edge detection (Krauß and Reinartz, 2010) For synthetic urban DSMs the third method performs best, but for real DSMs like the benchmark data, the first two algorithms are more suitable.So a fusion of the three methods as proposed in Krauß et al. (2011) will be used for generating the DTMs and subsequent filling the detected ground areas.

Generation of synthetic DSMs
For the generation of synthetical DSMs we need a special method for joining a DTM (the ground) with some objects like buildings (represented by a so called normalized digital elevation model, nDEM) to the DSM we have to analyze with our proposed method.The objects i are now put on these locally flattened DT Mi parts, to generate the correct DSM.

Generating other DSM errors
Other errors like warp errors are additionally added to the occluded DSMs using randomly distributed background values.Fig. 6 shows occluded images with increasing hole probabilities.Holes are set to "no value" (background) if a random variable rnd for a pixel holds rnd > p h with the hole probability p h (all values in percent ranging from 0 to 100).

Classification of Errors
The typical errors are classified as followed: • "small holes" with sizes of about 1-2 pixels are masked by generating a mask of non-background, do a morphological closing on this mask using a structuring element of radius one and afterwards subtracting the original non-background mask from the one-pixel-closed mask • "occlusions" will be classified as larger background areas in the DSM containing border values with larger height variances, typically larger lower and higher parts • "homogeneous areas" on roof-tops may be found in larger background areas with small variances in the height of border pixels • "small court yards" and "small streets" may not have any connection on the border to any lower DSM area.So they will appear like the previous homogeneous areas.Such areas may only be detected using also the ortho image and looking on the homogeneity of the occluded/homogeneous area.• "traffic": "homogeneous areas" and "court yards" show small height variances on the border heights but all well above any ground whereas ground-stuck objects like traffic will have also small height variances on the border but at ground level • "forest" and "homogeneous grass land" may be detected also using the multispectral ortho image by calculating the N DV I (normalized difference vegetation index, defined as (n − r)/(n + r) using the red and near-infrared channels r and n).A N DV I > tv with tv ≈ 0.1 shows vivid vegetation.
• "water" may be detected addionally by also using the N DV I but with N DV I < −tw with tw ≈ 0.1 The last two classifications will work neither with the synthetic generated DSMs nor with the Terrassa test data from the ISPRS benchmarking dataset since for both no multispectral imagery was available.

Handling of Errors
These classified areas should be handled in different manners: • "small holes" will simply get filled by applying a median with a radius of 1 and filling only the masked pixels • "occlusions" may be filled by the lowest values of the surrounding border values • "homogeneous areas" get simply interpolated from the border using the inverse distance weighted interpolation (IDW) • "small court yards" and "small streets" in contrast should be filled from the derived DTM • "traffic" will also be simply filled from the surrounding DTM • "forest" and "homogeneous grass land" will be interpolated using IDW • "water" should be handled first -also before deriving the final DTM.Water areas should be filled on the lowest surrounding DSM level.But here also has some more investigation on rivers with slopes in flow direction been done.Since our images show no water areas this topic will be delayed to some later works.
Before filling areas with ground (DTM) values the DTM has to be extracted.But for applying the DTM extraction methods already a filled DSM will be needed.For this purpose we choose to fill the small holes as described above and the large holes using the lowest value of the border height.As can be seen in fig. 9 the method breaks above about 60 % hole probability due to too large connected erroneous areas.But the mean offset of the filled DSM compared to the reference DSM stays nearly at zero with an increasing standard deviation at increasing error sizes.The mean offset of the simple fill as shown in fig. 10 keeps always high with high error but the method delivers in contrast to the proposed method also reasonable results with extremely sparse DSMs.This is due to the simple interpolation of occlusions which should be at ground level but are raising from ground to the height of the building in the interpolated DSM.

APPLICATION TO TEST DATA
The test data consists of a WorldView-I stereo image pair of the test area "Terrassa" from the ISPRS-benchmarking dataset of Commission I, working group 4 (http://www.commission1.isprs.org/wg4/) as shown in fig.11.As can be seen is the mean in contrast to the experiments using synthetical DSMs generally significantly below zero.The calculated DSM f illed − DSM ref means that the filled DSM is below the reference laser points.In contrast the statistics for the simple IDW filled DSM in fig.17 shows much better mean values around zero.But in this case the higher standard deviation and also much more outliers for the minimum and maximum deviations can be found.Here can be seen, that in the bottom wing of the building a large part of the southern edge of the building was missing in the DSM.So the detected occlusion is only half a real occlusion but also half occupying a missing part of the building.Simple interpolation is therefore more appropriate to fill half building and half ground (which delivers in best case a mean of zero if both parts of the occlusion are of the same size) than only filling with ground values (which gives for half of the occlusion a too low value and therefore a negative mean).Such kinds of errors should be modeled and handled additionally in future to gain better results.
But compared over all the proposed method also works better for the test site data in most regions.

DISCUSSION
Filling missing parts in DSMs generated from very high resolution satellite stereo imagery should be done depending on the different types of errors.For this the errors in the generated DSM has to be classified and handled individually.Using individual handling of different classes of errors yield significantly better results compared to a given reference than only interpolating the DSMs.
Also the approach simulating DSMs and afterward incorporating them with different types of errors is a convincing approach.In this case always a perfect reference remains and it can be tried to restore the error loaded DSM using different methods.Comparing the results to the original synthetical DSM varying the simulated errors will give more insight to the behaviour of the algorithms implemented.But the method used for simulating errors in the perfect synthetical DSM has to be choosen carefully and also the types of errors which may occur have to be understood and analyzed previously in detail.
One of the main challenges is the detection of court yards with no detected ground points inside.Such a situation can be seen in fig. 1 e.g. between numbers 2 and 4, but fortunately not occurs in the test datasets.These areas should be filled from the DTM with values outside of the building whereas homogeneous areas or roof structures should only be interpolated.
Court yards show in general much more texture than homogeneous roof structures.These two can be discriminated by using some kind of structure parameter like a thresholded standard deviation of the ortho image inside the occlusion.But more often in real DSMs also strongly textured roof structures may lead to occlusions which are nearly not distinguishable from court yard structures.
One possible solution may be the usage of the occlusion algorithm described in 2.4 for the generation of synthetic DSMs.This algorithm can also be used with the sun azimuth and elevation to generate synthetic shadows.In an iterative method occluded areas may be segmented to shadowed and non shadowed parts and shadows for these segments simulated using different heights of the occluded segments.By afterward comparing the simulated shadow with the shadow of the ortho image the correct height may be extracted.But if the ortho image was generated using a wrong DSM there will be distortions that will also lead to wrong results.

CONCLUSION AND OUTLOOK
Digital surface models (DSMs) generated from very high resolution (VHR) satellite stereo imagery with ground sampling distances (GSD) of about one meter using dense stereo methods like the semiglobal matching (SGM) contain different classes of errors.In the presented approach these errors are analyzed and classified.Afterwards different classes of errors are handled in special manners to generate a "best of" filled DSM.Using synthetical generated DSMs with a broad range of different errors allow the analysis of the performance of the algorithm.These analysis shows that the method is only usable for relatively dense DSMs with noise generated holes below 60 % of the area and single larger occluded areas.
In conclusion we can state that a detection and classification of different types of errors in DSMs and the individual handling of each type of error gains much better results than only using one method for filling DSMs.
In the future the classification of court yards to fill from DTM areas outside the building and the detection of small roads has to be implemented and also better approaches for the generation of DTMs from noisy DSMs has to be analyzed.

1. 2
Fig. 1 shows typical errors occuring in DSMs generated by optical dense stereo methods like SGM (semi global matching).The figure contains a DSM of a small urban part near the Akropolis in Athens on the left side and the corresponding ortho image on the right side.The DSM generation was based on an Ikonos stereo pair (Summer 2004) with a ground sampling distance of one meter.The typical errors (black areas in the DSM) can be classified as follows: Normally for the nDEM simply holds nDEM = DSM −DT M which does not work for DTMs changing significantly over the size of a building.If we use the inverse DSM = DT M + nDEM to generate the synthetical DSMs we will gain houses with roofs reflecting the ground variations (fig.2, left).

Figure 2 :
Figure 2: Synthetically generated DSM composed from urban blocks (nDEM) and a rolling ground (DTM) -left: simple DTM+nDEM, right: correctly added nDEM to min(DTM) in each building area Figure 3: Synthetical DSMs, top: 3D view, bottom DSM; left: rectangular buildings (32 m), center: industry buildings (112 m), right: urban blocks (80 m × 112 m, each block 16 m); all streets between buildings 32 m In fig. 3 the nDEMs combined with the DTM using the method described in 2.2 are shown.

Figure 7 :
Figure 7: Occluded urban DSM with 40 % holes (left) and hole mask From the synthetic DSMs shown in fig. 3 in a first step different levels of occluded and error containing versions like the examples in fig.6 are generated (see fig. 7).Afterwards the DSM correction algorithm described in 3 -not using the NDVI part due to missing multispectral imagery -and a standard inverse distance weighted interpolation (IDW) is applied as shown in fig.8.

Figure 8 :
Figure 8: Filled DSM with described approach (left), reference (center), and DSM filled with inverse distance weighted interpolation (right)The resulting filled DSMs are compared to the original DSMs from fig.8, center -checking only the values in the erroneous areas (using the mask generated directlly from the DSM in fig.7).

Figure 9 :
Figure 9: Statistics of errors using described approach on synthetical urban DSMs, occluded (elevation 60 • ) with increasing hole probability from 0 % to 85 %, minimum (green), maximum (blue) deviation and mean and standard deviation (red) to original reference DSM The results plotted against the hole probability of the input DSM are shown in fig. 9 for the proposed method and in fig. 10 for the inverse distance weighted interpolation.The difference is calculated as DSM calc −DSM ref .So values greater than zero denote areas in which the filled DSM is higher than the reference.

Figure 10 :
Figure 10: Statistics of errors using inverse distance weighted interpolation on synthetical urban DSMs, occluded (elevation 60 • ) with increasing hole probability from 0 % to 85 %, minimum (green), maximum (blue) deviation and mean and standard deviation (red) to original reference DSM

Figure 11 :
Figure 11: Overview on the Terrassa testsite, 4 km × 4 km, center at 2 • 1.99'E, 41 • 34.64'N, left DSM (ranging from 210 m to 390 m), right ortho image Fig. 12 shows a 3D view of the area (north to the top right) and a 1 km × 1 km detail view starting at top, 1 km from the left of fig.11.

Figure 12 :
Figure 12: Overview on the Terrassa testsite, 3D views, left full 4 km × 4 km, right detail 1 km × 1 km Fig.13shows the ortho image and the reference laser DSM from a 1 km × 1 km detail of the benchmarking test site.Fig.14shows the DSM and the occlusion mask whereas fig.15shows the filled DSMs using the proposed method and a simple inverse distance weighted interpolation respectively.

Figure 13 :
Figure 13: Detail of Terrassa testsite 1 km × 1 km, left ortho image, right reference laser DSM points For the ISPRS benchmarking test also a reference laser DSM was provided by the ICC (Institut Cartogràfic de Catalunya, http:// www.icc.cat/) which can be used for evaluation of the filling of the DSMs.First the DSM generated by the semi global matching algorithm is filtered for the small holes as described above and the remaining large holes are masked as shown in fig.14.

Figure 14 :
Figure 14: Detail of Terrassa testsite 1 km × 1 km, left DSM with small holes filled, right occlusion mask These remaining large holes are subsequently classified to the classes given in 3.1.The classified erroneous areas are then filled following the steps described in 3.2 without the classifications needing the NDVI.This restriction is necessary since the base imagery used for the DSM generation are WorldView-I stereo data with only one PAN channel and no additional multispectral information.The DSM filled following these method is shown in fig.15, left, whereas a simple inverse distance weighted interpolation is shown in the same figure on the right hand side.

Figure 15 :
Figure 15: Detail of Terrassa testsite 1 km × 1 km, left DSM filled with proposed method, right DSM filled using inverse distance weighted interpolation Figs.16 and 17 show the variation of the same statistical parameters as used already with the synthetical DSMs in fig. 9 but for different areas of the whole Terrassa test area.Since the test area contains much urban structured areas in the north continually changing to completely mountainous forestry area in the south (cf.fig.11) the statistics is calculated for horizontal strips

Figure 16 :
Figure 16: Statistics of whole Terrassa area from north (left) to south (right) for proposed methodThe statistics shown in fig.16shows the mean, standard deviation and the maximum and minimum deviations of the filled DSM vs. the reference DSM from north (left) to south (right).As can be seen is the mean in contrast to the experiments using synthetical DSMs generally significantly below zero.The calculated DSM f illed − DSM ref means that the filled DSM is below the reference laser points.In contrast the statistics for the simple IDW filled DSM in fig.17shows much better mean values around zero.But in this case the higher standard deviation and also much more outliers for the minimum and maximum deviations can be found.

Figure 17 :
Figure 17: Statistics of whole Terrassa area from north (left) to south (right) for idw In the left part of fig.16 (top edge of the DSM until about 1.2 km from the top) there exist outliers with about 20 m too high and about 60 m too low filled DSM.This and the much worse mean values in comparison to the synthetical DSMs can be explained with an additional type of DSM error which was not modeled as shown in fig.18.

Figure 18 :
Figure 18: Detail of filled DSM (left) and IDW interpolated DSM (right) Fig. 18 shows a detail of the filled DSM (left) and IDW interpolated DSM (right) of a building (values on left and top give the