EXPLOITATION OF DIGITAL SURFACE MODELS GENERATED FROM WORLDVIEW-2 DATA FOR SAR SIMULATION TECHNIQUES

GeoRaySAR, an automated SAR simulator developed at DLR, identifies buildings in high resolution SAR data by utilizing geometric knowledge extracted from digital surface models (DSMs). Hitherto, the simulator has utilized DSMs generated from airborne LiDAR data with pre-filtered vegetation. To remove the need for pre-optimized model input, this work uses DSMs generated from high resolution optical data (acquired with WorldView-2) for the extraction of building-related SAR image parts. An automatic preprocessing of the DSMs has been developed for separating buildings from elevated vegetation (trees, bushes) and reducing the noise level. Based on that, automated simulations are triggered considering the properties of real SAR images. Locations in three cities, Munich, London and Istanbul, were chosen as study areas to determine advantages and limitations related to WorldView-2 DSMs as input for GeoRaySAR. In addition, the impact of the DSM quality on building extraction is evaluated, as is the building DSM, a DSM containing only buildings. The results indicate that building extents can be detected with DSMs from optical satellite data with varying success, dependent on the quality of the DSM as well as on the SAR imaging perspective.


INTRODUCTION
One of the strengths of using SAR images as a source of information is related to near-real-time applications in the context of unexpected events, e.g. earthquakes, due to the independence of weather conditions and the time of day. However, the interpretation of scenes covering urban areas acquired with SAR is often a challenging task due to geometric distortion effects pertinent to the imaging concept.
Various simulators have been developed to ease the interpretation of SAR images of urban areas, e.g. by taking into account the electromagnetic and geometrical properties of buildings (Guida et al., 2008) or by utilizing ray tracing (Hammer and Schulz, 2011). GeoRaySAR, a simulator of the latter type being developed at DLR, enables the identification of buildings in high resolution SAR data. To this end, prior knowledge about the scene geometry has to be extracted for the automated prediction of building extents. This knowledge can be acquired either from 3D GIS models (Auer and Donaubauer, 2015) or from DSMs (Tao et al., 2014). In (Tao et al., 2014), prior knowledge was derived from DSMs based on airborne LiDAR data that contained only man-made structures (i.e. vegetation had been pre-filtered). However, more realistic scenarios would rely on DSMs derived from satellite data without pre-filtered vegetation (e.g. with support of cadastre information).
As a part of GeoRaySAR, geometric knowledge is extracted from a DSM, which is decomposed into a digital terrain model (DTM) and a normalized DSM (nDSM) containing the elevated scene objects. Using the height models, GeoRaySAR simulates separate layers representing different signal reflection types: single reflections, double reflections and a combination of both. For standard scenarios, optical satellite data, such as from WorldView-2, is easier to access than LiDAR data acquired from airborne sensors. Acquisition of optical data from satellites requires less planning and less manpower and is considerably cheaper than the acquisition of LiDAR data from airborne sensors. Therefore, extending the applicability of GeoRaySAR to DSMs from high-resolution optical data is a crucial step toward realistic applications.
Studies on the generation of DSMs from optical stereo images acquired with spaceborne sensors are presented in e.g. (Zhang and Grün, 2006) and (Hoja et al., 2005). These studies indicate that DSMs generated from spaceborne sensors can have a height accuracy between 1 and 3 m, depending on the structure of the study area. Based on image matching with emphasis on tri-stereo data (Carl et al., 2013), DSMs have been generated from high resolution WorldView-2 stereo images for urban scenes, which resulted in few height outliers. (Carl et al., 2013) confirms that the geometric quality of DSMs generated from WorldView-2 data is sufficient for automated SAR image simulation and, based on that, the interpretation of urban scenes. This paper presents an extended approach for the interpretation of urban areas with GeoRaySAR, using an automatic preprocessing chain. The main objective of the paper is to preprocess DSMs generated from data of WorldView-2, a high resolution optical satellite, for the identification of buildings in SAR images (acquired with the satellite TerraSAR-X). These DSMs, in comparison to the previously used manually filtered DSMs, need to be preprocessed in terms of noise reduction and removal of vegetation. For the evaluation of the preprocessing and for determining the limitations of GeoRaySAR, study areas with diverse building types and densities are chosen.
The remainder of the paper is structured as follows. Section 2 describes the method for processing the DSM to provide the input models for GeoRaySAR. The chosen study areas and the data used are introduced in section 3, while the results of the processing chain are shown and discussed in section 4, followed by the conclusion of the paper in section 5.

METHODOLOGY
GeoRaySAR requires as input one SAR image meta file, which includes the parameters related to the image acquisition, and one DSM. As output, GeoRaySAR produces three layers, which represent the direct signal response, the signal double reflection and the combination of both in one layer (higher reflection levels are deactivated due to the limited level of detail of the DSM). The ray tracing procedure, which relies on triangulated surface models derived from the DSM (pixels), can be conducted for different input models derived from the DSM, which provides the opportunity to separate buildings and elevated vegetation.
The DSMs used in this study have been generated by utilizing the method described in (Krauß, 2014), a modified version of the semi global matching approach (SGM) (Hirschmüller, 2008). Since the main objective in this study is to predict building extents, elevated vegetation is excluded from the input to GeoRaySAR. This was done with an automatic chain for processing the DSMs generated from optical images by SGM.
The automatic preprocessing of the DSM consists of two steps, elevated vegetation filtering and noise reduction, and requires an orthorectified optical image and a DSM as input. The DSM and the optical image, which is preferably from the same sensor, have to overlap each other, i.e. they have to cover the same scene and share the same spatial resolution and size in terms of number of pixels. A tiling process has been developed as an additional step for the processing of extended scenes. In sum, the steps highlighted in blue in Figure 1 have been developed in the presented work.

Preprocessing
Elevated vegetation is separated from buildings to predict building extents in SAR images. If left untouched in the DSM, the extent of elevated vegetation would be detected during the ray tracing, since GeoRaySAR simulates the extents of all objects in the DSM. Hence, after filtering out the elevated vegetation, only buildings and ground parts (non-vegetated, vegetated) remain in the DSM scene.
Elevated vegetation is identified by calculating the normalized difference vegetation index (NDVI) and utilizing an nDSM to detect vegetation taller than a height threshold (see chapters 2.1.2 and 2.1.3). Grass is not separated from the DSM as it is considered part of the ground surface. To reduce the noise in the resulting binary mask, morphological operations are applied (see chapter 2.1.4). Elevated vegetation is separated from buildings in the DSM by using the binary mask, and new height values from the given DTM are assigned to the affected pixels.
Detailed descriptions of the processing steps are presented in the following subsections (see Figure 2 for an overview, exemplifying the procedure for the Alte Pinakothek in Munich). The main objective is to connect geometric information (raw DSM) with the procedure of GeoRaySAR, while preparing the input model in an unsupervised manner.

Fuzzy Classification Using NDVI
The well-known NDVI, which is often used to detect healthy vegetation, is derived by combining the red and the near infrared band. The idea is adopted in this work by calculating the NDVI from either the combination of the red and the red-edge band or the red and the infrared band (depending on availability). The choice of the band combination can be changed depending on the chosen sensor. Equation 1 expresses the utilized band combination, with RE being the red-edge band and R the red band.
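Equation 1 is not reproduced in this text. Assuming the standard normalized-difference form applied to the band pair named above (an assumption, with RE and R denoting the pixel values of the red-edge and red bands), it would read:

```latex
\mathrm{NDVI} = \frac{RE - R}{RE + R} \tag{1}
```

Under the same assumption, RE is replaced by the infrared band value when the red and infrared combination is used.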
Rule-based fuzzy classification is utilized to classify the pixels into vegetation, using the same approach as in (Krauß et al., 2012); see Equation 2. Element x represents the NDVI value, c the lower threshold and d the upper threshold. The classification results in values ranging from 0 to 1, representing the certainty that a pixel is vegetation. The thresholds used to classify vegetation are c = 0.2 and d = 0.4. These thresholds were chosen empirically after comparing classification results obtained with different thresholds. Following this concept, a layer containing the probability of each pixel being vegetation is produced.
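The classification step can be illustrated with a minimal NumPy sketch. Equation 2 is not reproduced in this text, so the linear ramp between c and d used below is an assumption about its form; the function names `ndvi` and `vegetation_certainty` are hypothetical and not part of GeoRaySAR.

```python
import numpy as np

def ndvi(red_edge, red):
    """Normalized difference of two bands; set to 0 where both bands are 0."""
    red_edge = np.asarray(red_edge, dtype=float)
    red = np.asarray(red, dtype=float)
    total = red_edge + red
    out = np.zeros_like(total)
    np.divide(red_edge - red, total, out=out, where=total != 0)
    return out

def vegetation_certainty(x, c=0.2, d=0.4):
    """Assumed fuzzy membership: 0 below c, 1 above d, linear ramp in between."""
    return np.clip((np.asarray(x, dtype=float) - c) / (d - c), 0.0, 1.0)

# Toy 2 x 2 band values (illustrative only)
re_band = np.array([[0.50, 0.30], [0.26, 0.00]])
r_band = np.array([[0.10, 0.20], [0.24, 0.00]])
certainty = vegetation_certainty(ndvi(re_band, r_band))
```

With c = 0.2 and d = 0.4, an NDVI of 0.3 lies halfway up the ramp and therefore receives a certainty of about 0.5 under this assumed membership function.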

DTM and nDSM Generation
To keep non-elevated vegetation in the DSM, an nDSM had to be created. Grass remains in the DSM in order to preserve ground information and thereby enable the interpretation of ground parts. In this study, grass was assumed to be vegetation with a maximum height of 20 cm. In contrast, elevated vegetation such as trees and bushes is excluded from the DSM and stored in an individual nDSM model.
The method of (Arefi et al., 2011) is used to generate the DTM. The DTM is generated by utilizing gray-scale reconstruction, an iterative morphological transformation that classifies the pixels into ground or non-ground. Padding is added to ease the identification of object boundaries close to the border of the image. The size of the padding is set to three pixels, which are assigned the median DSM height of the full scene. To remove gaps in the resulting DTM, interpolation based on Delaunay triangulation is utilized, as described in (Arefi et al., 2011). After the removal of the padding pixels, the gaps in the DTM are filled with values derived from a multilevel B-Spline interpolation. By subtracting the interpolated DTM from the DSM, the nDSM is derived. Finally, non-zero elements of the nDSM (relative heights) are assigned the original DSM heights (absolute heights).
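The padding and the final subtraction step can be sketched as follows. This is only an illustration under stated assumptions: the DTM is taken as given (the gray-scale reconstruction of (Arefi et al., 2011) is not implemented here), negative height differences are clipped to zero, and the function names are hypothetical.

```python
import numpy as np

def pad_with_median(dsm, width=3):
    """Surround the DSM with `width` pixels set to the median height of the scene."""
    dsm = np.asarray(dsm, dtype=float)
    return np.pad(dsm, width, mode="constant", constant_values=np.median(dsm))

def normalized_dsm(dsm, dtm):
    """Return the relative nDSM and an nDSM carrying the absolute DSM heights."""
    dsm = np.asarray(dsm, dtype=float)
    dtm = np.asarray(dtm, dtype=float)
    # Heights above ground; negative differences are clipped to 0 (assumption)
    relative = np.clip(dsm - dtm, 0.0, None)
    # Non-zero nDSM elements receive the original (absolute) DSM heights
    absolute = np.where(relative > 0, dsm, 0.0)
    return relative, absolute

dsm = np.array([[510.0, 525.0], [510.0, 510.0]])
dtm = np.full((2, 2), 510.0)
relative, absolute = normalized_dsm(dsm, dtm)
```

In this toy scene, the single elevated pixel has a relative height of 15 m and keeps its absolute height of 525 m in the second output.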
2.1.3 Separation of Elevated Vegetation
By combining the information gained from the nDSM and the fuzzy classification, a binary mask is derived in which elevated vegetation is assigned a pixel value of 0 and all other pixels a value of 1. This is done by using the classification method described in (Krauß et al., 2012), but with a different minimum height. Pixels that have a certainty of 50% of being vegetation and are taller than the minimum height are classified as elevated vegetation, as seen in Equation 3. Pixels that are considered to be elevated vegetation (v in Equation 3) receive a pixel value of 0.
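The masking rule can be written compactly as shown below. Equation 3 is not reproduced in this text, so the sketch makes two assumptions: the 50% certainty is treated as a lower bound (>= 0.5), and the minimum height defaults to 0.2 m, borrowed from the grass height assumed earlier. The function name is hypothetical.

```python
import numpy as np

def elevated_vegetation_mask(certainty, ndsm_height, min_height=0.2, min_certainty=0.5):
    """Binary mask: 0 for elevated vegetation, 1 for everything else."""
    certainty = np.asarray(certainty, dtype=float)
    ndsm_height = np.asarray(ndsm_height, dtype=float)
    # A pixel is elevated vegetation if it is both likely vegetation and tall enough
    is_elevated_vegetation = (certainty >= min_certainty) & (ndsm_height > min_height)
    return np.where(is_elevated_vegetation, 0, 1).astype(np.uint8)

certainty = np.array([[0.9, 0.9], [0.1, 0.5]])
height = np.array([[12.0, 0.1], [15.0, 3.0]])   # relative heights in metres
mask = elevated_vegetation_mask(certainty, height)
```

Here the tree-like pixel (certain and tall) and the borderline pixel are masked out, while grass (certain but low) and the building (tall but not vegetation) are kept.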
2.1.4 Morphological Operations, Gap Filling and Noise Reduction
For reducing the noise in the binary mask, two morphological operations, closing and opening, were used. Closing consists of dilation, which in this case expands the areas containing non-vegetation, followed by erosion, which contracts the areas again. By performing closing, areas smaller than a fixed structure size are filled in with the pixel value 1. Opening consists of erosion followed by dilation, which removes areas smaller than a fixed structure size. The noise in the binary mask is reduced by performing closing and opening. To reduce the noise for larger areas, the window for the morphological operations is set to nine by nine pixels.
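A minimal NumPy-only sketch of the two operations with a square window is given below. The border handling (edge replication) and the helper names are assumptions made for illustration; established image processing libraries offer equivalent operations.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _window_filter(mask, size, reducer):
    """Apply `reducer` (np.max or np.min) over a size x size window around each pixel."""
    pad = size // 2
    padded = np.pad(np.asarray(mask), pad, mode="edge")  # edge replication (assumption)
    windows = sliding_window_view(padded, (size, size))
    return reducer(windows, axis=(-2, -1))

def dilate(mask, size=9):
    return _window_filter(mask, size, np.max)

def erode(mask, size=9):
    return _window_filter(mask, size, np.min)

def closing(mask, size=9):
    """Dilation followed by erosion: fills small areas of 0 with 1."""
    return erode(dilate(mask, size), size)

def opening(mask, size=9):
    """Erosion followed by dilation: removes small areas of 1."""
    return dilate(erode(mask, size), size)

# A single-pixel hole in the non-vegetation area is filled by closing
hole = np.ones((21, 21), dtype=np.uint8)
hole[10, 10] = 0
cleaned = opening(closing(hole))
```

Structures that are larger than the window and lie away from the image border pass through both operations unchanged, which is why the window size controls the size of the noise that is removed.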
DSM pixels with discarded elevated vegetation, i.e. pixels with a value of 0 in the binary mask, receive new height values from the corresponding DTM pixel. A median filter with a window size of nine by nine pixels is used for noise reduction. The decision for median filtering was driven by the need to retain the building shapes.
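The height replacement and the median filtering can be sketched as follows, again with NumPy only. Edge replication at the image border and the function names are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def replace_vegetation_heights(dsm, dtm, mask):
    """Take the DTM height wherever the binary mask is 0 (elevated vegetation)."""
    return np.where(np.asarray(mask) == 0, dtm, dsm).astype(float)

def median_filter(height, size=9):
    """Median over a size x size window; borders use edge replication (assumption)."""
    pad = size // 2
    padded = np.pad(np.asarray(height, dtype=float), pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))

# Flat ground at 500 m with one tree pixel that is flagged in the mask
dsm = np.full((15, 15), 500.0)
dsm[7, 7] = 512.0
dtm = np.full((15, 15), 500.0)
mask = np.ones((15, 15), dtype=np.uint8)
mask[7, 7] = 0
smoothed = median_filter(replace_vegetation_heights(dsm, dtm, mask))
```

A median filter removes isolated height spikes while leaving a straight height step (such as a building wall) in place, whereas a mean filter would blur the step; building corners are still rounded slightly.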
2.1.5 Building DSM
Based on the pre-filtered DSM and DTM, a building DSM is derived. The corresponding DSM pixels are identified by analyzing the difference between the preprocessed DSM and a DTM (value > 0). The noise in the resulting binary mask is reduced by morphological operations (opening with a 9x9 window, dilation with a 3x3 window). The final mask is used to extract the building-related pixels from the pre-filtered DSM. In the context of ray tracing, only the remaining DSM pixels are used for triangulation to describe the scene geometry. Accordingly, SAR signal responses are only derived for building bodies.
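The building DSM extraction can be sketched in the same style. The helper is repeated so that the block is self-contained; marking non-building pixels with NaN, the edge replication at the border and the function names are assumptions, not details given in the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _morph(mask, size, reducer):
    """Square-window morphology: np.min erodes, np.max dilates."""
    pad = size // 2
    padded = np.pad(mask, pad, mode="edge")
    return reducer(sliding_window_view(padded, (size, size)), axis=(-2, -1))

def building_dsm(dsm_pre, dtm, open_size=9, dilate_size=3):
    """Return (building DSM with NaN outside buildings, boolean building mask)."""
    dsm_pre = np.asarray(dsm_pre, dtype=float)
    above_ground = (dsm_pre - np.asarray(dtm, dtype=float)) > 0
    # Opening (erosion, then dilation) removes small remnants
    opened = _morph(_morph(above_ground, open_size, np.min), open_size, np.max)
    # A final small dilation widens the remaining building regions by one pixel
    mask = _morph(opened, dilate_size, np.max)
    return np.where(mask, dsm_pre, np.nan), mask

dtm = np.zeros((30, 30))
dsm_pre = np.zeros((30, 30))
dsm_pre[5:20, 5:20] = 15.0   # one building block
dsm_pre[25, 25] = 3.0        # isolated remnant, e.g. unfiltered vegetation
bdsm, mask = building_dsm(dsm_pre, dtm)
```

In this toy scene the isolated remnant is dropped, while the building block survives and is widened by one pixel on each side.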

TILING
The local incidence angle is assumed to be constant during the ray tracing, which does not correspond to reality. Moreover, while dealing with larger scenes, the image simulation requires a considerable amount of computer memory. To overcome these aspects, a tiling procedure is implemented, which introduces spatial sampling of the local signal incidence angle. The splitting of the DSM is based on the distance along the longest axis, which reduces the deviation from the true signal incidence angle. The threshold for splitting the DSM is set to 1200 m for the case studies presented below. Each tile is processed by the simulation chain, followed by the merging of the tiles to derive the full scene. The resulting tiles may vary in size due to a variation of maximum heights in the tiles. To cope with this, the maximum and minimum coordinates along the two image axes among the tiles are identified, followed by the calculation of the size for the merged image. Then, an empty image layer is created and the pixel values are retrieved from the simulated tiles. The maximum intensity value is chosen in overlapping areas to keep the emphasis on the building appearance in the image.
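The splitting and merging logic can be sketched as follows. The exact splitting rule is not given in the text, so dividing the longest axis into equally sized tiles no longer than the threshold is an assumption, as are the tile representation (pixel offsets plus array), non-negative intensities and the function names.

```python
import math
import numpy as np

def split_extent(length_m, max_tile_m=1200.0):
    """Split [0, length_m] into equal intervals no longer than max_tile_m (assumed rule)."""
    n_tiles = max(1, math.ceil(length_m / max_tile_m))
    step = length_m / n_tiles
    return [(i * step, (i + 1) * step) for i in range(n_tiles)]

def merge_tiles(tiles):
    """Merge (row_offset, col_offset, image) tiles; overlaps keep the maximum intensity."""
    row_min = min(r for r, _, _ in tiles)
    col_min = min(c for _, c, _ in tiles)
    row_max = max(r + img.shape[0] for r, _, img in tiles)
    col_max = max(c + img.shape[1] for _, c, img in tiles)
    # Empty layer sized from the extreme coordinates among all tiles
    merged = np.zeros((row_max - row_min, col_max - col_min), dtype=float)
    for r, c, img in tiles:
        view = merged[r - row_min:r - row_min + img.shape[0],
                      c - col_min:c - col_min + img.shape[1]]
        np.maximum(view, img, out=view)  # in-place maximum on the overlapping region
    return merged

intervals = split_extent(3000.0)  # three tiles of 1000 m each
merged = merge_tiles([(0, 0, np.full((2, 3), 5.0)), (0, 2, np.full((2, 3), 9.0))])
```

In the example, the two tiles overlap in one column, and the merged image keeps the brighter value there.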
The spatial distance is used for splitting, which inherently samples the signal incidence angle. If the tiling were based on the change of the incidence angle, one study area would be split into a different number of tiles depending on the sensor perspective. At steeper angles, the angle difference increases faster, i.e. SAR images taken at steeper angles would be split into more tiles. The local scenes would differ unnecessarily in that case, which would hamper the comparison of simulated scenes with considerable incidence angle differences.
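The dependence of the angle change on the viewing geometry can be made explicit with a simplified model that is not part of the source: assuming a flat Earth and a sensor at altitude H, a ground point at ground range x is seen under the incidence angle θ (measured from the vertical), so that

```latex
x = H \tan\theta
\quad\Longrightarrow\quad
\frac{\mathrm{d}\theta}{\mathrm{d}x} = \frac{\cos^{2}\theta}{H},
\qquad
\Delta\theta \approx \frac{\cos^{2}\theta}{H}\,\Delta x
```

For a fixed tile length Δx, the change Δθ is therefore larger for small θ (steep viewing) than for large θ, which is consistent with the statement that an angle-based split would produce more tiles at steeper angles.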

STUDY AREAS AND DATA
The study areas consist of selected locations in the cities of Munich, London and Istanbul. The three cities contain a variation of built-up areas, such as densely packed quarters, narrow streets surrounded by tall buildings and buildings of numerous sizes and types. This is of interest, since such a variation of study scenes has not been used as input in GeoRaySAR earlier.
Data from WorldView-2 is used for the separation of elevated vegetation and for the generation of the DSMs. The SAR images were acquired with TerraSAR-X, and the image meta files are used for extracting the sensor and image parameters. A DSM generated from LiDAR data for the scene of Munich, as used in (Tao et al., 2014), is utilized for comparison with the preprocessed DSM based on WorldView-2. The LiDAR data was acquired in April 2003, with a vertical resolution of 0.1 m and a horizontal resolution of 1 m.
The images acquired from WorldView-2 were delivered with a resolution of 0.5 m for the panchromatic images and 2 m for the multi-spectral images. The DSMs have a resolution of 0.5 m and were generated prior to this study by using the modified version of SGM (Krauß, 2014), matching several panchromatic images captured at different viewing angles from the same orbit pass. Four images were used for DSM generation for Munich, five for London and three for Istanbul. The images were acquired on 12 July 2010 for Munich, 22 October 2011 for London and 15 July 2015 for Istanbul. The optical images used for the fuzzy classification were orthorectified.
The TerraSAR-X images were captured in high resolution spotlight mode with a spatial resolution of 0.6 m in range and 1.1 m in azimuth (pixel spacing along both axes: 0.5 m). Table 1 provides an overview of information on the TerraSAR-X images.

Signal Reflections
Using the preprocessed DSMs and SAR image meta data as input, SAR image layers were generated. Figure 3 shows the resulting images for the urban scene in Munich, simulated with the DSM generated from LiDAR data and the DSM generated from optical data. The scene covers the area of the Viktualienmarkt, located in central Munich, which contains smaller buildings, such as food stalls and shops, and bigger buildings in the surrounding area.
In Figure 3a, the extent of the smaller stalls and shops can be seen in the center of the image, whereas it is more difficult to detect in Figure 3b. This is due to the quality of the DSM used for Figure 3b, which was generated from optical data and preprocessed. However, the extent of the bigger buildings can be clearly seen in both simulation results.
Figure 4 shows the result for the site in the center of London, located close to the subway station Southwark. In comparison to the scene in Munich, the area contains many residential buildings with rectangular shapes.
The simulated SAR images can be seen in Figures 4a and 4b, corresponding to the two TerraSAR-X acquisitions. The image pixels appear brighter in Figure 4b than in Figure 4a. This is caused by the smaller signal incidence angle, which leads to stronger diffuse signal responses from ground parts in comparison to the response of building walls, which is mapped to larger layover areas (intensity is scaled to 8-bit gray values). The steeper signal incidence angle also leads to overlay effects of nearby buildings, as changes in height are mapped to bigger range intervals.
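The relation between incidence angle and layover extent can be illustrated with the standard side-looking geometry for a vertical wall on flat ground, with the incidence angle measured from the vertical. This simplified model and the function names are assumptions for illustration and are not taken from GeoRaySAR.

```python
import math

def layover_ground_range(height_m, incidence_deg):
    """Ground-range extent of the layover of a vertical wall: h / tan(theta)."""
    return height_m / math.tan(math.radians(incidence_deg))

def layover_slant_range(height_m, incidence_deg):
    """Slant-range extent of the layover of a vertical wall: h * cos(theta)."""
    return height_m * math.cos(math.radians(incidence_deg))

# A 20 m wall seen under a steep (30 deg) and a shallow (50 deg) incidence angle
steep = layover_ground_range(20.0, 30.0)
shallow = layover_ground_range(20.0, 50.0)
```

Under this model, the same wall occupies a longer range interval at the steeper angle, so layover areas of neighbouring buildings are more likely to overlap.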
Figure 5 shows the simulation results and related satellite images for the selected site in Istanbul, which contains the northwestern parts of the district Fatih. In contrast to the sites in London and Munich, this site is densely packed with small buildings on a hill. Tiling was utilized for the scene of Istanbul due to its size. The effect of the tiling can be seen in the northeastern parts of Figure 5a, where it is caused by a scaling difference between the local scenes.
As seen in Figures 5a and 5b, the building extents reveal very heterogeneous appearances, partly regular in the northern part and variable for the building circle surrounding the square in the center. The shape of the hill in the scene center is visible in the DSM shown in Figure 5c, with height increasing with brightness. Again, the building extents are easier to interpret for the bigger signal incidence angle (compare Figures 5a and 5b), where the layover area is more compressed. Therefore, the geometrical representation of the scene appears to be more valuable in Figure 5a, e.g., in the context of object-related applications.

Identification of Building Pixels
The site of the Frauenkirche in Munich was chosen to display the difference in building extent between the preprocessed DSM and the building DSM, as seen in Figure 6.
Smaller red dots can be seen spread around the scene, which are most likely elevated vegetation that was not correctly separated during the preprocessing. A few pixels in the building DSM have not been successfully detected as objects above ground. They are visualized as bright grey areas that break through the building DSM shown in red. This is an indication that objects were not completely removed in the DTM, since the height difference between the DSM and the DTM was close to zero. However, most building extents have been traced completely.

Impact of DSM Quality
A test site in the south of Barbican in London was chosen for the evaluation of the quality of the DSMs, which is based on three DSMs generated with a different number of WorldView-2 images. One DSM was generated with two images, one with three and one with five. Figure 7 presents the SAR simulation results and the corresponding input DSMs.
The case study confirms that the simulation benefits from a larger number of images used for the generation of the DSM. The resulting SAR image is less noisy and reveals fewer gaps and errors for facade parts. More interestingly, however, the results indicate that DSMs based on two images lead to acceptable results, i.e., building extents are already described for most buildings. This is promising in the context of realistic scenarios where the availability of satellite data sets is limited.

CONCLUSION AND OUTLOOK
It has been shown in this paper that DSMs derived from WorldView-2 data are of sufficient geometric quality for automated SAR simulation in urban areas. For preparing the necessary input models, a dedicated preprocessing chain has been developed which contains filtering, decomposition, and tiling steps. The presented case study results for Munich, London, and Istanbul indicate the capabilities and limits of the simulation method, the latter being primarily related to the input models and the availability of multispectral information for filtering.
Differences in the appearance of buildings have been compared for DSMs generated from LiDAR data (airborne sensor) and DSMs generated from WorldView-2 data. The simulated images reveal a better description of small buildings for the LiDAR-based DSM, whereas the appearance of larger buildings is comparable. Hence, object-based SAR applications related to buildings or building blocks are realistic, e.g., in the context of city monitoring or change detection. The impact of signal incidence angle differences was exemplified for urban scenes, showing better separability of building layover areas for larger (less steep) incidence angles. Hence, SAR data with bigger signal incidence angles are expected to be more suitable for dense urban scenes. Finally, DSMs generated from 2 or 3 WorldView-2 images are applicable for SAR simulation, even if the difference to results using DSMs from 5 WorldView-2 images is obvious (geometric completeness, noise level).
With the implementation of an automatic preprocessing chain for DSMs generated from optical images, the usage of GeoRaySAR has broadened. Hence, it is not limited to data that only contain buildings, such as 3D GIS models and LiDAR DSMs delivered without elevated vegetation. Data acquired from other high-resolution multi-spectral satellite-based sensors can be integrated into the developed automatic chain, but would require small adjustments (changes to e.g. the band combination for the NDVI, the meta data import and naming convention). Hence, future studies with GeoRaySAR may extend simulations to further sensor data and expand the study areas to other complex sites.

Figure 1. Work flow of the simulation chain, with the preprocessing and the tiling marked in blue.

Figure 2. Preprocessing steps, exemplified for the Alte Pinakothek in Munich, Germany. A DSM generated from WorldView-2 data (a) and an optical image from WorldView-2 (b) are used as input to the preprocessing chain. The NDVI (c) is calculated to perform the fuzzy classification (d). A DTM (e) and an nDSM (f) are generated. A binary mask (g) is determined to separate elevated vegetation. Noise in the mask is reduced by the morphological operators closing (h) and opening (i). Image (j) displays the replacement of the height values and (k) the final DSM after noise reduction.

Figure 7. Impact of DSM quality on simulated SAR images; scene: Barbican, London. The DSMs have been generated from different sets of optical images; (a, c, e) simulated SAR images; (b, d, f) DSMs derived from 2, 3, and 5 WorldView-2 images; acquisition date of TerraSAR-X image: November 1st, 2015.

Table 1. Details about the TerraSAR-X data used.