GROUND OBJECT RECOGNITION USING COMBINED HIGH RESOLUTION AIRBORNE IMAGES AND DSM

The research is carried out on the Vaihingen dataset acquired from the ISPRS Test Project on Urban Classification and 3D Building Reconstruction. Four types of ground objects are extracted: buildings, trees, vegetation (grass and low bushes) and road. Spectral information is used to classify the images, and a refinement process is then carried out using the DSM. A novel method called Sparse Representation is introduced to extract ground objects from airborne images. For each pixel we extract its spectral vector and solve a Basis Pursuit problem using l1 minimization. The pixel is assigned the class of the column vector of the observation matrix that corresponds to the largest positive component of the solution vector. A refinement procedure based on the elevation histogram is carried out to improve the coarse classification, which suffers from misclassification of trees/vegetation and buildings/road.
* Qingming Zhan, Professor and Associate Dean of School of Urban Design, Deputy Director of Research Centre for Digital City, Wuhan University.
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012 XXII ISPRS Congress, 25 August – 01 September 2012, Melbourne, Australia


INTRODUCTION
In recent years LiDAR (Light Detection And Ranging) has emerged as a new technology that provides valuable data in various forms and scales for mapping and monitoring land cover features. Its use has increased dramatically owing to the availability of high-density LiDAR data as well as airborne imagery of high spatial and spectral resolution. However, the data from these different sensors have their own characteristics. Spatial information, from which a highly accurate DSM of the scanned objects can be derived, is obtained directly from LiDAR data. High resolution airborne imagery, on the other hand, offers very detailed spectral and textural information about ground objects. Although aerial photography has been used as a mapping tool for a century, the fusion of aerial photography and LiDAR data has only become possible in the past few years owing to advances in sensor design and data acquisition/processing techniques (Baltsavias, 1999). Combining these two complementary kinds of data is therefore quite promising for improving land cover mapping (Tao and Yasuoka, 2002).
There have been several attempts to fuse LiDAR and high-resolution imagery for land cover mapping, and very promising results have been reported in recent years. Haala and Brenner (1999) combined a LiDAR-derived DSM with three-color-band aerial images, applying unsupervised classification based on the ISODATA (Iterative Self-Organizing Data Analysis Technique) algorithm to a normalized Digital Surface Model (nDSM) and a CIR image. In their experiment, the nDSM was used to classify objects that had different distribution patterns in the elevation direction. The near-infrared band of the aerial imagery greatly facilitated the separation of trees from buildings in the low-resolution LiDAR data. Schenk and Csatho (2002) exploited the complementary properties of LiDAR and aerial images to extract semantically meaningful information. Rottensteiner et al. (2005) used a LiDAR-derived DTM and the Normalised Difference Vegetation Index (NDVI) from multispectral images to detect buildings in densely built-up urban areas. Their rule-based classification scheme applied Dempster-Shafer theory to delineate building regions, combining NDVI and the average relative heights to separate buildings from other objects. Ali et al. (2005) applied an automated object-level technique based on a hierarchical decision tree to fuse high-resolution imagery and LiDAR data. Sohn and Dowman (2007) presented an approach for the automatic extraction of building footprints from a combination of multispectral imagery and airborne laser scanning data. Their method used a divide-merge scheme to obtain the recognized building outline. A comparison of pixel- and object-level data fusion and subsequent classification of LiDAR and high-resolution imagery was carried out by Ali et al. (2009). The results showed that fusing the color imagery with the DSM generally gave better results than classifying the color imagery alone.
The underlying assumption of multisource data fusion is that classification accuracy should improve because more features are incorporated (Tso and Mather, 2001). Image fusion can be performed at the pixel, object (or feature) and decision levels (Pohl and van-Genderen, 1998; Schistad-Solberg et al., 1994). Pixel-level fusion focuses on merging physical parameters derived from multisource data. It is very sensitive to geo-referencing and pixel spacing, and topological information is often not used in the fusion and subsequent procedures. Object-level image fusion methods usually segment multisource data into meaningful objects, each of which consists of many data units. Fusion techniques of this kind are often based on the spectral and spatial characteristics derived from the datasets, and the segmented objects are combined for further object recognition using fuzzy clustering, hierarchical decision trees and other pattern recognition algorithms (Geneletti and Gorte, 2003).
Nowadays LiDAR data are often derived from one or multiple returns of laser pulses, and digital imagery usually contains multispectral bands. With the availability of full-waveform LiDAR data and hyperspectral imagery, the problems of data fusion and pattern classification become more complicated. The opportunity is that higher classification accuracy should be achievable owing to the additional spectral and spatial features. However, there are still challenges in data processing, waveform modeling and measurement interpretation for full-waveform LiDAR (Wagner et al., 2004).

METHODOLOGY
The workflow and software we use are illustrated in Figure 1. The main tasks are described in the following subsections, with emphasis on the method for ground object extraction.

Data Preparation
Orientation and registration procedures should be carried out first to guarantee that the multisource data are processed within the same spatial framework (Habib et al., 2006). The provided DSM file, with a resolution of 25 cm, is used as the reference to orthorectify and mosaic the images using the given orientation parameters; this task is completed using Leica Photogrammetry Suite.
We combine the mosaicked image with the DSM data and extract the areas of interest (AOI) using ERDAS IMAGINE. Area1, Area2 and Area3 are extracted as required, and the image of each test area has four 'bands' (namely IR-R-G-H). All the airborne images are contrast-enhanced before classification.
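As an illustration only, the four-band IR-R-G-H layout can be sketched with NumPy. The paper performs this step in ERDAS IMAGINE and does not state which contrast enhancement it uses, so the function names and the linear percentile stretch below are assumptions made for the sketch, not the authors' procedure.

```python
import numpy as np


def stack_bands(cir, dsm):
    """Stack a (rows, cols, 3) IR-R-G image and a (rows, cols) DSM into IR-R-G-H."""
    cir = np.asarray(cir, dtype=np.float64)
    dsm = np.asarray(dsm, dtype=np.float64)
    if cir.ndim != 3 or cir.shape[2] != 3 or cir.shape[:2] != dsm.shape:
        raise ValueError("expected a (rows, cols, 3) image and a (rows, cols) DSM")
    return np.dstack([cir, dsm])


def stretch_contrast(band, low=2.0, high=98.0):
    """Linear percentile stretch to [0, 1]; an assumed form of contrast enhancement."""
    band = np.asarray(band, dtype=np.float64)
    lo, hi = np.percentile(band, [low, high])
    if hi <= lo:
        # A constant band carries no contrast to stretch.
        return np.zeros_like(band)
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)


def spectral_vectors(image):
    """Split an IR-R-G-H image into (n_pixels, 3) spectra and (n_pixels,) heights."""
    rows, cols, bands = image.shape
    if bands != 4:
        raise ValueError("expected four bands (IR-R-G-H)")
    flat = image.reshape(rows * cols, 4)
    return flat[:, :3], flat[:, 3]
```

Keeping the height band in the same array as the spectral bands means that the spectral vector and the elevation of a pixel always share one index, which is what the later refinement step relies on.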

Ground Object Extraction
Buildings, trees and vegetation (natural ground covered by vegetation) are extracted in Area1, Area2 and Area3. Before the extraction, we enhance the contrast of the image to improve the distinctiveness of the different ground objects (Figure 2a). The ground object extraction procedure consists of two steps: coarse classification and refinement. First we use spectral information to coarsely classify the images. Then a refinement process is carried out using elevation information.
The method we use to extract ground objects is Sparse Representation. The key idea is to represent the spectral vector (vector of IR-R-G values) of a pixel using the spectral vectors of pixels of typical ground objects. The classification problem is formulated as a Basis Pursuit problem (Equation 1) and then solved using convex programming methods in MATLAB.
$$\min_{x} \|x\|_1 \quad \text{s.t.} \quad y = Ax \tag{1}$$
where y is the spectral vector of a pixel and the column vectors of the observation matrix A are the spectral vectors of pixels of typical ground objects. These pixels are interactively selected on the images of the test areas. In our implementation, we select five pixels for each typical ground object (that is, trees/vegetation, buildings and road). A test procedure is then carried out to examine the distinctiveness of the spectral vectors selected as observations, and vectors that lead to misclassification are updated. Lastly, each pixel of the images from the test areas is classified using the given observation matrix A. The procedure works as follows: for each pixel we extract its spectral vector as y in Equation 1; we then solve Equation 1 using an l1 minimization solver; the pixel is assigned the class of the column vector of A that corresponds to the largest positive component of the solution vector x (Figures 2b, 4b, 6b).
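The per-pixel rule can be sketched in Python as follows. This is not the authors' MATLAB implementation: the sketch rewrites the Basis Pursuit problem as a linear program by splitting x = u - v with u, v >= 0 and solves it with SciPy's `linprog`. The training vectors, the use of two instead of five pixels per class, and the scaling of the columns of A to unit length are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to A x = y as a linear program.

    With x = u - v and u, v >= 0, the objective becomes sum(u) + sum(v).
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n = A.shape[1]
    cost = np.ones(2 * n)
    A_eq = np.hstack([A, -A])  # [A, -A] @ [u; v] = y
    result = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not result.success:
        raise ValueError(f"basis pursuit failed: {result.message}")
    return result.x[:n] - result.x[n:]


def classify_pixel(A, labels, y):
    """Return the label of the column with the largest positive coefficient."""
    x = basis_pursuit(A, y)
    j = int(np.argmax(x))
    if x[j] <= 0:
        return None  # no positive component, so no class is assigned
    return labels[j]


def unit_columns(M):
    """Scale every column to unit Euclidean length (an assumption of this sketch)."""
    M = np.asarray(M, dtype=float)
    return M / np.linalg.norm(M, axis=0, keepdims=True)


# Hypothetical IR-R-G training vectors, one per column, two per class.
A = unit_columns(np.array([
    [200.0, 190.0,  90.0, 100.0, 70.0, 60.0],  # IR
    [ 60.0,  70.0, 150.0, 170.0, 80.0, 75.0],  # R
    [ 80.0,  90.0, 140.0, 150.0, 85.0, 80.0],  # G
]))
labels = ["trees/vegetation", "trees/vegetation",
          "buildings", "buildings", "road", "road"]

# A pixel whose spectrum equals a training column is assigned that column's class.
print(classify_pixel(A, labels, A[:, 2]))
```

Scaling the columns to unit length guarantees that a pixel identical to a training vector is represented by that vector alone; without it, a brighter column pointing in a similar direction could yield a smaller l1 norm and win instead.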
The methodology we use therefore falls under the framework of supervised classification, and it is in essence a pixel-oriented classification method.
The coarse classification often has to be refined because of misclassification of trees/vegetation and buildings/road. The refinement is mainly based on the elevation histogram: we select the values that separate trees/vegetation and buildings/road as thresholds to refine the coarse classification results.
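A minimal sketch of such a threshold-based refinement is given below. It assumes one reading of the text: within the spectrally green class, elevation separates trees from low vegetation, and within the built-up classes it separates buildings from road. The class codes, function names and threshold arguments are hypothetical; in the paper the thresholds are chosen from the elevation histogram, and passing the same value for both arguments corresponds to a single elevation threshold.

```python
import numpy as np

# Hypothetical codes for the coarse (spectral) classes.
COARSE_GREEN, COARSE_BUILDING, COARSE_ROAD = 0, 1, 2
# Hypothetical codes for the refined classes.
VEGETATION, TREE, ROAD, BUILDING = 0, 1, 2, 3


def class_elevation_histogram(height, mask, n_bins=50):
    """Histogram of the elevations of the pixels selected by a boolean mask."""
    return np.histogram(np.asarray(height, dtype=float)[mask], bins=n_bins)


def refine_by_elevation(coarse, height, green_threshold, built_threshold):
    """Relabel coarse classes using elevation thresholds.

    Green pixels above green_threshold become trees, the rest vegetation.
    Building/road pixels above built_threshold become buildings, the rest road.
    Pixels with an unknown coarse code are marked -1.
    """
    coarse = np.asarray(coarse)
    height = np.asarray(height, dtype=float)
    if coarse.shape != height.shape:
        raise ValueError("coarse labels and heights must have the same shape")
    refined = np.full(coarse.shape, -1, dtype=np.int64)
    green = coarse == COARSE_GREEN
    built = (coarse == COARSE_BUILDING) | (coarse == COARSE_ROAD)
    refined[green] = np.where(height[green] > green_threshold, TREE, VEGETATION)
    refined[built] = np.where(height[built] > built_threshold, BUILDING, ROAD)
    return refined
```

Inspecting the output of `class_elevation_histogram` for each coarse class is one way to pick the thresholds: a valley between the low and high modes of the histogram is a natural candidate.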
The outputs of "Ground Object Extraction" have to be georeferenced again because the geoinformation is lost during processing in MATLAB. The classified objects are written to separate files, and the georeference information is added using ERDAS IMAGINE.
Figure 1: Workflow and the software tools

RESULTS
The whole research area is illustrated in Figure 2. The three test areas for buildings/trees/vegetation extraction are outlined in yellow. Test Area 1 consists of houses with complex roof structures. The ground objects in Area 2 are mainly trees and buildings. The classification results are shown separately in Figures 3a-3c, 4a-4c and 5a-5c. The extracted objects are color-coded as follows: vegetation (green), trees (yellow), road (blue) and buildings (red).

CONCLUSIONS AND FUTURE WORK
In this paper, we introduce the Sparse Representation framework to classify high resolution airborne images and use a DSM derived from LiDAR data to improve the classification. It is a pixel-oriented classification method that falls under the framework of supervised classification. The problem of ground object recognition is formulated as a Basis Pursuit problem and solved using convex programming methods in MATLAB. The key idea of this method is to represent the spectral vector (vector of IR-R-G values) of a pixel using the observation matrix. The observation matrix consists of the spectral vectors of pixels of typical ground objects (that is, buildings, trees, vegetation and road), which are interactively selected on the images of the test areas. A test procedure is carried out to examine the distinctiveness of the selected pixels, and pixels that lead to misclassification are replaced. In the recognition process, each pixel of the images from the test areas is classified using the given observation matrix.
Misclassifications arise in both steps of the classification procedure. In the coarse classification, misclassifications of buildings and road are mostly due to shadows and spectral similarity. Some low bushes in the shadow at the bottom left of Figure 2b are classified as buildings. All the buildings whose color is similar to that of road are misclassified as road (Figure 4b). In the refinement, misclassifications result from enforcing only a single elevation threshold on the classification of ground objects. Adaptive methods that take account of the local properties of ground objects, and methods based on semantic knowledge, may help improve the result. In our future work, the generality and effectiveness of our method will be further investigated and adaptive methods will be examined.