Data fusion of high-resolution satellite imagery and GIS data for automatic building extraction

Automatic building extraction in urban areas has become an active research topic because it contributes to many applications. High-resolution satellite (HRS) imagery is an important data source. However, it is a challenging task to extract buildings using only HRS imagery; additional information and prior knowledge should be incorporated. A new approach to building extraction is proposed in this study. The data sources are QuickBird imagery and GIS data. The GIS data provide prior knowledge, including position and shape information, while the HRS image has rich spectral and texture features. To fuse these two kinds of features, the HRS image is first segmented into image objects, and a graph is built according to the connectivity between adjacent image objects. Second, the position information of the GIS data is used to choose a seed region in the image for each GIS building object. Third, the seed region is grown by adding its neighbor regions, constrained by the shape of the GIS building. The performance is evaluated against manually delineated buildings. The results show a miss factor of 0.142, a detection percentage (correctness) of 89.43% and an overall quality of 79.35%.


Introduction
Automatic extraction of man-made objects in urban areas, such as roads, buildings, bridges and other impervious surfaces, has been an intensive research topic for many years. Buildings are among the most prominent features in the urban environment, and obtaining accurate building objects is essential to many applications, such as urban planning, landscape analysis, cartographic mapping and map updating.
In earlier days, work on building extraction focused on aerial imagery because of its high spatial resolution. With the successful launch of high-resolution satellites, HRS imagery has gradually been used for building extraction. HRS imagery provides fine detail of urban areas; with its high geometric accuracy and high spatial resolution, it is possible to identify individual buildings. Wei et al. (2004) used a clustering and edge detection algorithm to detect the edges of candidate buildings in QuickBird panchromatic imagery and then extracted buildings from the detected edges. Jin and Davis (2005) demonstrated a strategy combining structural, contextual and spectral information to extract buildings in the city of Columbia with IKONOS satellite imagery.
Although HRS imagery provides a more detailed description of the observed scene, its complexity and diversity make it more difficult to interpret (Duan et al., 2004). Automatic building extraction from HRS imagery is difficult mainly for three reasons: scene complexity, incomplete cue extraction and sensor dependency (Sohn and Dowman, 2007).
It is hard to extract buildings automatically with high accuracy based solely on HRS imagery, so additional data and prior knowledge should be incorporated. There have been a few recent studies extracting buildings with a combination of HRS imagery and other data sources. Sohn and Dowman (2007) combined IKONOS imagery and LiDAR data to extract buildings in the Greenwich area and obtained fine extraction results. Duan et al. (2004) proposed an approach based on fuzzy segmentation; the input data were QuickBird imagery and a digital GIS map. Vosselman (2002) combined laser scanning data, maps and aerial photographs for building reconstruction, in which the purposes of using maps were 1) to locate buildings in both the laser scanning data and the aerial photographs, and 2) to reveal information about the structures of buildings.
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-7/W1, 3rd ISPRS IWIDF 2013, 20-22 August 2013, Antu, Jilin Province, PR China
This study aims at developing an efficient and accurate approach to extract buildings from HRS imagery and GIS data. The GIS data are acquired by mobile measuring vehicles. To take advantage of both the HRS imagery and the GIS data, the approach first segments the HRS imagery into image objects. A graph is built according to the connectivity between adjacent image objects so that the objects can be manipulated more efficiently. Second, the GIS data are used to provide reliable position information of buildings: for each building object in the GIS data, an image object is selected as a seed. Third, the seed polygon is grown by adding its neighbor image objects. The overlapping degree between image objects and GIS data, and the shape information of the GIS data, are used to determine whether neighbor objects belong to the same building. Finally, the performance of the automated approach is evaluated using manually delineated buildings.

Test dataset
In the same study area, a GIS vector map of buildings is also obtained. The map is produced by mobile measuring vehicles (Fig. 1 (b)). The outlines of buildings in the GIS map and the QuickBird imagery cannot completely coincide with each other, for the following reasons. First, the representations of the same buildings in the GIS data and the HRS imagery are different: several adjacent buildings that are functionally consistent may be represented as one in the GIS map, while in the image they are still recorded individually. Second, the projections of the two data sources are different. The GIS map is produced using the upright projection, while the imagery is acquired by oblique photography. Therefore, the GIS data represent the bases of buildings, but the imagery records the side walls of buildings. Third, the GIS map and the imagery have different accuracies, deformations and registration errors. These uncertainties are usually no more than 0.5 m, which allows a simple building location (Vosselman, 2002).
Studies (Myint et al., 2011; Hay and Castilla, 2008) have shown that object-based image analysis (OBIA) methods are more efficient than per-pixel approaches in identifying urban land cover classes on HRS imagery. OBIA methods use image objects as the analysis units instead of single pixels, and can therefore better utilize image features such as shape, size, texture and context.
The image objects are generated using the multiresolution segmentation in eCognition Developer 8.7. The segmentation starts from single pixels and merges small segments into bigger ones until a heterogeneity threshold is reached (Benz et al., 2004). The heterogeneity thresholds are defined by the user, and the quality of the segmentation depends strongly on them. To find an optimal scale, we carried out 10 segmentations with scale parameters ranging from 30 to 150. Earlier experiments showed that a scale parameter smaller than 30 produces over-segmented objects, while a scale parameter larger than 150 produces more under-segmented objects. The best segmentation is selected by visual interpretation.
The color/shape weights are set to 0.7/0.3 instead of the default values 0.9/0.1, in order to consider more shape information. The smoothness/compactness weights are set to 0.5/0.5 so as not to favor either compact or non-compact segments. The resulting image objects are exported as an SHP file.

Image object graph
In order to manipulate the image objects efficiently, a graph G(V, E) is created, in which each image object $O_i$ is represented as a node (Fig. 3). An edge connecting two nodes indicates that the two image objects are neighbors and share a boundary. The building region growth process starts from one node and proceeds to its neighboring nodes, as discussed in detail in Section 4.3.
The GIS data can be used to identify seed polygons. For each building $B_j$ in the GIS map, we select the image object whose centroid is nearest to the centroid of the GIS building. The centroid of a polygon $P$ refers to the centroid of its minimum bounding rectangle (MBR), denoted by $C(P)$. Therefore, the distance between $O_i$ and $B_j$ is equal to the distance between $C(O_i)$ and $C(B_j)$, denoted by $D(O_i, B_j)$. For each $B_j$, the seed polygon $O_s$ is selected from $\{O_i\}_{i=1}^{n}$ using Eq. (1):
$$O_s = \arg\min_{O_i,\ 1 \le i \le n} D(O_i, B_j) \qquad (1)$$
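The seed selection rule can be illustrated with a minimal Python sketch. It assumes that polygons are given as lists of (x, y) vertices and that the MBR is axis-aligned; the function names (`mbr_centroid`, `centroid_distance`, `select_seed`) and the toy coordinates are hypothetical and are not part of the original workflow.

```python
from math import hypot


def mbr_centroid(polygon):
    """Centroid of the axis-aligned minimum bounding rectangle (assumed MBR type)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return ((min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0)


def centroid_distance(poly_a, poly_b):
    """Euclidean distance between the MBR centroids of two polygons."""
    (ax, ay), (bx, by) = mbr_centroid(poly_a), mbr_centroid(poly_b)
    return hypot(ax - bx, ay - by)


def select_seed(image_objects, gis_building):
    """Return the id of the image object whose centroid is nearest to the GIS building."""
    return min(
        image_objects,
        key=lambda oid: centroid_distance(image_objects[oid], gis_building),
    )


# Hypothetical toy data: three square image objects and one GIS building.
image_objects = {
    "a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "b": [(4, 4), (6, 4), (6, 6), (4, 6)],
    "c": [(10, 0), (12, 0), (12, 2), (10, 2)],
}
gis_building = [(3, 3), (7, 3), (7, 7), (3, 7)]
seed_id = select_seed(image_objects, gis_building)
```

In this toy case the object "b" shares its MBR centroid with the GIS building, so it is the one returned as the seed.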

Seed region growth
A building in the image is represented by its seed polygon and the adjacent image objects in the segmented image. The building region growth procedure starts from the seed polygon and merges its adjacent objects into bigger ones until the merged object best fits the building object in the GIS building map. The procedure of region growth is described as follows:
i) For each building in the GIS map, two sets are created. The first, set T, stores the image objects that have passed the testing procedure and are suitable for building region growth. The second, set S, stores the image objects that are candidates for building objects and need to be tested later.

Overlapping degree
Overlapping degree is used to measure the similarity of area between image objects and vector polygons in the GIS map. The degree of overlap (DOV) between an image object $O_i$ and a polygon $B_j$, written $DOV(O_i, B_j)$, is defined in terms of $A(O_i \cap B_j)$, the overlapping area between $O_i$ and $B_j$. The larger $DOV(O_i, B_j)$ is, the larger the similarity is. The value of $DOV(O_i, B_j)$ ranges from 0 to 1.
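As an illustration only, an overlap ratio of this kind can be sketched in Python. Two assumptions are made here that the text does not confirm: the overlapping area is normalized by the area of the image object (which keeps the value between 0 and 1), and regions are represented as sets of pixel coordinates instead of vector polygons. The names `degree_of_overlap` and `rect` are hypothetical.

```python
def degree_of_overlap(object_pixels, building_pixels):
    """Share of the image object's pixels that fall inside the GIS polygon.

    Assumption: normalization by the object's own area, not confirmed by the text.
    """
    if not object_pixels:
        return 0.0
    return len(object_pixels & building_pixels) / len(object_pixels)


def rect(x0, y0, x1, y1):
    """Hypothetical helper: pixel set of the half-open rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


building = rect(0, 0, 10, 10)      # rasterized GIS building (toy data)
inside = rect(2, 2, 5, 5)          # object fully inside the building
straddling = rect(8, 0, 12, 10)    # object half inside, half outside
outside = rect(20, 20, 25, 25)     # object far from the building

dov_inside = degree_of_overlap(inside, building)
dov_straddling = degree_of_overlap(straddling, building)
dov_outside = degree_of_overlap(outside, building)
```

Under this assumed normalization, a small segment lying entirely within a building footprint scores 1, which is consistent with using high values as evidence that the object belongs to the building.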

Boundary distance
Boundary distance measures the boundary similarity between image objects and polygons in the GIS building map.

Region growth criterion
The two indicators above are used to reveal the similarity between the image objects and the GIS objects.

Experiment results and discussions
The aim of this section is to assess the accuracy of the building extraction results. First, the building extraction results are presented. Second, the performance of the proposed building extraction approach is evaluated with reference to the manually delineated buildings. Finally, possible errors and inadequacies are analyzed.

Building extraction result
The developed approach is applied to the Beijing dataset, and 515 buildings are automatically extracted. Fig. 5 shows the extracted buildings. As shown in Fig. 5, most of the buildings in the image are successfully extracted. The outlines of the extracted buildings are accurate (Fig. 6).
Fig. 5 The building extraction results.

Quality assessment
Manually delineated buildings are used as reference building polygons for accuracy assessment. The automatically extracted and the reference buildings are compared pixel by pixel. Each pixel in the image falls into one of four categories (Lee et al., 2003):

True Positive (TP):
The pixel is labeled as building in both the automatically extracted and delineated maps.

True Negative (TN):
The pixel is labeled as non-building in both the automatically extracted and delineated maps.

False Positive (FP):
The pixel is labeled as building in the automatically extracted map, and as non-building in the delineated map.

False Negative (FN):
The pixel is labeled as non-building in the automatically extracted map, and as building in the delineated map.
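
The four categories can be counted with a short Python sketch. It assumes that both maps are available as equally sized 2D grids of 0/1 building labels; the function name `count_categories` and the tiny example grids are hypothetical.

```python
def count_categories(extracted, reference):
    """Count TP, TN, FP and FN pixels for two equally sized binary building masks."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for row_extracted, row_reference in zip(extracted, reference):
        for e, r in zip(row_extracted, row_reference):
            if e and r:
                counts["TP"] += 1   # building in both maps
            elif e:
                counts["FP"] += 1   # building only in the extracted map
            elif r:
                counts["FN"] += 1   # building only in the delineated map
            else:
                counts["TN"] += 1   # non-building in both maps
    return counts


# Hypothetical 2 x 3 masks (1 = building, 0 = non-building).
extracted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [1, 1, 0]]
counts = count_categories(extracted, reference)
```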
Based on the four categories above, the following statistical measures are employed to quantitatively assess the accuracy of the extracted buildings (Sohn and Dowman, 2007):
$$\text{Branching Factor} = \frac{FP}{TP}, \qquad \text{Miss Factor} = \frac{FN}{TP},$$
$$\text{Completeness} = 100\% \times \frac{TP}{TP + FN}, \qquad \text{Correctness} = 100\% \times \frac{TP}{TP + FP},$$
$$\text{Quality Percentage} = 100\% \times \frac{TP}{TP + FP + FN} \qquad (4)$$
The completeness is the ratio of the correctly labeled building pixels (TP) to the total building pixels in the manually delineated building map (TP + FN). The correctness is the ratio of the correctly labeled building pixels (TP) to the total building pixels in the automatically extracted building map (TP + FP). These two factors indicate a measure of building detection performance (Sohn and Dowman, 2007). The quality percentage combines aspects of both boundary delineation accuracy and building detection rate. It describes how likely a building pixel produced by the automatic approach is true.
Table 1 shows the pixel classification results obtained by comparing the automatically extracted building map with the delineated building map. Based on these results, the statistical measures are calculated using Eq. (4). The proposed approach achieves a branching factor of 0.118 and a miss factor of 0.142, which means that the number of incorrectly labeled building pixels is smaller than the number of missed building pixels. The automatic process detects buildings with a completeness of 87.56% and a correctness of 89.43%. Finally, the overall quality (quality percentage) of the proposed method is 79.35%.
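The measures can be computed from the pixel counts with a few lines of Python. This sketch assumes the commonly used definitions (branching factor FP/TP, miss factor FN/TP, completeness TP/(TP+FN), correctness TP/(TP+FP), quality TP/(TP+FP+FN)); the function name `building_metrics` and the pixel counts are hypothetical and are not the values of Table 1.

```python
def building_metrics(tp, fp, fn):
    """Pixel-based accuracy measures; percentages are returned in the range 0-100."""
    return {
        "branching_factor": fp / tp,
        "miss_factor": fn / tp,
        "completeness": 100.0 * tp / (tp + fn),
        "correctness": 100.0 * tp / (tp + fp),
        "quality": 100.0 * tp / (tp + fp + fn),
    }


# Hypothetical pixel counts, for illustration only.
metrics = building_metrics(tp=800, fp=100, fn=100)
```

Under these definitions the quality percentage equals 100 / (1 + branching factor + miss factor). With the reported factors of 0.118 and 0.142 this gives about 79.4%, which agrees with the reported 79.35% up to rounding of the factors.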

Error analysis
The quality assessment shows a satisfactory result. There may be some errors in the GIS building map. The final building extraction results depend strongly on the accuracy of the GIS building map: the more accurate the GIS building map is, the better the extraction results are.

Conclusions
This paper presented a new building extraction approach that combines HRS imagery and GIS data. First, the high-resolution image was segmented into image objects using the multiresolution segmentation algorithm, and a graph was built according to the connectivity between adjacent image objects. Second, the position information of the GIS data was used to select a seed polygon for each GIS building object. Third, the seed polygon was extended by adding its neighbor objects. Two indicators, i.e. overlapping degree and boundary distance, were defined as criteria to grow seed regions. Finally, the accuracy of the building extraction result was evaluated with reference to manually delineated buildings. This evaluation showed a branching factor of 0.118, a miss factor of 0.142, a completeness of 87.56%, a correctness of 89.43% and a quality percentage of 79.35%. The overall quality of the result was satisfactory. According to this performance, it can be concluded that combining HRS imagery and GIS data is an effective way to extract buildings. However, the error analysis shows that some errors can be reduced further. An image segmentation algorithm that combines the high-resolution image and GIS data can be studied in further research. Furthermore, the error analysis demonstrates that this technique can automatically detect differences between the GIS data and the high-resolution image, which can be used in change detection and map updating.

For the current research, a pan-sharpened multispectral (PSM) QuickBird image is used. It covers a portion of Beijing City with a size of 1.7 km × 0.9 km. The PSM image is produced by combining the multispectral data with the corresponding panchromatic data, and is resampled to a 0.61-metre ground pixel size. Fig. 1 (a) is the natural color composite of the selected QuickBird image. As can be seen in Fig. 1 (a), the QuickBird image shows a typical urban environment with schools, residential areas, a commercial district and an industrial park. The buildings are densely distributed and vary greatly in size and structure.

Fig. 3 Image objects and corresponding graph

Seed polygons selection
A seed polygon is an image object which belongs

ii) Put the seed image object $O_s$ into the set T, and the neighbors of $O_s$ into the set S.
iii) If set S is not empty, go to step iv); otherwise, end the procedure.
iv) Fetch an object $O_i$ from set S. If $O_i$ meets the condition for region growth, $O_i$ is put into the set T and the neighbor objects of $O_i$ are put into the set S; otherwise, go to step iii).

Criteria are needed to determine whether to grow an image object. Here two indicators are used.
Let $P_i = \{s_k\}_{k=1}^{n_i}$ be the point set on the boundary of object $O_i$, and $Q_j = \{t_l\}_{l=1}^{m_j}$ be the point set on the boundary of polygon $B_j$ in the GIS building map. The boundary distance (BD) between $O_i$ and $B_j$ is calculated as follows:
$$BD(O_i, B_j) = \frac{1}{n_i} \sum_{s \in P_i} d(s, Q_j)$$
where $d(s, Q_j)$ is the minimum distance between the point $s$ and $Q_j$, and $n_i$ is the number of points on the boundary of $O_i$. In Fig. 4, $s$ is a point in $P_i$, and $t$ is the point in $Q_j$ nearest to $s$; the distance from $s$ to the boundary of $B_j$ equals the distance from $s$ to $t$. The smaller $BD(O_i, B_j)$ is, the larger the similarity is.
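A boundary distance of this kind can be sketched in Python as the mean nearest-point distance from the sampled boundary of the image object to the sampled boundary of the GIS polygon. The averaging over the object's boundary points is an assumption inferred from the description, and the names `point_to_set_distance` and `boundary_distance` as well as the toy point sets are hypothetical.

```python
from math import hypot


def point_to_set_distance(s, points):
    """Minimum Euclidean distance between point s and a set of boundary points."""
    return min(hypot(s[0] - t[0], s[1] - t[1]) for t in points)


def boundary_distance(object_boundary, polygon_boundary):
    """Mean nearest-point distance from the object boundary to the polygon boundary."""
    total = sum(point_to_set_distance(s, polygon_boundary) for s in object_boundary)
    return total / len(object_boundary)


# Hypothetical boundary samples: a 2 x 2 square and the same square shifted by 1 in x.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
shifted = [(x + 1, y) for x, y in square]

bd_same = boundary_distance(square, square)
bd_shifted = boundary_distance(shifted, square)
```

Identical boundaries give a distance of zero, and the value increases as the object boundary moves away from the polygon boundary. Note that the measure as sketched is not symmetric, because only the object's boundary points are averaged.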

Fig. 4 Boundary distance between $O_i$ and $B_j$

The branching factor indicates the rate of incorrectly labeled building pixels (FP), while the miss factor shows the rate of missed building pixels (FN). These two factors are closely related to the boundary delineation performance of the automatic building extraction algorithm.

To determine whether an image object $O_i$ meets the condition for region growth, we first calculate the overlapping degree between $O_i$ and its corresponding GIS object $B_j$. If $DOV(O_i, B_j)$ is too small (e.g. 0.2), $O_i$ is abandoned. If $DOV(O_i, B_j)$ is large enough (e.g. 0.8), we grow $O_i$ and put it into the identified set T. If $DOV(O_i, B_j)$ is neither too large nor too small, the boundary distance between the region grown so far, $R$, and $B_j$ (i.e. $BD(R, B_j)$) is calculated. Then $R$ and $O_i$ are merged into a larger object, called $R^{+}$, and the boundary distance between $R^{+}$ and $B_j$ (i.e. $BD(R^{+}, B_j)$) is calculated as well. The two values are compared. If $BD(R, B_j)$ is smaller than $BD(R^{+}, B_j)$, the boundary similarity between $R$ and $B_j$ is larger, and $O_i$ is abandoned. Otherwise, the boundary similarity between $R^{+}$ and $B_j$ is larger, and $O_i$ is added into set T.
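The growth procedure and its criterion can be put together in a minimal Python sketch. It assumes that the object graph is an adjacency dictionary, that the overlap degree and the boundary distance are supplied as callables, and that the example thresholds 0.2 and 0.8 are used; the handling of values exactly at the thresholds and the `seen` bookkeeping that prevents re-testing an object are choices made for this sketch. The name `grow_region` and all toy values are hypothetical.

```python
from collections import deque


def grow_region(seed, neighbors, dov, bd, low=0.2, high=0.8):
    """Grow a building region from a seed object.

    neighbors: dict mapping an object id to its adjacent object ids (the graph).
    dov: callable, object id -> overlap degree with the GIS building.
    bd: callable, frozenset of object ids -> boundary distance of the merged
        region to the GIS building.
    """
    accepted = {seed}                         # set T: objects that passed the test
    candidates = deque(neighbors[seed])       # set S: objects waiting to be tested
    seen = {seed, *neighbors[seed]}           # sketch detail: test each object once
    while candidates:
        obj = candidates.popleft()
        overlap = dov(obj)
        if overlap <= low:
            continue                          # overlap too small: abandon
        if overlap < high:
            # Ambiguous overlap: abandon only if merging worsens the boundary fit.
            if bd(frozenset(accepted)) < bd(frozenset(accepted | {obj})):
                continue
        accepted.add(obj)
        for nxt in neighbors[obj]:
            if nxt not in seen:
                seen.add(nxt)
                candidates.append(nxt)
    return accepted


# Hypothetical graph and scores: "a" is the seed, "d" has an ambiguous overlap.
neighbors = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b"], "d": ["a"]}
overlap = {"a": 1.0, "b": 0.9, "c": 0.1, "d": 0.5}
bd_worse = {frozenset("ab"): 1.0, frozenset("abd"): 2.0}
bd_better = {frozenset("ab"): 1.0, frozenset("abd"): 0.5}

kept = grow_region("a", neighbors, overlap.get, bd_worse.get)
grown = grow_region("a", neighbors, overlap.get, bd_better.get)
```

In the first call, merging "d" increases the boundary distance, so "d" is abandoned; in the second call the merge improves the fit and "d" is accepted. Object "c" is rejected in both cases because its overlap is below the lower threshold.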
(2) GIS building map error. The proposed building extraction method uses high-resolution imagery and GIS data as data sources. The GIS data cannot