Unsupervised Automatic Building Extraction Using Active Contour Model on Unregistered Optical Imagery and Airborne LiDAR Data

Automatic extraction of buildings in urban scenes has become a subject of growing interest in the domain of photogrammetry and remote sensing, particularly with the emergence of LiDAR systems since mid-1990s. However, in reality, this task is still very challenging due to the complexity of building size and shapes, as well as its surrounding environment. Active contour model, colloquially called snake model, which has been extensively used in many applications in computer vision and image processing, is also applied to extract buildings from aerial/satellite imagery. Motivated by the limitations of existing snake models addressing to the building extraction, this paper presents an unsupervised and fully automatic snake model to extract buildings using optical imagery and an unregistered airborne LiDAR dataset, without manual initial points or training data. The proposed method is shown to be capable of extracting buildings with varying color from complex environments, and yielding high overall accuracy.


CONTEXT
Automatic extraction of buildings from aerial/satellite imagery in urban and residential scenes has become a subject of growing interest in the domain of photogrammetry and remote sensing. Indeed, a large number of building detection and extraction techniques have been reported over the last few decades, particularly with the emergence of LiDAR (Light Detection and Ranging) systems. For instance, (Rottensteiner, Briese, 2003) proposed a building detection method exploiting primarily LiDAR data while removing vegetation using imagery data. (Sohn, Dowman, 2007) focused on exploiting the synergy of IKONOS multispectral imagery combined with a hierarchical segmentation of a LiDAR digital elevation model (DEM) to extract buildings. Another method of building detection was proposed by (Awrangjeb et al., 2010) based on building masks obtained from LiDAR and multispectral imagery. These methods using both complementary sources, namely LiDAR data and optical imagery, achieve better building extraction results. Other building detection and extraction methods utilize only LiDAR data such as (Khoshelham et al., 2013, Zhang et al., 2013. They involve segmentation algorithms and classification using attributes such as building size, shape, height and PCA (Principal Component Analysis) features. However, these approaches usually face problem of misclassification of vegetation as buildings (Zhang, Lin, 2017).
Originally introduced by (Kass et al., 1988), snake model has been studied for building extraction from urban area using aerial and satellite imagery. While (Guo, Yasuoka, 2002) used snake model with balloon force to extract buildings, (Peng et al., 2005) focused in increasing stability of snake convergence. (Kabolizade et al., 2010) proposed to improve the model with imagery data coupled with a Digital Surface Model (DSM) generated from LiDAR data. This improvement involves using * Corresponding author two energy terms based on the variances of height and graylevel between snake points. Hence, the height variance energy term requires height information for every pixel of the image, in other words, the DSM must be of same size and resolution as the imagery data. This can be problematic since LiDAR dataset does not often have a spatial resolution as high as aerial imagery, and height data interpolation may be unreliable. (Fazan, Dal Poz, 2013) proposed another approach based on exhaustive searches of rectilinear building corners from the image, optimized by dynamic programing. However, this method depends heavily on initial points to have decent results.  proposed an area-based geometrical snake model to detect building boundaries without height information or initial points. Nevertheless, this method requires a priori sampled gray-levels of buildings and grounds (i.e. training data) to attract the snakes toward desired buildings. It also may not work well with building roofs with varying gray levels.
To summarize, the main common issues of snake models in building boundary extraction are: • Sensitivity to noise and image details; • Dependence on initial points or training data; • Weak convergence to building corners; • Snake's convergence sensitivity to its number of points and weighting parameters.
Motivated by the limitations of existing snake models listed above, this paper presents an unsupervised snake model to extract buildings using optical imagery and an unregistered airborne LiDAR dataset, without manual initial points or training data. Instead, we propose to initialize our proposed snake model with projected LiDAR building boundary points. The movement of snake model is then governed by an additional energy term related to its similarity to the LiDAR building boundary. Our method is also able to extract buildings with varying color, from complex environments.  The remainder of this paper is organized as follows. Section 2 is devoted to the description of the proposed method. Then, experimental results are presented in Section 3. Finally, Section 4 provides conclusions and perspectives of this work.

METHOD
The novelty of our approach resides in an unsupervised method that carries out effectively building extraction from urban scene. This method is based on a snake model initialized and enhanced by integrating with LiDAR data, followed by an improved building polygonization. Presented by Fig. 1, the proposed method is composed of three parts: (i) determination of initial points involving the registration of datasets, (ii) the snake model, and (iii) the improved building boundary polygonization.
2.1 Determination of initial points 2.1.1 Registration This research study involves an airborne LiDAR data set which has been acquired independently of the optical imagery, i.e. two different surveys, done at different time, with different platforms. Such a context yields spatial discrepancies between data sets that provide the snake model with wrong initial points. Therefore, a registration has been carried out beforehand (Nguyen et al., 2019). Indeed, such a registration is substantially important to a building extraction procedure, since it involves many problems exemplified by the work of (Gilani et al., 2016).
The registration consists in extracting building regions, on one hand, from the LiDAR point cloud, and on the other hand, from the optical image using a mean-shift segmentation. Extracted building segments are then matched using the Graph Transformation Matching algorithm proposed by (Aguilar et al., 2009). The resulting pairwise segments establish a set of correspondences between the two data sets. They are used to estimate a transformation model using the Gold Standard algorithm (Hartley, Zisserman, 2003, p. 187). This transformation model is used to register the image and LiDAR data sets with an accuracy good enough to get relevant snake initial points.  Te = mean(zG) + max{2.5, std(zG)}, where zG denotes the elevation of ground points. All non-ground points are then vertically projected onto the plan z = 0. A raster grid representing these projected points is created. The resolution of the grid is set according to the LiDAR point cloud density in order to avoid null-valued pixels. A binary grid of same resolution is also generated. Its cell value is set to 1 or 0 according to the presence or absence of projected non-ground points in the cell ('1': presence, '0': absence). A morphological opening operator is then applied to remove small artifacts on the binary grid. Remaining grid cells with value set to 1 are grouped into labeled segments based on their connectivity. Next, small segments (e.g. smaller than 10 m 2 ) are removed. The resulting grid consists of a number of relatively large labeled segments that relate to buildings. These segments are then used to select the building points in the LiDAR point cloud. A convex hull is calculated on each set of 3D building points, yielding a set of boundary points, denoted by Bi. These points are then projected onto the optical image using the transformation model T. They will be used as the initial points of the snake model for each building.
2.2 Snake models for building boundary extraction 2.2.1 Traditional snake models An active contour, or a snake is a dynamic curve x(s) = (x(s), y(s)), where s ∈ [0, 1] is the normalized arc length, defined within an image domain that is deformable under the influence of internal and external forces (Xu, Prince, 1997). Mathematically, the behaviors of the snake are governed by an energy function, which is defined as follows, where Eint and Eext, respectively, represent the internal and external energy terms. The internal energy relates to the tension (the amount of stretch) and the rigidity (the amount of curvature) of the snake, respectively controlled by weighting parameters α and β. x (s) and x (s) denote the first and second derivatives of x(s) with respect to s. The external energy Eext is composed of the forces due to the image itself Eimg, and other constraint forces introduced by the users Econ, e.g. inflation force introduced by balloon model (Cohen, 1991).
The external energy of an image I related to its salient features i.e. lines, edges and terminations (i.e. line end-points, corners) can be generally formulated as follows, where w line , w edge , wterm are the weights of the respective salient features. Mathematical formulation of these energy terms is described in (Kass et al., 1988).
A snake that minimizes E snake must satisfy the Euler equation In order to solve (3), x is regarded as a function of time t as well as of s. Then, the partial derivative of x with respect to t is then set equal to the left hand side of (3), as follows, When x(s, t) stabilizes, the partial derivative term xt(s, t) vanishes and we obtain a solution for (3). This dynamic equation can also be viewed as a gradient descent algorithm designed to solve (1). A numerical solution to (4) can be found by discretizing the equation and solving the discrete system iteratively (Kass et al., 1988). (GVF) is proposed by (Xu, Prince, 1997) to allow more flexible initialization of snake and encourage convergence to boundary concavities, as well as improving the model's robustness versus image noise. GVF field is defined as the vector field v(x, y) = (u(x, y), v(x, y)) that minimizes the energy functional

Gradient vector flow
with µ is a controllable smoothing term, and f represents external forces from Eq. 3, i.e. f (x, y) = −Eext. Using the calculus of variations (Courant, Hilbert, 2008), the GVF can be found by solving: where ∇ 2 is the Laplacian operator. The Euler equations (6) can be solved by regarding u and v as functions of time and solving Once computed, v(x, y) will replace the potential force −∇Eext in the dynamic equation (3), yielding This equation is solved similarly as the traditional snake model, i.e. by discretization and iterative solution. The parametric curve solving the above dynamic equation is thus called a GVF snake.

Proposed snake model
In this paper, we propose adding a new energy term as a constrained force (i.e. Econ), which is calculated based on the similarity between the shape formed by snake points and the projected LiDAR building boundary, as follows, where dH denotes the Hausdorff distance, δ is the scale factor set accordingly to the size of the building, and x is the current snake. Bi represents the 3D building boundary extracted from LiDAR data, and TBi is its projection onto the image using the transformation model T provided by the registration. The role of this energy term is to encourage the snake to maintain the same shape as the building boundary extracted from the LiDAR data while being attracted by the image salient features and under the influence of GVF fields.
In our approach, the projected LiDAR building boundaries, denoted by TBi, have an important role. They provide the snake algorithm with relevant initial points, and they are used in the energy term E ShapeSim and the polygonization. Fig. 3 shows three examples of 3D building boundary points projected onto the optical image.
Up to this point, the proposed snake model has involved a number of parameters, such as α, β (from Eint), w line , w edge , wterm and σ (from Eimg), µ (from GVF) and lastly δ (from E ShapeSim ).
To the best of our knowledge, the know-how to set these parameters for the snake to extract buildings from an imagery dataset still remains unresolved. This has driven many existing works to perform a trial-and-error approach to determine them (Peng et al., 2005, Kabolizade et al., 2010. Similar approach is applied for our snake model, empirically setting α = β = 0.01, w line = 0.04, w edge = 2, wterm = 0.01, with standard deviation σ = 10 used for image smoothing in Eimg, µ = 0.2 and lastly δ = 50.

Polygonization of building boundary
The polygonization step is then applied to each set of building boundary points extracted with the snake model, i.e. bi. It involves formulating these boundary points into regular building polygons. Such polygon usually consists of several parallel and perpendicular straight line segments. Many researches have addressed this topic for various purposes, such as building boundary (Dutter, 2007), cartography (Samsonov, Yakimova, 2017), and classification maps (Maggiori et al., 2017).
In this paper, we modified the method proposed by (Dutter, 2007) which polygonizes a building into three levels of shape: rectangular; Z-, T-or L-shape; U-shape.  Shapira, 1975) containing all snake points, and the edges of building polygon are set to be parallel to the MBR edges. This setting may bias the polygonization result and eventually lead to errors in building orientation. In our proposed approach, instead of detecting the MBR containing snake points, we detect and use the MBR of the projected LiDAR building boundary points which yields more reliable building orientation.
A comparison of building boundary polygonization methods has been carried out. It involves our approach, Dutter's original polygonization (Dutter, 2007) and Douglas-Peucker line simplification algorithm (Douglas, Peucker, 1973). Results are provided by Fig. 4, which underline the significant reduction of building orientation error yielded by our approach. Since Douglas-Peucker algorithm only reduces the number of points of a boundary, it does not yield a building orientation.

RESULTS
This section is devoted to the assessments of the snake model and the overall building extraction performance.

Evaluation metrics
In order to assess performance of a building extraction method with respect to a ground truth, the following metrics are used: 1. Intersection over Union (IoU) (Jaccard, 1901): where A(·) represents the area measurement. IoU measures the ratio between the area of the intersection of an extracted building boundary E and the ground-truth building boundary R, over the area of their union.
Two related criteria, namely Completeness (Cp) and Correctness (Cr), are also used to facilitate the comparison of our approach with other works, e.g. (Fazan, Dal Poz, 2013, Gilani et al., 2016. Full descriptions of these metrics can be found in the paper of (Rutzinger et al., 2009).
• Completeness, measured by the Recall criterion: • Correctness, measured by the Precision criterion: (12) where T P , F P and F N respectively denote the number of true positive (i.e. building pixels correctly identified as building pixels), false positive (i.e. non-building pixels identified as building pixels) and false negative pixels (i.e. building pixels not identified).
All three metrics IoU, Cp and Cr reach their best value at 100% and worst at 0%.
While the IoU metric reflects the overall accuracy of a building extraction method according to a ground truth, Cp represents the fraction of relevant identified building pixels over total number of actual building pixels, and Cr represents the fraction of relevant identified building pixels among all identified pixels.

Euclidean distance between centroids (EDC):
where d(·) is the Euclidean distance, whereas CE and CR, respectively, stand for the centroid coordinates of the extracted building and the ground truth building.

Dominant angle rotation error (DARE):
where θE and θR represent the dominant angle of the extracted building and the ground truth building. The dominant angle of a polygonized building is determined according to its longest line segment.

Data sets
3.2.1 Quebec City The first data sets used for these assessments were acquired in Quebec City (QC, Canada) and are described in Table 1. Boundary of twenty buildings on this test area are manually determined and used as ground truth.

Vaihingen
The proposed method is also tested using the ISPRS benchmark data set on Vaihingen, Germany (Cramer, 2010). This additional test aims to demonstrate its effectiveness on complex environments, and to compare it with other existing methods.
We use the true orthophoto mosaic given as a RGB image of 9-cm resolution generated from aerial images that are taken between July 24 and August 06, 2008; whereas the airborne LiDAR dataset of an average point density of 4 points/m 2 is acquired on August 21, 2008. Since the misalignment between the orthophoto and the airborne LiDAR data was already small, a registration between the datasets was not carried out.
This benchmark data set provides three test areas consisting of buildings with diversified characteristics, as well as their ground truth boundaries. Area 1 is situated in the center of the city and characterized by dense construction consisting of historic buildings with complex shapes and some trees. Area 2 is composed of high-rise residential buildings surrounded by trees. Finally, Area 3 is purely residential with detached houses and many surrounding trees.

Evaluation of snake models
Fig. 5 presents the results of different snake models applied to extract buildings from the Quebec City data set: basic snake without GVF and Eimg, GVF snake, snake model of (Guo, Yasuoka, 2002), snake model of (Kabolizade et al., 2010), and our proposed method. In this comparison, all snake models are initialized by the LiDAR building boundaries projected on the optical image (i.e. TBi). As we focus on an unsupervised approach for snake models without a priori information of building gray-level, method of  is not considered. Visual comparison shows that our proposed snake model always converges better toward the building true boundaries than other snake models on all presented buildings. Table 2 summarizes the IoU metrics computed based on snake models with respect to ground truth building boundaries. All the snake models except the basic snake model show a high average IoU value, i.e. more than 88 %, which demonstrate the gain of using the projected LiDAR building boundaries as initial points for snake models.
However, we notice unstable IoU metric values from GVF snake and snake model of (Kabolizade et al., 2010), underlined by the IoU minimum values. Furthermore, IoU value of snake model of (Kabolizade et al., 2010) worsens if the building has similar color to its background (cf. Fig. 5c and 5e). On the other hand, snake model of (Guo, Yasuoka, 2002) yields good IoU but it is usually unable to converge toward building corners and boundary concavities (cf. Fig. 5c-5e).

Overall performance assessment
This sub-section is dedicated to the evaluation of the overall performance of our proposed method on both data sets. In addi- tion, using the Quebec City data set, we compare it with the approach using only LiDAR data, in order to demonstrate the benefit of using jointly imagery data with LiDAR data. Then, using the ISPRS Vaihingen data set, we compare the result provided by our method with six existing non-snake building extraction methods.
3.4.1 Performance results using Quebec City data set Building extraction result yielded by our proposed method on urban area of Quebec City can be visually assessed through Fig. 6. Snake initial points and snake result (before polygonization) are also depicted. Many buildings with varying color, or color similar to the background or gable roofs are well delineated. The metric values averaged on the area are summarized by  Compared with metrics of LiDAR-only method, the proposed method yields more accurate IoU and EDC metrics. However, it can be remarked that the value of Correctness Cr of the LiDARonly method is higher than the proposed method, i.e. 97.05% versus 92.88%. The reason is because in many cases, LiDAR boundary points can be found inside the true building boundaries, e.g. Fig. 3. These cases yield a Cr value equal to 100% (since E ∩ R = E when E ⊆ R), which increase the average value of Cr. Also, the DARE value of the proposed method is relatively small which confirms its reliability and accuracy in determining building orientation.

Performance results using ISPRS Vaihingen data set
The proposed method is applied on all three test areas, respect- ively presented by Fig. 7a, 7c and 7e. Extraction results are then evaluated according to the ground truth building boundaries, provided within the benchmark data set. Fig. 7b, 7d and 7f illustrate this evaluation denoting true positive, false positive and false negative pixels. Their significations were mentioned in sub-section 3.1.
First, the results are visually assessed. Although a high percentage of building pixels are well extracted (represented by yellow pixels), several issues can be identified in all three test areas. Firstly, several buildings are not entirely extracted but only partially, causing false negative pixels. This issue is due to complex building roofs and connected buildings. Also, several small buildings are also not extracted. Secondly, several vegetation regions that are close to buildings are also extracted as buildings, causing false positive pixels. Pixel-based assessment metrics on each area are provided by Table 4. These metrics are then compared with other methods that are selectively proposed by (Gilani et al., 2016), and synthesized in Table 5. The first two methods are categorized supervised, as they involve a supervised classification methodology based on training data, according to the taxonomy by (Rottensteiner et al., 2014). On the other hand, the others proposed unsupervised and data-driven approaches which consist in extracting buildings without predefined constraint on the building appearance (Gilani et al., 2016). method yields better IoU than most of the other methods, except the supervised method KNTU, or on Area 2 where ours exceeds all the others.
However the tests conducted on the ISPRS data set underlined a limitation in our approach. Indeed, there are several nullvalued regions in the orthophoto as shown in Fig. 9. These regions perturb the snake model convergence as they involve high gradient values in the image-based external energy term that attract the snake model.

CONCLUSIONS AND PERSPECTIVES
In this paper, we proposed and evaluated an unsupervised method to extract buildings from urban scenes, using snake model on Method Data types Processing strategy KNTU (Zarea, Mohammadzadeh, 2015) LiDAR + image supervised Whuz (Zhan et al., 2012) IIST (Gilani et al., 2016) unsupervised and data-driven Fed 2 (Gilani et al., 2016) Mon2 (Awrangjeb et al., 2014) LiDAR Yang (Yang et al., 2013)  aerial optical imagery with airborne LiDAR data. Without needing manual initial points nor training data, the proposed method is highly automatic and achieves a high accuracy of building extraction, indicated by different metrics on many test areas (cf .  Tables 3 and 4). Indeed, compared to other methods dedicated to building extraction, our work provide an alternative solution that is unsupervised and yields a better accuracy than most of them, except the supervised method proposed by (Zarea, Mohammadzadeh, 2015).
These results also demonstrate significantly the advantage of the conjoint use of optical imagery with LiDAR data, even with a LiDAR dataset collected a long time before, e.g. five years. Indeed, by the virtue of LiDAR-based initialization and guidance, our proposed method succeeds to extract buildings with gable roofs and/or having varying color, or even when the roof has very similar color to its background. It also overcomes the problem of disturbed shadow regions near buildings which is usually problematic to a building extraction method, e.g. (Fazan, Dal Poz, 2013). Compared with methods that use only LiDAR data, the proposed approach provides more accurate building extraction results. Moreover, the fact that this approach uses an optical image with a LiDAR dataset that is acquired several years in advance also means a cheaper and timelier solution, instead of requiring updated LiDAR data.
Ground truth Snake result (a) Building ground truth (cyan) and extracted boundary (green).
(b) Binary mask denoting null-valued pixels (white: null, black: non-null). Figure 9. Illustration of a wrongly extracted building due to null-valued image pixels.
However, like other snake model-based building extraction methods, our proposed method is also still dependent to snake parametrization in order to work well on complex areas. Future work will investigate on an automated contextualization of scene (either residential, industrial, or mixed) allowing to automatically parametrize the snake model. Nevertheless, more efforts will also be put on increasing effectiveness of the snake model on complex environments, such as the areas from the ISPRS benchmark data set. As a matter of fact, the results obtained on Vaihingen data set, even though being relevant compared to other methods, can still be improved in order to reach performances similar to those of Quebec City data set, i.e. average IoU values respectively of 83.23% versus 91.12%. There are two reasons that explain this difference. The process of building boundary extraction from the LiDAR point cloud of Vaihingen data set is less effective, due to the misclassification problem of vegetation and buildings. LiDAR points from Quebec City data set have a class value which facilitate the discrimination of building regions from trees. The second reason is the problem of null-valued regions in the orthophoto as mentioned above. Future works will also concentrate on these two issues.