3D building change detection between current vhr images and past LiDAR data

: Change detection is an essential step to locate the area where an old model should be updated. With high density and accuracy, LiDAR data is often used to create a 3D city model. However, updating LiDAR data at state or nation level often takes years. Very high resolution (VHR) images with high updating rate is therefore an option for change detection. This paper provides a novel and efﬁcient approach to derive pixel-based building change detection between past LiDAR and new VHR images. The proposed approach aims notably at reducing false alarms of changes near edges. For this purpose, LiDAR data is used to supervise the process of ﬁnding stereo pairs and derive the changes directly. This paper proposes to derive three possible heights (so three DSMs) by exploiting planar segments from LiDAR data. Near edges, the up to three possible heights are transformed into discrete disparities. A optimal disparity is selected from a reasonable and computational efﬁcient range centered on them. If the optimal disparity is selected, but still the stereo pair found is wrong, a change has been found. A Markov random ﬁeld (MRF) with built-in edge awareness from images is designed to ﬁnd optimal disparity. By segmenting the pixels into plane and edge segments, the global optimization problem is split into many local ones which makes the optimization very efﬁcient. Using an optimization and a consecutive occlusion consistency check, the changes are derived from stereo pairs having high color difference. The algorithm is tested to ﬁnd changes in an urban areas in the city of Amersfoort, the Netherlands. The two different test cases show that the algorithm is indeed efﬁcient. The optimized disparity images have sharp edges along those of images and false alarms of changes near or on edges and occlusions are largely reduced.


INTRODUCTION
Up-to-date 3D city models are needed for many applications, especially for disaster management (Zlatanova and Holweg, 2004).Creating an accurate up-to-date 3D model from scratch is time consuming and labor intensive.Change detection is a critical step to make minimum effort to maintain an accurate 3D city model.LiDAR data is often available for constructing 3D city models, however, it is expensive so the updating rate is low at state or national level.For example, open source nation wide point clouds are available for the whole of the Netherlands, but the updating rate is 5-10 years.Very High Resolution (VHR) images are often available every year due to their lower cost.Given their rich color and geometric information, they recently draw attention for detecting changes in existing 3D information.In past decades, spectral information has been mainly used for deriving changes.However, especially for building changes, the accuracy and effectiveness of geometric information is considered higher (Qin, 2014, Tian et al., 2014).In this paper, the main focus is to derive 3D geometric building changes by comparing new VHR images (taken at time T1) to existing LiDAR data (obtained at time T0).Of course, the time of the sources is reversible, but images are often newer than LiDAR data in real applications.
Existing literature presents three main approaches for deriving geometric changes between LiDAR data and images (Qin et al., 2016) projection based geometric change detection by using an existing DSM to project two image pairs and check color similarity.Shadows in a single image are an indication for the geometric status of a scene.By using the geometric information to predict shadows in the image, changes are detected in case the predicted shadows do not exist in the image or the other way around.This approach is rarely discussed in literature due to several reasons: the quality of changes derived depends on quality of shadow predication from LiDAR data and shadow detection from images.To predict shadows, a DSM or triangular model is required to be constructed from LiDAR data.However, an accurate model is difficult to obtain automatically (Zhou and Gorte, 2017).On the another hand, many shadows from images are occluded by trees.In addition, only height differences along the sun direction cast shadows, so only partial geometric change can be derived from shadow change.
Thanks to the development of dense image matching algorithms, such as structure from motion (Furukawa and Ponce, 2010) and global matching (Hirschmuller, 2008), detecting changes directly between two point clouds is very popular recently.Many researches subtract two DSMs interpolated from point clouds to detect changes.Two DSMs were created from stereo aerial images before and after earthquakes for detecting collapsed buildings (Turker and Cetinkaya, 2005).However, many false alarms were detected in small buildings due to DSM errors and co-registration errors.These problems also contribute to many false alarms in edge areas.In order to remove errors near edges, a windowbased approach by considering minimum height differences is applied to filter errors (Tian et al., 2010).This approach is ef-  fective to filter false alarms for large buildings, but also often filter small changes which are interesting for mapping or taxation purposes.Instead of calculating the vertical difference by subtracting two DSMs, 3D Euclidean distances are measured to reduce edge problems by searching for corresponding points between two DSMs or point clouds (Qin et al., 2016).A 3D surface matching algorithm (LS3D) (Gruen and Akca, 2005) was applied to register two DSMs for calculating 3D distance (Qin, 2014).However, occlusions and incompleteness of point clouds from LiDAR and images introduce more difficulties for co-registration when finding correspondences.Especially, the accuracy of point clouds is affected in regions of low texture, shadows and occlusions where finding correct stereo pairs through epipolar lines is problematic.This results in many false alarms directly.
Instead of extracting point clouds from stereo pairs, projection based geometric change uses existing geometry to project images for finding stereo pairs and derive change directly.Object height causes relief displacements in the images.If the correct height is known, displaced pixels can be projected to their correct place so that stereo pairs found should have similar color.Actually, this is the process of making true ortho-photos.This approach is better than change detection from two point clouds (or DSMs) for two reasons: 1) it can reduce false alarms in problematic areas, which avoids to search through epipolar lines to find stereo pairs to derive points for comparison.2) change detection is formed as checking the color similarity of stereo-pairs found by exist-ing heights and camera geometry, avoiding finding corresponding points for comparison.Images from street view were projected by a 3D model and color similarity was applied for finding changes for this models (Taneja et al., 2011).However, two problems still remain: first, the stereo pairs found by using heights from 3D models suffered from occlusions; second, the quality problem of the existing 3D model, especially near edges, is not explicitly addressed.To solve the first problem, epipolar images (Szeliski, 2010) with less occlusions, instead of ortho-images, were created from the existing DSM (Qin, 2014).Due to the co-registration errors and quality problems of DSMs, the epipolar images containing disparity values were adjusted through the semi-global matching (Hirschmuller, 2008)  Little work has been done for pixel based change detection using projection based approach that explicitly addresses false alarms near edges.The scientific contributions of this paper are as follows: 1) We propose pixel-based building change detection by finding stereo pairs from VHR images through epipolar geometry by supervision of existing LiDAR.Occlusions is ex-plicitly addressed by a consistency check between disparity images of image pairs (Hirschmuller, 2008).The quality problem of DSMs is solved by adjusting disparity images in a small range by a matching algorithm with a MRF probabilistic model.
2) The edge problem is explicitly addressed by fusing planar information in DSMs and sharp edges in the image.Planes from DSMs provide possible height values for edge pixels which can be transformed to form disparity search space for the MRF, while image edges define edge constraints in MRF.
3) Finding optimal disparities for each pixel from the MRF model is computationally efficient.The disparity images are segmented to many planes and edge regions for optimization and the search range is largely reduced by supervision of the existing DSM.
The paper is organized as follows: section 2 illustrates the methodology of change detection as illustration in Figure 1.In section 3 experimental results are presented followed by discussion.Finally, conclusions and an outlook to future work are provided in section 4.

METHODOLOGY
As shown in Figure 1, the method starts from extracting possible heights (three possible DSMs), especially for edges, by exploring the planar information in the LiDAR data.In the second step, each DSM is transformed to disparities by two transformations: 1) project the DSM to image pairs to find stereo pairs using camera exterior and interior parameters.2) stereo-rectify two images so that the epipolar lines occur in columns (or rows) so that disparities are easily calculated from stereo rectified pairs.These two steps are elaborated in Figure 2. Finally, MRF optimization is performed for planes and edges separately and changes are derived from pixels with high color difference.

2.1.Generation of possible heights
In a projection based approach, a DSM is required to find stereo pairs to check their similarity for change detection.So the accuracy of the DSM directly affects the quality of change detection.As the accuracy and density of LiDAR are high, in area of continuous surfaces, the DSM values from interpolation are also accurate.However, problems often appear near edges where height jumps happen.The heights of edges may be interpolated from points belonging to different surface so the quality of edges is poor.Especially edges near overhanging roofs, the interpolated value is largely affected by the points from inner walls and grounds and errors can even be in the order of meters.However, the heights in edge areas are continuous at a single nearby surface patch.As buildings are often formed by planar segments, accurate heights of edge area can be derived from plane information.
A 3D Hough transform combined with surface growing (Vosselman, 2012) is applied to extract planes from LiDAR as shown in the first row of Figure 2. Remaining points are removed as they are often noise.A Delaunay triangulation is applied on the plane points.If a pixel is located in a triangle with vertices from different segments, it is marked as an edge pixel.Otherwise, it is a plane pixel.The spatial resolution of the DSM is defined by the average spatial resolution of image so it is often 2-3 times higher than that of LiDAR.Combining with the reason that the many wall planes are segmented, these wall points contribute to plane pixels which should be treated as edges.A closing morphological filter is applied to the classified image and the new edge pixels finds their possible plane from their neighbors.Three planes are chosen as pixel is interpolated from three points from different planes at most as shown in Figure 3. From the possible planes, the possible heights from edges are calculated.Due to coregistration error between LiDAR data and images, the heights of plane pixels should be adjusted in a small range in order to be well aligned with the images.So the possible height is the height from interpolation.In order to keep the data structure of the output simple, three DSMs are created as shown in second column of Figure 2, within which the plane pixels have the same heights.
Figure 3.The red edge pixel from an overhanging roof may interpolated from pixels from three different red planes, so three possible heights can be calculated from three red planes.The blue plane pixel from an interior roof plane is only related to the blue plane, so only one possible height is derived.

2.2.Transformation from DSMs to disparity image
By adjusting the height of pixels around three possible heights from DSMs, the best stereo pair can be found.However, heights are continuous values and optimizing a continuous value for every pixel is computational intensive.Disparity images present differences in pixel locations of stereo pairs in the epipolar line.
The heights can be derived from stereo pairs by forward intersection.So a DSM can be transformed to a disparity image and vice versa.Instead of adjusting continuous heights, the discrete disparities can be more efficiently adjusted.In addition, a consistency check of two optimized disparity images can efficiently resolve occlusions.
At first, as shown in the third colunm of Figure 2, by projecting a DSM to two image pairs with camera geometries, the stereo pairs are linked to the same pixel in the DSM.However, multiple pixels in the DSM can be projected to one pixel the image, so the pixel with the largest height value is selected to link the pixel to the image.A depth map is calculated in advance and then if two pixels in the image pairs projected from a DSM pixel which has similar heights with two depth maps, the two pixels are the stereo pair.Given a pixel in one image, its matching view in the other image must lie along the corresponding epipolar line (Bradski and Kaehler, 2008).Instead of making interpolation through each epipolar line, a stereo rectification is applied to transform the two images along the baseline so that the epipolar line will align along column or row of the image.Only two times interpolation after transform is required.Therefore, by transforming stereo pairs into the rectified images, the disparity is calculated by subtracting the column or row values.This process is demonstrated in Figure 2 fourth and fifth column.As the two images should be transformed along the baseline, their relative orientation and translation should be derived.From SIFT feature points (Lowe, 1999), the fundamental matrix between two images is estimated, which is decomposed to orientation and translation.After stereo rectification, a transformation defines the map between image before and after.So stereo pairs can be easily found from rectified images and disparities can be easily derived.Due to multiple projections, not every pixel in disparity image has a value.The empty pixels are filled with dominant values of its neighbors.With three DSMs, three disparity images are derived for the left and right image.

MRF Optimization and change detection
In order to make adjustment of disparities more efficient, the disparity images are also decomposed into segments and edges.If the pixel has the three same disparities, it is a plane pixel, otherwise it is an edge pixel.Due to projections, the classified image is affected by noise.A closing morphological filter is applied to the image to remove small holes inside segments.An erosion filter is applied to remove noise at edge segments, while a larger dilation filter is applied to not only reconstruct the eroded areas but also make sure they cover the edges in the image.The possible disparities for changed pixels are re-assigned.From connected component labeling, the segments of planes and edges are found.A small range is chosen for every possible disparity to form a disparity search space where the adjusted disparity is chosen from.
For each plane pixel, the disparity space is set with a range of ±T pixels from the only possible disparity, while for each edge pixels, the disparity space is set with the same range but from three possible disparities.By limiting the disparity space, the computational efforts are strongly reduced (Sinha et al., 2014).
A MRF is defined by an underlying graph G = (V, E) where each vertex v ∈ V presents a pixel in the disparity image, and each edge e ∈ E connects the pixel to one of its neighbors Ni.
Let color of stereo pairs be represented by s = {si}i∈V .The corresponding disparity is given by d = {di}i∈V .The MRF assumes that di obey the Markov property that its possibility is also conditional on its neighbors (Kumar and Hebert, 2003).The goal is to find optimal disparities to maximize the posterior possibility formed as : where Z is a normalization constant and ϕ and ψ are association (AP) and interaction potentials (IP) respectively.The potential is often defined as the exponential function of the possibility: ϕ(si, di) = e P (d i |s i ) .The probability of disparity given the stereo pairs is defined by as a normalized cross-correlation (NCC) score in [−1, 1] between 5 × 5 image patches centred in the pairs: The interaction potential is also defined as: ψi,j(di, dj) = e P (d i ,d j ) .The interaction potential prefers the disparity to be smooth, but allow the disparity to be different near image edges.The probability of edge constraint interaction term is defined as: where gi, gj is the gradient of pixel i and its neighbor j respectively.Difference of gradient of neighboring pixels can be defined as ∆g = gi − gj and ω(•) is edge awareness function on ∆g.Therefore ω(∆g) is defined as: Where a, b, θ are parameters define the shifting and decay speed of the possibility curve.After maximization of the probability, the optimal disparity is selected for each pixel.However, many pixels in one image are occluded in the other image so the disparity found there is not correct.Two optimal disparity images are created for image pairs for consistency check.If two disparity images indicates the different stereo pairs, the pixel is occluded.The changes are detected by the pixels with low scores from MRF, which combines scores from AP and IP terms.As disparity values are not varying so much due to our settings, changes are only detected by checking the color similarity of stereo pairs.A simple threshold is set to get a binary change image.A closing morphological filter is applied to fill the holes in change areas.As many irrelevant changes can be caused from bushes or thin walls, so the small change segments with less than 30 pixels are irrelevant and removed.After filtering, the changes for buildings are derived.

EXPERIMENT RESULTS AND DISCUSSION
The study area is an urban area located in Amersfoort, the Netherlands.The reason for choosing this area is that many small building changes happened between 2008 and 2010.AHN2, the Li-DAR data, is an open source dataset for the Netherlands which was collected around 2009.The accuracy of LiDAR is 5 cm and a minimum of 10 points per square meter (ppm).In urban areas, the density can reach to 20 ppm.The approximate spatial resolution is around 20 cm.Very high resolution images with a size of 11310 × 17310 and 3.5 cm spatial resolution were acquired in 2010 for detecting changes in the LiDAR data.The first two steps of the algorithm, which extracts the possible disparities for image pairs from LiDAR data, is done for the whole areas of overlap in the two images.In the optimization and change detection step, two small cases with sizes of around 800 × 850 and around 0.7 million pixels are selected to test our method as shown in Figure 4.The first case is chosen as the building contains serious overhanging roofs and if the edge problem is not solved explicitly, the false alarms can reach to 1 meters.The reason of displaying the interpolated DSM instead of LiDAR is that the DSM is better to show this serious problem.The second case is chosen as many small objects with a length of 2 m are newly built.If a window based approach is selected to remove the false alarms in the first case, the real changes in the second case are also removed.The programs are written originally in C++ with Opencv, Gdal and Liblas libraries except two sections: 1) surface growing algorithm to find planes from LiDAR data is using from the PCM software provide by Twente University 2) the optimization algorithm forthe MRF model, loopy belief propagation (LBP), is used Table 1.Values for several important parameters from UGM (Schmidt, 2007) library in Matlab but a calling C++ file from mex files.The whole process is integrated in a batch program.
In the first case in the Figure 4, the overhanging roofs in the lower part results in many noisy DSMs near the edges and relief displacement causes serious occlusions in the upper part.By our method, as shown in the second column, two optimized dispar-ity images are derived.Most edges of disparity are well aligned to the image edges.However, in the red box of the upper disparity image, the disparity of the roof does not stop on the image edges.The reason is that many pixels on facade are occluded in another image so MRF model prefers the disparity to be continuous through the roofs.This occlusion problem is solved by the consistency check.Therefore, in the upper image of the third column, the pixels with the red color in the change image indicate pixels that are occluded.The result still suffers a bit from shadows as the colors from the stereo pairs don't give enough information and is fully based on the smoothness of disparity.Therefore, several pixels in the slanted roofs prefer disparities from the ground which are more smooth than those from the roofs.By a consistency check, if the right disparity is found in another image, the effects are limited.The magnitude of color difference by stereo pairs from disparity image is indicated in column three.In order to compare the performance of our method, another approach using a DSM from interpolation to make true ortho-images for color comparison is applied.The result is shown in the bottom image in column three.This approach has false alarms near edges area and occluded areas.Even many changes are inside roof planes which is caused by co-registration errors.All these three false alarms are largely reduced.By thresholding and post-processing irrelevant bushes and walls, the final changes are detected as shown in the last column.In this case, no change should be detected on the building, but some false changes are detected because of trees.In the second case in Figure 4, two disparity images are derived and the disparity jump matches the image edges very well.The change magnitude image shows that our approach is much better than the simple ortho-images approach.In the end, these small newly built building are detected.
As the size of change is relative small, these small changes can be mixed with false alarms without our method.However, only the boundary part of changes are detected for new buildings.The reason is that even the stereo pairs are wrong, the color can be similar due to homogeneous colors in surroundings.
The time for optimization for two images with 0.7 million pixels only requires 1 minute with no parallelization.It proves that the algorithm is very efficient.One set of parameters in the MRF model is chosen for two cases in order to show the applicability of the method.Several important parameter values are set as shown in Table 1.

CONCLUSION AND FUTURE WORKS
This paper provides a novel approach to employ existing LiDAR data to supervise a MRF model to find the stereo pairs and then derive changes directly.As a DSM interpolated from LiDAR often have low qualities near edges and also co-registration errors between sources, the height in the DSM should be adjusted.This paper proposes to adjust discrete disparities instead of continuous heights for computational efficiency.Three possible heights of each pixel are derived from planar information in the LiDAR.They are transformed to possible disparities by multiple projects.By applying small range on each possible disparities, a limited disparity search space is formed not only improve the optimal disparity found but also for computation efficient.In the optimization step, we originally proposes to add the edge constraint from images in a MRF model to improve the disparities near edges.By splitting the global optimization problem to many local planes and edges, the optimization with VHR images becomes feasible.Finally, the change are detected from stereo pairs with large color difference.The results of two cases show that with the edge constraint and supervised disparity search space, most of stereo pairs are correctly found and the edges of the disparity image are sharp.With optimized disparities, the false alarms are largely reduced and small changes, even to 1 m, become possible to find.
Still future work is foreseen.First, applying the method to whole image to derive complete changes.Building change can be affected by trees, therefore, the changes caused by trees needs to be solved.Second, the optimal disparities found are affected by shadows due to the colors from stereo pairs don't give valuable information.Third, the changes derived are only partial changes due to homogeneous surroundings.The complete changes can be propagated from the changes resulted in this paper.
: (1) single image change detection by comparing shadows from different times.(2) direct geometric change detection by comparing LiDAR point clouds to image point clouds.(3)

Figure 1 .
Figure 1.The flow chart of methodology.

Figure 2 .
Figure 2. The five main steps to transform LiDAR to possible disparities.(1) Plane segmentation in the LiDAR data.(2) Interpolation of three possible DSMs from planes in LiDAR data.(3) Finding stereo pairs from images by camera geometry and each DSM.(4) stereo rectification to transform stereo pairs to along columns or rows.(5) Subtracting the position in columns or rows of each stereo pair to obtain disparity.Every DSM results in one disparity image.
by setting a small search range.The areas with high matching costs indicate changes.However, the study created patches from existing model for change detection instead of performing complete pixel-based change detection.So edge problems which are introduced by inaccuracies of DSMs are ignored.

Figure 4 .
Figure 4. Optimization and change detection results on two building cases.The first column shows the data source, a VHR image and a DSM interpolated from LiDAR.The second colunm shows the optimized disparity images.The third column shows the magnitude of change from our method and true ortho-photo approach.Red color,below 0, indicates occlusions.The more yellowish and bluish, the higher the magnitude.The fourth column shows the change detection results.