A MIN-CUT BASED FILTER FOR AIRBORNE LIDAR DATA

LiDAR (Light Detection and Ranging) is a routinely employed technology as a 3-D data collection technique for topographic mapping. Conventional workflows for analyzing LiDAR data require the ground to be determined prior to extracting other features of interest. Filtering the terrain points is one of the fundamental processes to acquire higher-level information from unstructured LiDAR point data. There are many ground-filtering algorithms in literature, spanning several broad categories regarding their strategies. Most of the earlier algorithms examine only the local characteristics of the points or grids, such as the slope, and elevation discontinuities. Since considering only the local properties restricts the filtering performance due to the complexity of the terrain and the features, some recent methods utilize global properties of the terrain as well. This paper presents a new ground filtering method, Min-cut Based Filtering (MBF), which takes both local and global properties of the points into account. MBF considers ground filtering as a labeling task. First, an energy function is designed on a graph, where LiDAR points are considered as the nodes on the graph that are connected to each other as well as to two auxiliary nodes representing ground and off-ground labels. The graph is constructed such that the data costs are assigned to the edges connecting the points to the auxiliary nodes, and the smoothness costs to the edges between points. Data and smoothness terms of the energy function are formulated using point elevations and approximate ground information. The data term conducts the likelihood of the points being ground or off-ground while the smoothness term enforces spatial coherence between neighboring points. The energy function is optimized by finding the minimumcut on the graph via the alpha-expansion algorithm. The resulting graph-cut provides the labeling of the point cloud as ground and off-ground points. Evaluation of the proposed method on the ISPRS test dataset for ground filtering demonstrates that the results are comparable with most current existing methods. An overall average filtering accuracy for the 15 ISPRS test areas is 91.3%.


INTRODUCTION
Light Detection and Ranging (LiDAR) is extensively and routinely used today in topographic mapping as a direct 3-D data collection technique.Most common airborne LiDAR sensors emit directed laser pulses at rapid intervals to determine the ranges to the objects on the ground in order to calculate their coordinates with respect to a mapping coordinate system.These calculated locations provide 3-D sampled representations of the terrain and the objects on the terrain with varied point densities, depending on the operational parameters of LiDAR data acquisition.These point locations are acquired with much higher point densities than what is usually acquired by conventional topographic surveying.However, requirements of most applications go beyond raw point locations.Simpler representations are needed for practical purposes for many applications with respect to more efficient analyses, operability and data size.
Extracting meaningful information of the terrain along with other natural and man-made features on the ground using LiDAR data requires labeling of these unstructured sets of 3-D locations which are often called "point clouds".Point clouds are usually subjected to extensive analysis for extracting information on the features of interest to derive the final product (Biosca and Lerma, 2008;Vosselman et al., 2004).Automated processing of raw LiDAR point clouds is an ongoing challenge.New algorithms and methods are established and tested continuously to improve point cloud processing.Semi-automatic extraction of ground features from LiDAR datasets is among the important areas of research that is available for improvement (Cary, 2009;Vosselman, 2009).
Starting with a set of 3-D point locations, the objective of most LiDAR data analysis is to extract a simplified representation of the features in the scene.Among several feature types in a LiDAR dataset including the ground, vegetation and man-made structures, modeling the terrain has traditionally been a major objective and motivation of topographic laser scanning applications (Pfeifer and Mandlburger, 2008).Labeling points as "ground" and "non-ground" is commonly called "ground filtering".
Airborne LiDAR point clouds are analyzed by various approaches in the literature.Many of these approaches include ground filtering as the initial process.Once the ground points are filtered, remaining points may be further processed for extracting other features including vegetation, man-made structures like buildings, and other objects.
Many ground-filtering algorithms exist in literature since it is among the most studied problems in LiDAR data analysis.Apart from literature reviews carried out by individual researchers when introducing their proposed solutions for the ground filtering problem, Sithole and Vosselman (2004), Zhang and Whitman (2005), Liu (2008), Meng et. al. (2010) also provide comparisons and reviews of ground filtering approaches in literature.
Apart from these, methods which consider the issue from a point classification perspective using machine learning algorithms may be included as a separate category as well (Niemeyer et. al., 2013;Lodha et. al., 2007;Lu et. al., 2009).
All methods have their own advantages and disadvantages with respect to computational efficiency, ease of implementation, overall accuracy, or ability to perform well for different types of terrain.In general, many of the existing algorithms perform well on flat terrain whereas problems usually arise for terrain with variable topography, terraces, cliffs, sharp ridges, steep terrain mixed with vegetation or man-made structures etc.
Ground filtering methods using global optimization techniques have also emerged in order to avoid some of the problems above that may arise due to the local nature of many algorithms (Elmqvist, 2002;Mongus and Žalik, 2014;Zhou and Neumann, 2013).Such methods take global features of the terrain into consideration as well as the local ones.They try to avoid the undesired consequences of relying solely on the local features for the cases when they are actually manifestations of the global ones.
In this study, we deal with the ground filtering problem from a global optimization point of view.The relationships of the points in close proximity are evaluated together with the global features in order to avoid the pitfalls of limiting the evaluation aspects to local features only.The next section introduces the methodology used.We describe the filtering task as a labeling problem, whose optimum solution is found by searching a minimum cut of a graph formed by the point clouds.Section 3 describes the procedure of ground filtering.In particular, we discuss the formation of the graph and the calculation of the data costs and smoothness costs for the minimization.Section 4 presents test results and their evaluation on 15 ISPRS benchmark examples for ground filtering studies.Section 5 concludes the study by summarizing the properties of the proposed graph-cut filtering approach.

Filtering as Graph Optimization
In this study, we consider ground filtering as a labeling problem.The labeling problem deals with assigning a label from a set of labels to each of the points.Each point is labeled as ground or off-ground.Energy minimization is a powerful way to formalize the labeling problem.An optimum labeling is achieved by assigning each point a label from the label set to minimize an objective function (Delong et. al., 2010).The objective function maps a solution to a measure of quality by means of a goodness or a cost (Li, 2009).
A graph G = V(G), E(g), tG(.) consists of a pair of sets V(G) and E(g), and an incidence relation tG(.) which maps pairs of elements of V(G), to elements of E(g).V(G) contains elements that are called vertices or nodes, E(g) contains the elements which are called the edges of the graph G (Kropatsch et. al., 2007).
Graph-cuts are successfully employed as an optimization method in many vision problems based on global energy formulations (Boykov and Veksler, 2006).Graph-cut based methods are preferred to be used in pixel labeling problems in images due to the computational efficiency of solutions based on energy minimization framework for such problems (Szeliski et. al., 2008).

Minimum cut
The "minimum cut" on a directed graph with two special nodes, source (s), and sink (t), finds a cut as depicted in Figure 1 such that the graph is separated into two subsets with the minimum cost where s is in one subset and t is in the other subset.The cost of a cut is defined as the sum of all the weights of the graph edges with one node in one subset, and the other node in the other subset.The energy function for the labeling problem is defined as Costs for assigning ground or off-ground label to each point are called data costs of the energy function.They are calculated as the weights for the edges that connect the points to the auxiliary ground and off-ground nodes of the graph.The edges connecting points with each other have the weights corresponding to the smoothness term of the energy function.Data costs are calculated as a function of the feature vector f which is identified to formulate the likelihood of a point belonging to ground or off-ground classes.Smoothness costs are calculated as a function of the feature vectors of the pairs of points connected with an edge.The constant λ controls the relative contribution of the data costs in comparison with the smoothness costs to the total energy.

Preprocessing
Lidar point clouds contain significant amount of noise and outliers that affect the information acquired when left untreated.
In order to assure the reliability of the point cloud dataset before proceeding with any attempt to process, a preprocessing step is applied.For each point, we check the number of points that fall into a rectangular prism centered at the point and remove the ones that have less than a certain number of points contained by the prism.This step assures that only the lidar points with sufficient neighboring points are included in the process.

Data Costs
The data costs are defined as a function of the height ℎ  of each point with respect to an approximate coarse ground calculation.
In order to calculate the approximate ground, it is first assumed that there is at least one ground point within radius R proximity of each point.Initial ground approximation is then the lowest point within this radius around the point.Then, we gradually reduce R with a reduction rate and pick the lowest point within the reduced neighborhood of the point until the slope between the last approximate ground point and the point of inquiry exceeds a slope threshold.We calculate the ground data costs for each point as where  1 ,  2 are the slopes of the piecewise linear data cost function (ℎ  ) with a slow ascent up to the point height threshold and a steep ascent for higher points.This is used for allowing points that are closer to the ground be associated with the ground much stronger than the points that are above this threshold.Later we scale the calculated data costs between 0-100 by. Figure 2 below shows an example visualization of the ground data costs.
Figure 2. Example of data costs for ground points.

Smoothness Costs
The smoothness cost function that we use for ground filtering is an exponential function of the slope   of the graph edges with respect to the approximate coarse ground.
, (  ,   ) = exp (− Slopes of the graph edges serve as an indication of similarity of point pairs.A high slope value suggests that the points on two sides of an edge are not on the same surface while a low slope suggests otherwise.The parameter   regulates a slope range to consider the pair different.Smoothness cost penalizes discontinuity between two neighboring sites.A large penalty applies to the edge in case two nodes of the edge are close to each other with respect to the slope feature, but they are labeled differently.In the case of labeling one of the nodes as ground and the other node as off-ground will increase the overall energy while labeling them the same will have the opposite effect.Hence, the optimization algorithm will prefer to keep these nodes labeled the same due to lower energy.Figure 3 shows a visualization of smoothness costs by color coding the edges of the graph.The parameter   is determined in terms of slope difference.It limits the extent of the difference in slope an edge connecting two points to be considered on the same surface.The higher the parameter   , the more likely two points with more slope differences will be assigned a high penalty.

Min-cut Optimization for Ground Filtering
Once the graph is constructed, the data and smoothness costs are calculated for each point and each edge.The point cloud is then assigned an arbitrary labeling to initiate the min-cut optimization.Then, an alpha expansion graph-cut optimization is carried out based on the algorithms in Boykov et. al. (2001), Kolmogorov and Zabih (2004), and Boykov and Kolmogorov (2004) using the implementation provided by Veksler and Delong (http://vision.csd.uwo.ca/code/).Figure 5 below presents an overall workflow of the entire ground filtering process.
Figure 5. Workflow for min-cut ground filtering of airborne lidar data.

TEST RESULTS AND EVALUATION
The dataset used to evaluate our algorithm is the test samples provided as benchmark by the International Society of Photogrammetry and Remote Sensing (ISPRS) for determining the performance of ground filters.It consists of 15 samples with different landscape characteristics including steep slopes, discontinuities, bridges, complex scenes, outliers, vegetation on slopes, low number of bare earth points (Sithole and Vosselman, 2004).
Since the characteristics of the sample test sites were varied, we have applied our ground filtering algorithm with different parameters for each sample for optimum performance.Type I and Type II errors, together with the total errors calculated for each test sample are presented in Table 1.Even though the sample with the lowest total error is S61, this is misleading since this sample has the highest Type I error.
The algorithm performed very poorly to identify the off-ground points on this sample.Since the number of these off-ground points were too little compared to the number of ground points, Type I error is high while Type II error is the second lowest among all samples.The second and third best results with respect to the total error are S21 and S42 with S21 having a slightly large Type I errors.
The results with the highest total errors are S11 and S23.S11 has a landscape on a steep slope with buildings, road, and vegetation.S23 on the other hand, has a very complex structure on a multi layered landscape.Type I error is higher in S11 while Type II error is the higher one in S23. Figure 6 shows the filtered ground for the samples with the highest two and lowest two total errors while Figure 7 shows the filtered ground for several selected samples.When the results are evaluated, it is observed for all samples with a total error lower than 10% that Type I errors are much higher than Type II errors with the exception of S71.This indicates a trend of off-ground points being misclassified as ground for these samples.A quick look at the misclassified points reveals that they mostly correspond to low vegetation points as it can be also observed in the filtered grounds presented in Figures 6 and 7.
Misclassification of low vegetation is mainly due to the preference of the m1 and th parameters of the data cost function.
The accuracy of the point heights calculated from the approximate ground are dependent on the performance of the multiscale coarse ground approximation.
There is a trade-off for how close a ground point can initially be approximated.The farther the approximate ground point is to the point of interest, the higher the point's initial height may be.
On the other hand, closing in too much may result with offground points incorrectly being considered as the ground approximation.This may be the case especially with large but low building roofs.The parameters m1 and th are selected with this balance in mind.As a result, low vegetation points are assigned data costs in the lower range and the optimization favors them incorrectly as ground points.
In order to remedy this problem, we are planning on an improvement on our approach for approximating the initial ground for calculating the data costs and iterating it with the optimization as future work.
The contribution of the data cost in comparison with the smoothness cost becomes critical for the terrain with steep slopes.In order to compensate for the high ground data costs of the points on slopes, in contrast with their actual label being ground, the importance of the data cost is reduced via λ parameter of the smoothness function.

CONCLUSION
In this study, we have proposed and implemented a ground filtering algorithm based on the graph-cut optimization of an energy function which takes both the local and the global features into account.Overall, the proposed algorithm is able to handle variations in the landscape and outliers considerably well.However, it doesn't perform as good for low vegetation and high discontinuities.The test results show that an average accuracy of 91.3% can be achieved with an average Type I error of 17.4% and average Type II error of 7.2%.
As mentioned previously, a superior approach for approximating and iteratively improving the initial ground is a fundamental enhancement required for the better performance of our algorithm.
Sensitivity of the data cost function to the change in the terrain is another issue that needs to be addressed.Even though it is possible to adjust the parameters of the data cost function for terrain type in case of monotonous landscape, variations in the topography requires the data cost function to adapt to different parts of the dataset simultaneously.One of the areas that we will focus our efforts on is to develop the data energy to conform to the topographic properties of the terrain.
Current graph structure in our algorithm considers only the immediate Voronoi neihgbors of each point to calculate the smoothness costs.More complex local relationships may better represent the local smoothness of the ground instead of depending only on the immediate neighbors of the points.
Our current approach for determining the weight of the data cost with respect to the smoothness cost is carried out more by experience and intuition on the terrain types than an objective calculation.Future efforts will also focus on the investigation of the appropriate weight assignment for balancing data and smoothness costs for a particular area.

Figure 1 .
Figure1.A graph-cut (Adapted fromBoykov and Veksler, 2006) In this study, we adaptBoykov et.al.'s (2001)  minimum cut optimization algorithm by establishing a graph model formalizing the ground filtering problem.Each point in the point cloud is a node in the graph.Each node is connected to its 3-D Voronoi neighbors with the edges of the graph.All points are also connected to the auxiliary nodes which represent the labels, i.e., ground and off-ground.

Figure 3 .
Figure 3. Example of smoothness costs for ground points.
Figure 4 below demonstrates the change of smoothness costs with respect to the change in the parameter   .

Figure 4 .
Figure 4. Ground smoothness cost change with respect to   .

Figure 6 .
Figure 6.Original (top) and ground-filtered (bottom) data for the highest two (left) and lowest two (right) total errors.